Expert.ai on the Edge: Our Tech on Small Devices

On October 20th, join Antonio Linari as he discusses expert.ai on the edge: Our Tech on Small Devices.

Watch the recording to learn more about:

Green AI: how running expert.ai hybrid NLU is almost carbon free
Privacy: running your knowledge model directly on your device
Low Processing and Memory footprint: example running on edge devices
Demo of expert.ai technology running on edge devices

Transcript:

Brian Munz:

Hey everyone and welcome once again to the NLP Stream, which is a weekly or biweekly livestream that we hold here with Expert AI. As usual, I’m your host, Brian Munz. I’m a product manager at Expert AI. And what we do is every week we have someone on to talk about something relevant to the world of NLP and AI and just to … it can be things that are academic, some things that are fun. Hopefully it’s always going to be interesting. This week we have something fun. It’s always fun when Antonio comes. He’s the head of innovation at Expert AI and so he’s kind of the person who’s always a mad scientist and whenever I meet with him, he’s got something cooking up that’s interesting. So I’m really looking forward to this week where he’s going to talk about NLP as it relates to small devices and IoT. So welcome, Antonio.

Antonio Linari:

Thank you, Brian. And good morning, good afternoon and good evening to everybody from all over the world. So I will share my screen and I will start a presentation. It’s going to be more like a demo than a presentation, but let’s start with sharing the screen first. Okay. You should be able to see my screen and this is my presentation. Hold on.

Brian Munz:

Oh yeah, I think we’re seeing your presenter view. There you go.

Antonio Linari:

Okay, sorry for that. So today we are going to talk about Expert AI on the edge. And the idea here is to show that nowadays one of the issues that we have, especially with deep learning and machine learning, is the resource consumption, which has consequences on the time you take for training the model and the multiple iterations that you have to go through, and the impact that these iterations have on the environment, okay?

I will follow what my colleagues talked about in the previous sessions. Okay. And one thing I want to show you is that basically our technology doesn’t rely on any GPUs and this means that it’s quite fast, I would say very fast. And its consumption becomes really one tenth, sometimes one hundredth, of that of traditional deep learning and machine learning. You can see from this slide, on a public data set, how we compare with transformers such as RoBERTa that are on the hype today. We are on a different order of magnitude: you see we are at 1.5 watts per hour against 220 watts per hour for RoBERTa.
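To put the two slide figures side by side, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted in the talk, with the units as the speaker stated them; the function and variable names are illustrative assumptions, not part of any expert.ai tooling.

```python
# Figures quoted from the slide, units as stated by the speaker.
EXPERT_AI_ENERGY = 1.5   # expert.ai hybrid NLU
ROBERTA_ENERGY = 220.0   # RoBERTa


def energy_ratio(baseline: float, candidate: float) -> float:
    """Return how many times more energy the baseline uses than the candidate."""
    if candidate <= 0:
        raise ValueError("candidate energy must be positive")
    return baseline / candidate


ratio = energy_ratio(ROBERTA_ENERGY, EXPERT_AI_ENERGY)
saving = 1 - EXPERT_AI_ENERGY / ROBERTA_ENERGY

print(f"RoBERTa uses about {ratio:.0f}x the energy")   # about 147x
print(f"Relative saving: {saving:.1%}")                # 99.3%
```

On these two figures alone, the gap is a little over two orders of magnitude, which is consistent with the "sometimes one hundredth" remark.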

Okay, so let’s see how we can manage this problem from a carbon footprint perspective. So I will show you a real case scenario, okay? This is a real architecture, generalized, for email medical claims automation where we basically have a data uploader, we have a bucket where our files can end up and these kinds of files can be PDF, can be images or can be audio. And the audio … sorry. These three types of files are collected by a Lambda on AWS that sends the file to the right processor according to the type of document that we are processing. So if it’s a PDF, it goes into the PDF converter. If it’s an image, it goes into an OCR. And if it’s audio, it goes into ASR, automatic speech recognition, through SQS, so through the queue system of AWS.
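The routing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the queue names and the extension table are hypothetical, and in-memory lists stand in for the AWS queues so the example runs without any cloud account.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical queue names and extensions; the talk only says that PDFs,
# images and audio each go to their own processor through a queue.
ROUTES = {
    ".pdf": "pdf-converter-queue",
    ".png": "ocr-queue",
    ".jpg": "ocr-queue",
    ".jpeg": "ocr-queue",
    ".wav": "asr-queue",
    ".mp3": "asr-queue",
}


def route_for(key: str) -> str:
    """Pick the target queue for an uploaded object key by file extension."""
    suffix = Path(key).suffix.lower()
    try:
        return ROUTES[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {key!r}") from None


def dispatch(keys, queues=None):
    """Append each key to its queue; plain lists stand in for real queues."""
    queues = defaultdict(list) if queues is None else queues
    for key in keys:
        queues[route_for(key)].append(key)
    return queues


queues = dispatch(["claims/report.pdf", "claims/scan.JPG", "claims/call.wav"])
for name, items in sorted(queues.items()):
    print(name, items)
```

Keeping the routing table as data rather than a chain of conditionals makes it easy to add a new document type without touching the dispatch logic.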

The output of these three services is text, a transcript or the conversion of the images or audio into text. And another Lambda will collect this text and will send it to our platform that will process these texts using a specific knowledge model … in this case we are talking about medical claims automation, so everything related to injuries for example, or doctors, hospitals and so forth. And the output ends up in a knowledge store where we can use another Lambda to run some search. In our case today we are going to do just the document visualization.

So I will stop the presentation for a moment and I will show you the architecture in place. So as you can see here, I have a bunch of terminals open and this follows basically the workflow. I have a data upload on the left here, I have an ASR for automatic speech recognition. I have a PDF converter. I have an OCR based on … by the way, the ASR is based on OpenAI Whisper, but we have a partnership with APEC for professional use. And the OCR and PDF converter in this case is pytesseract. But we have a partnership with [inaudible 00:06:44] for professional use. And then there’s the Expert AI pipeline. And everything here is an AWS architecture. So we have here a Terraform script, a bunch of scripts that provision the full infrastructure into AWS.

So the input of our demo today is a bunch of emails from Gmail. Some of them contain attachments that are relevant for the claim management and some others are not relevant, they just do not contain anything interesting. So what we are going to do is we are going to use an RPA tool, specifically UiPath Studio, to collect these emails, read the content, send the content to our platform to check if the content is about something that we are interested in, specifically medical claims, and if it’s related, we send the attachment to the platform for the workflow that we just saw. And then we can at that point inspect our documents with some annotations that our technology provided.
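As a rough illustration of the gate described above (analyze the email first, forward attachments only when something relevant is extracted), here is a small Python sketch. The `toy_extract` function is a hypothetical stand-in for the NLU service and its medical claims knowledge model, which are not shown in the talk; it only looks for a few claim-related words.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Email:
    subject: str
    body: str
    attachments: List[str] = field(default_factory=list)


# Hypothetical stand-in for the NLU call: a real deployment would send the
# text to the platform and read back the extractions it returns.
CLAIM_TERMS = ("injury", "hospital", "diagnosis", "claim")


def toy_extract(text: str) -> List[str]:
    """Return the claim-related terms found in the text (toy logic only)."""
    lowered = text.lower()
    return [term for term in CLAIM_TERMS if term in lowered]


def attachments_to_process(
    emails: List[Email],
    extract: Callable[[str], List[str]] = toy_extract,
) -> List[str]:
    """Keep attachments only from emails that yield at least one extraction."""
    selected: List[str] = []
    for email in emails:
        if extract(f"{email.subject}\n{email.body}"):
            selected.extend(email.attachments)
    return selected


inbox = [
    Email("Claim for knee injury", "Report attached.", ["report.pdf"]),
    Email("Lunch on Friday?", "Pizza?", ["menu.pdf"]),
]
print(attachments_to_process(inbox))  # ['report.pdf']
```

Passing the extractor in as a parameter keeps the gate independent of how the analysis is actually performed, so the toy function can be swapped for a real service call.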

I will show you first the results just in case something goes wrong … hopefully not. So what we are going to see at the end is something like this where we have a bunch of documents here and these documents will be annotated this way, okay? You will see on the top, you see that is a voice type and type tells you what kind of entity is this one. You see we have this type medical expert. This comes from our medical claims knowledge model.

And you see here we can have audio as well, if I can play this audio. Okay. So just to show you that this is the transcription provided by OpenAI Whisper, pretty accurate. And you see on top of it the annotations provided by our technology. And we have another example here of a medical report. You see that in this case, for example, I have a medical facility, but I can have also you see here diagnosis, injuries and so forth.

Same applies to images. This is an image, it’s been transformed into PDF and then into something that can be annotated. You see patient here. Earl Gary as a person and Jamesburg is a geographical place, okay?

So let’s see the workflow in action. I will run UiPath here. Okay, so in this UiPath Studio project, what I do is I collect emails from Gmail, I send the content to our service and the output of the service … sorry, the output of the natural language understanding service will provide some extractions. And if there are any extractions, you can see here … okay, you see here I got the extractions. And if there are any extractions, then I collect the attachment and I send the attachment to our platform for the workflow that we just saw. And at the end we will have a bunch of documents in our collection that we can actually visualize the way I showed.

I will run this workflow through UiPath, and you will start seeing things happening on the services, specifically on top. You see that the top window is moving because it’s receiving documents. The same happens to the others. We have a bunch of things happening and if we go back to our browser, we can start seeing new documents being added to the pipeline. You see that now we have 2MD1 and still processing. Of course there’s … this audio is five minutes, so it’s going to take a little bit before, so we’re not going to see it. But the idea was to show you that the pipeline is actually working. You see this is the previous one and this is the new one.

They are basically the same document and you can see that they are both annotated. And if I reload this, see the number of duplicates, it’s increasing because of course I’m reposting basically the same documents. But this will show you that this workflow is actually doing what it is supposed to do and is annotating as it’s supposed to do.

Let’s go back to the … oh, let’s try one thing first, just to be sure that we are actually doing the thing live. So I am removing this window and here I have a simple text. Hans Zimmer is a famous composer. I send it to our NLU platform. And you see I have the output here is basically the JSON that our platform generates after analyzing, in this case, a simple sentence. This is what actually goes into the pipeline. Of course the documents are bigger.

And another test that we can do is, for example, we can take another file, I will take an audio file … let me see here. Here I have sample zero. I will call it audio sample and I will send it to the service. Okay? You see that the audio has been correctly acquired and the pipeline is working here. Let’s see. Of course there’s the demo effect probably is that … so UiPath is completed. Let me see if we have this new document here. Unfortunately not, so I apologize for this. Yeah, something happened in the … let’s see if we can try again.

No issue at all. I mean, it happens. Okay, sorry for that. But you have already seen it working with UiPath. And I want to show you now the presentation. I want to go back to the presentation, probably going to take a little bit. There’s a reason for this. Let me go back to the presentation. So actually what you just saw is not running on AWS, but it’s running on this … let me go back to the presentation. It’s running on this cluster. Okay? So everything that you saw was running on Raspberry Pi 4 with a gigabyte of RAM each consuming five watts.

You can see which services were running on each of the Raspberry Pis. And LocalStack is basically a Docker container that simulates AWS on the desktop. And so basically everything that was related to AWS was running on the Xavier. And I can actually show you the real infrastructure. I have my iPhone here. This is the infrastructure that was running the deck. Okay?

So of course because this runs on Arm64, then we can run it on an iPhone. And so the last thing that I want to show you today is a very simple application that collects emails, the same emails that you saw, from Gmail and runs our technology directly on the iPhone without communicating with any external services. Everything is on the phone, and the knowledge model as well. And basically what it does is sort the emails, giving you the emails you may be more interested in and flagging the emails that you’re not really interested in or that can be potential spam or something like that. Let’s see if I can make this a little bit bigger. Yes I can. So you can see my phone better.

And this is the application, the knowledge model is being loaded and now the system is starting to collect messages. You can see that these labels that you can see on the bottom are coming from our topics. And I can filter by saying, “Okay, show me only topics I’m interested in,” so the green one. And you see that there is also a segment, with the red or green circle on the left representing the segment. And of course what I can do, I can inspect the email. And this is the same output that I showed you before when I called [inaudible 00:18:16] the full analysis of using our technology. Okay?
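The "show me only topics I'm interested in" filter can be pictured with a short Python sketch. It assumes each message already carries topic labels produced by the on-device knowledge model; the topic names and the function below are hypothetical and are not taken from the app shown in the demo.

```python
from typing import Iterable, List, Set, Tuple

# (subject, topic labels) pairs; the labels are assumed to come from the
# knowledge model running locally on the device.
Message = Tuple[str, Set[str]]


def split_by_interest(
    messages: Iterable[Message], interesting: Set[str]
) -> Tuple[List[str], List[str]]:
    """Partition subjects into (wanted, other) by overlap with chosen topics."""
    wanted: List[str] = []
    other: List[str] = []
    for subject, topics in messages:
        # A message is wanted when it shares at least one topic of interest.
        (wanted if topics & interesting else other).append(subject)
    return wanted, other


messages = [
    ("Invoice for your claim", {"insurance", "finance"}),
    ("You won a prize!!!", {"gambling"}),
    ("Team offsite agenda", {"travel"}),
]
wanted, other = split_by_interest(messages, {"insurance", "travel"})
print(wanted)  # ['Invoice for your claim', 'Team offsite agenda']
print(other)   # ['You won a prize!!!']
```

Because the partition works on labels that are already on the device, nothing in this step needs a network call, which is the privacy point made later in the conversation.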

So that’s basically it. Thank you. I hope it was interesting. I apologize for that document. It is probably still in line waiting to be processed, but thank you very much.

Brian Munz:

No, thanks. This is very impressive and interesting. I mean, I think one thing it especially highlights is something you mentioned in the beginning, which is around carbon footprint and things, which as we know, machine learning and AI can have a pretty large one. And so I think what this highlights again is just using the proper amount of resources for the proper use case. Because I think often we will make a larger footprint to assume or to prepare for someone who has some sort of a huge project, when in reality there are much smaller projects that take up much less space, so using the proper amount for what you need.

Antonio Linari:

Correct. And we have customers right now that are not only sensitive to the topic, the carbon-free topic, and there’s a lot of investment in ESG to reduce costs related to processing. But they are also interested in IoT and how to put NLP in IoT. And the good news is that with our technology, you can actually do that. We actually run … I can show you. We run our technology on this small device called [inaudible 00:20:03], and it has 512 megabytes of RAM and this is something that is really hard to do with deep learning or machine learning. Even using super compact models or distilled models, you still struggle to reach the same level of accuracy as our technology.

Brian Munz:

So what’s an example of a use case that would … where that’s more suited to this type of edge computing as compared to speaking to an API or something in the cloud?

Antonio Linari:

Of course. One of the first concerns is privacy, because for example, with our technology running directly on your iPhone, you don’t have to share with anyone what you want to do with your emails. If you have your own criteria that you want to just keep for yourself, you just create your own taxonomy or your knowledge model, you run it directly on the iPhone and no one knows exactly what you are doing with your emails. So nothing exits the device, and nothing is shared with [inaudible 00:21:15].

Brian Munz:

Oh that makes … yeah, of course. And in that way it’s … because I know you always end up in conversations with companies where it’s a cloud product and kind of asking them to trust you that you’re not storing anything where in this case there’s no real … it’s all pretty secure.

And I think one thing also is that in terms of regulations, you never know what’s coming down the road governmentally or whatever it might be. So there may be more restrictions placed on usage and footprint for devices as well as privacy. So in that way, it’s an option that will eliminate that concern, right?

Antonio Linari:

Yeah. And I would add also that most of the cloud providers are offering Arm64 solutions. And you can actually see … I mean you saw on a very small device, the speed was pretty impressive, even for PDF conversion and OCR, even though there were no GPUs involved, because the NVIDIA Xavier was only used for simulating AWS. That’s it. So on purpose, I didn’t want to use the Xavier, for example, for OpenAI Whisper, okay? So everything was running on CPU. And it’s more powerful and I would say also cheaper. AWS, for example, says that they are like 20 to 30% cheaper than traditional EC2 instances.

Brian Munz:

Yeah, no, exactly. I think it’s one of those things where nowadays the incentive … there’s a variety of incentives, and it makes good business too. Not just reducing your footprint for the ESG purposes, but also often it is cheaper, which is good.

But yeah, this was super interesting, so thanks for presenting. And again, hopefully you’ll be back in the future to show us more things that you’re cooking up, because I always like to see this kind of stuff. And obviously it’s always a nail biter when you’re working with the Raspberry Pis and stuff to see what’s going to happen, but it seems to have gone off well. So thanks again for coming.

Antonio Linari:

You’re welcome and thank you for having me. Have a good day.

Brian Munz:

Yeah, you too. So make sure to join us everyone next week, we have a presentation on conversational process automation, so I’ll be interested to hear what they have to say about that. But until then, thanks for joining and we’ll see you all next week.