Orchestrate all the Things - Machine learning at the edge: TinyML is getting big. Featuring Qualcomm Senior Director Evgeni Gousev, Neuton CTO Blair Newman and Google Staff Research Engineer Pete Warden

Episode Date: June 7, 2021

Being able to deploy machine learning applications at the edge is the key to unlocking a multi-billion dollar market. TinyML is the art and science of producing machine learning models frugal enough to work at the edge, and it's seeing rapid growth. Edge computing is booming. Although the definition of what constitutes edge computing is a bit fuzzy, the idea is simple. It's about taking compute out of the data center, and bringing it as close to where the action is as possible. Whether it's stand-alone IoT sensors, devices of all kinds, drones, or autonomous vehicles, there's one thing in common. Increasingly, data generated on the edge are used to feed applications powered by machine learning models. There's just one problem: machine learning models were never designed to be deployed on the edge. Not until now, at least. Enter TinyML. Tiny machine learning (TinyML) is broadly defined as a fast growing field of machine learning technologies and applications including hardware, algorithms and software capable of performing on-device sensor data analytics at extremely low power, typically in the mW range and below, and hence enabling a variety of always-on use-cases and targeting battery operated devices. Article published on ZDNet

Transcript
Starting point is 00:00:00 Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. Being able to deploy machine learning applications at the edge is the key to unlocking a multi-billion dollar market. TinyML is the art and science of producing machine learning models frugal enough to work at the edge and it's seeing rapid growth. Edge computing is booming. Although the definition of what constitutes edge computing is a bit fuzzy,
Starting point is 00:00:28 the idea is simple. It's about taking compute out of the data center and bringing it as close to where the action is as possible. Whether it's stand-alone IoT sensors, devices of all kinds, drones, or autonomous vehicles, there's one thing in common. Increasingly, data generated on the edge are used to feed applications powered by machine learning models.
Starting point is 00:00:51 There's just one problem: machine learning models were never designed to be deployed on the edge. Not until now, at least. Enter TinyML. Tiny machine learning is broadly defined as a fast growing field of machine learning technologies and applications, including hardware, algorithms and software, capable of performing on-device sensor data analytics at extremely low power, typically in the mW range and below, hence enabling a variety of always-on use
Starting point is 00:01:25 cases and targeting battery-operated devices. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook. My name is Inna. I'm with Neuton. And let me introduce you to each other. So I'll start with my colleague, Blair Newman. He's the CTO of Neuton. We also have Evgeni Gousev, Senior Director of Engineering with Qualcomm Research.
Starting point is 00:02:00 We have Pete Warden, Technical Lead of TensorFlow Lite Micro with Google. And George Anadiotis from ZDNet will moderate our discussion. So, I'll pass the word to George and I hope that everyone will enjoy our session. Well, thank you very much, Inna. Well, first of all, good to see you, good to connect actually in person, as it were. And thanks also for doing part of my job, actually, since I'm usually the one who does the introductions. So thanks for that. And well, thank you everyone for making the time to connect today.
Starting point is 00:02:41 And well, let me start by saying that the topic we're here to address is TinyML, which I would very briefly describe as miniature ML algorithms to do inference, basically, on the edge. I was very familiar with the idea. I was not familiar with the actual term of TinyML. And as a kind of introduction, let's say, I would also mention that while doing a little bit of background research on TinyML, the organization and the events, which you know much, much better than I do and hopefully you can introduce us to that, I also noticed what seems to me like an obvious point in TinyML: it basically refers to inference. Training doesn't seem to be referenced at all. So I wonder if this is obvious to you as well, and this is why
Starting point is 00:03:53 you missed it, or there is some other reason that you don't mention that. So anyone who would like to start? Feel free. I guess this question has come up a fair amount. For me, people ask about, hey, what about doing training on the edge in general, you know, not just for tiny ML, but also running machine learning on phones or, you know, other less tiny devices. And one of the big challenges is that we still generally need labeled data to do a lot of training and you don't get much labeled data. You don't get many labels coming out of the typical sorts of sensors that you have at
Starting point is 00:04:45 the edge. So there is some work around doing training on the edge, for example, with federated learning. And Google actually uses that for the keyboard to learn new words as they kind of emerge in the language without, you know, having to send all the data to the data center. But most use cases don't have anything like that level of labeling. So inference is definitely the most common use case for this. But yeah, it's a good thing to call out. And just to add to this, I think another reason is compute and memory resources because to be able to
Starting point is 00:05:45 do a training, you still need to have quite a bit of memory to store your data. As Pete said, the data has to be labeled, so you really need to have quite a bit of memory. There are some approaches that people are exploring. I mean, it's still in the research labs, like federated learning and aggregated learning. So basically, let's say you have about a thousand devices collecting data and maybe five percent of them have enough compute and memory power to be able to do training. And then you can share these models among all of those other tiny nodes as well. But I think it's probably also a matter of time. I mean, we are still at the very beginning
Starting point is 00:06:24 of TinyML. Obviously, inference is the first step to start. And then as these devices are getting smarter, as algorithms are getting smarter, new approaches are coming, new memory technologies are coming. So it's probably going to be within, I would say,
Starting point is 00:06:40 five years or so, we are going to see some examples of training at the edge, not just inference. Okay, yeah, thanks. And thanks for actually giving me more than what I asked for, because my original starting point was, well, it's probably not even possible to do training on the edge. So I kind of assumed, okay, we're only talking about inference; training is really out of the question, at least at this point. But thanks for providing a timeline, because, also depending on your definition of the edge, you could argue that,
Starting point is 00:07:15 well, maybe on tiny devices you obviously can't do training, but maybe, you know, small data centers close to the edge, you may call them edge or not. So, yeah. And actually, talking about definition, that's a very interesting and very important question. I think when we started TinyML and the foundation, Pete and I and the whole committee, we spent quite a bit of time defining what actually tiny is, because it is really dependent on many factors, like what is the use case, what kind of device you are using, what kind of battery you are using.
Starting point is 00:07:52 So at the end of the day, what matters is really how these devices can be deployed in the field and how they're going to perform. And the bar for tiny is a little bit fuzzy in a way that it can be 1 milliwatt, it can be 10 milliwatt, it can be 100 milliwatt. And I think what we decided to do is just to push the limit of the technology, really define tiny as like in the milliwatt or below type of range. And that basically gives enough battery lifetime for devices to operate in real life. Because you can make some demos and show you can do like object detection at 100 milliwatt, but to be able to have it like in a battery operated device that is going to last, let's say, six months or a year, you really need to be in the milliwatt
Starting point is 00:08:43 or milliamps range. That's kind of the definition of all of this. And that gives you the whole continuum of machine learning. So you start from tiny and then there is an edge and there is an endpoint, an access point, and then you go to the cloud. That's kind of how people look at this thing now. And the boundary is not very clear between all of those unless you go to the cloud. Okay, well, thanks.
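The federated-style scheme Pete and Evgeni described a few minutes earlier, where a minority of capable nodes train locally and the merged model is shared with the rest, reduces at its core to a weighted average of model parameters. A minimal sketch, with made-up client counts and weight vectors purely for illustration:

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style merge: each client's locally trained parameter
    vector counts in proportion to the amount of data it trained on."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return [
        sum(w[d] * (n / total) for w, n in zip(client_weights, client_sizes))
        for d in range(dims)
    ]

# Three hypothetical edge nodes, each with a locally trained 2-parameter model:
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]  # training samples seen by each node
print(federated_average(clients, sizes))  # [3.5, 4.5]
```

Only model parameters travel; the raw sensor data stays on each device, which is the property that makes this attractive at the edge.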
Starting point is 00:09:06 You already touched upon, well, two of the topics that I would like to expand on. So one, definitions, and two, kind of the organizational background and history of the organization. But actually, before we delve into those, I think it may be a good idea if we do like a kind of first round with everyone here.
Starting point is 00:09:27 We already had Inna give us some introductions, but I was wondering if everybody would like to say a few words basically about their motivation for being active in this space and what are the use cases that they see for their respective organizations. So if you'd like to take turns addressing that. I guess maybe I'll start. So I think maybe, George, we had an opportunity to speak before, maybe it was another topic
Starting point is 00:10:04 or event, but it's a pleasure to meet you. Thank you for inviting me, as well as Pete and Evgeni. When it comes to TinyML, one of the things that I've always kind of lived by, not only, let's say, in working with TinyML, but in general, is that I've really operated under the premise that in order to be successful, you need to be able to bring your services to the fingertips of your customers. And if you're capable of bringing services to the fingertips of your customers, then you have a real chance of being successful.
Starting point is 00:10:44 And when you begin to think about TinyML, at least for myself, there's really one fundamental thing that we're looking to accomplish, right? We're looking to bridge the physical world, let's say, with the digital world, and in essence, bring that technology extremely close to the customer. And that's something, at least for myself personally, that I'm extremely fascinated by. From an organizational perspective, we've really approached TinyML in a couple of different ways. I'm sure you had an opportunity to take a look at what some of the objectives around TinyML are. And one of the premier objectives or primary objectives is really to proliferate
Starting point is 00:11:26 TinyML throughout the industry. We're looking to be able to accomplish or have billions of intelligent devices that are out there, right? And in order to accomplish this, this means that we need to expand the footprint of machine learning beyond just the data scientists. And if we don't accomplish this, then we'll have very limited success in proliferating this intelligence across all of the various ecosystems that we're targeting. So one area where we're targeting is we're looking to say, you know, how can we make machine learning available to everyone in a real practical sense? So that's one of the areas that we're focused on.
Starting point is 00:12:10 And then the second area, which is maybe from an industry perspective something that we've seen before and is rather cyclical, is where we begin to see that, you know, hardware begins to outpace and accelerate beyond software. So we're seeing that hardware is becoming more and more optimized and to some degree commoditized. And we're seeing that, let's say, from a machine learning perspective, software is really trailing. And what this means is there's a lot of different techniques that we have to take in order to be able to enable some of these smaller devices to really take advantage of the intelligence that is out there. And unlike some of the approaches that are currently being taken today in order to enable those devices, we began to take a little bit of a different approach where we're building our models fit for purpose. So instead of, let's say, taking the approach of putting a square peg into a round hole in order to, let's say, ensure that a model can fit into a hardware device,
Starting point is 00:13:17 we're taking the approach where we're building all of our models specific to a given use case or specific to a given device or piece of hardware, again, enabling every device to be able to take advantage of machine learning. So, just to give you a little bit of a high level: for me personally, I really enjoy bringing that intelligence to the customer's fingertips. And then our mission as an organization really kind of overlays this, where we're enabling really everyone to be able to take advantage of TinyML. Okay. Thank you. Anyone else want to take a turn?
Starting point is 00:13:55 Just to follow on what Blair said, I think he mentioned several very interesting points. One is the hardware component. And because I'm from Qualcomm, I'm representing the hardware part of the equation. I think the hardware is as important as the software. I think that's the key differentiation and the key value proposition of TinyML as an organization. We offer an end-to-end solution to our customers. It starts from hardware and goes to algorithms, software, use cases and the whole scale-up deployment.
Starting point is 00:14:32 So if you go back to the hardware question, and again I'm representing the hardware company, and that's kind of what my team and I do for a living, you can think about this: if you compare what you can accomplish in silicon today to, let's say, 20 years ago, one or two millimeters of silicon now is equivalent to a Pentium computer 20 years ago. You can think about this. A big desktop became one millimeter of silicon. It's an enormous amount of compute, and that's
Starting point is 00:15:06 all due to Moore's law and other types of scaling. So now you can think, like, you have so much compute in this very little and low-power device. What can you do with this? And that becomes really interesting. The algorithmic part, the software part,
Starting point is 00:15:22 you put all the things together, and it becomes a really cool thing. And then I think Blair touched upon another point, which is the customer angle. When you start sharing this type of technology with customers, you see their eyes really open wide, like, wow, you can do so much cool stuff with this little silicon, low cost, low power.
Starting point is 00:15:45 And that's very inspirational. That basically closes the whole loop, the feedback loop. It becomes so positive in a way: you create cool technology, but then you see this technology having impact on customers and customers' customers. So that basically makes it so innovative on the technology side, but also so impactful on the customer end. So that's kind of what keeps us going there.
Starting point is 00:16:10 It's a very exciting field. Cool. Pete, you want to weigh in as well? Yeah. I mean, for me, it all goes back to a moment in 2014 when I first joined Google. And we had a startup that was acquired. And I was quite proud that we were able to fit models in like two megabytes. And I was feeling pretty good about that.
Starting point is 00:16:38 And then I talked to the team behind OK... Google, and I'm pausing there just so that nobody's devices go off. We actually end up calling it OKG when we're working with it in meetings just to avoid that. And they had a 13 kilobyte model that they were using to recognize that wake word, running on the little always-on DSPs that exist on Android phones, so that the main CPU wasn't burning battery listening out for that wake word. And that really blew my mind. The fact that you could do something
Starting point is 00:17:26 actually really useful in that small a model. And it really got me thinking about all of the other applications that might be possible if we can run, especially, all these new machine learning, deep learning approaches, convolutional networks and things, in a footprint that small on these tiny, cheap, low power devices. And really, since then, it's been about following that thread, talking to product teams, getting inspiration from people who want to do really interesting stuff and trying to figure out how we can actually make that happen.
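For a sense of scale on the numbers Pete mentions, weight storage grows directly with parameter count and numeric precision. A rough back-of-the-envelope sketch; the parameter counts below are illustrative assumptions, not the actual models:

```python
def weight_size_kb(num_params, bits_per_weight):
    """Raw weight storage only; ignores activation buffers and runtime overhead."""
    return num_params * bits_per_weight / 8 / 1024

# An int8-quantized network with ~13k parameters lands near the
# 13 KB wake-word model described above:
print(weight_size_kb(13_000, 8))     # ~12.7 KB
# The same parameter count stored as float32 is four times larger:
print(weight_size_kb(13_000, 32))    # ~50.8 KB
# A "small" 2 MB mobile model, for comparison:
print(weight_size_kb(500_000, 32))   # ~1953 KB
```

The gap between the last two lines is why both aggressive quantization and far smaller architectures are needed before a model fits in the kilobytes of flash a microcontroller offers.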
Starting point is 00:18:24 Okay, well, thanks, because you guys actually make it quite easy for me, because you just mentioned something that I intended to ask you about. So you mentioned model size, and I want to tie that in to what was mentioned before about power, basically. I mean, I do realize that whatever kind of criterion you use is going to be like constantly moving goalposts, because the hardware changes, the power requirements change, and the models keep changing all the time. But my question is, how would you define, and I know it's a fuzzy one, but how would you define what actually fits into the TinyML definition? Is it below a certain power threshold? Is it below a certain model size? Or do you have a kind of way of figuring out what falls into this category?
Starting point is 00:19:09 Yeah, I can speak to the Qualcomm way of doing things. When we started this TinyML project in Qualcomm in 2014, about the same time that Pete mentioned, we looked at vision, because for us vision was one of the most challenging use cases: for vision you typically need to process a lot of data, images of big size, and then big models to do detection and so on. And the camera itself consumes a lot of power. Typically in those days, to be able to do like a face detection, for example,
Starting point is 00:19:50 it required like maybe half a watt of power end-to-end, the sensor and the processor, CPU and algorithms and everything. And we started to think, why is it so high? I mean, you had always-on touch technologies back then.
Starting point is 00:20:04 You had always-on audio technologies, you had always-on inertial sensors on your phone already. Why was it so challenging? And we looked at the whole thing holistically, kind of started to redo it from the algorithms and software and everything, and we got to these low power numbers. But back to the question: back in those days, we debated quite a lot internally where the bar should be for this tiny, for the low power. And I think from the Qualcomm perspective, we adopted this number, one milliamp. And the reason for this was quite simple, because when we talked to smartphone users back then, they allocated about like
Starting point is 00:20:47 one milliamp of power to all sensors for them to qualify as always-on types of operations. It can include all sensors on the phone. So an inertial sensor, light sensor, audio type of sensor, touch sensors, all of them combined should not consume more than like one milliamp of power. And we basically put the bar there: let it be one milliwatt. But again, this bar is somewhat artificial. It really depends on the use case, because for some devices, your duty cycle, you have to use it 100% of the time. It's always on. For other devices, let's say you have a tiny camera in a retail store. You need to take images maybe only once an hour to process what is the state of the shelf. So in this case, the duty cycle is like 0.1%.
Starting point is 00:21:33 So the battery lifetime is going to be different. But basically, again, to answer the question, we thought that milliamp, milliwatt type of range gives you this all-in-one functionality. Because at the end of the day, it's the customer experience, the consumer experience. You don't want to replace your batteries every week. You want to make sure that the final product can last for some time, like a year at least, or six months at least. So that's kind of where this number came from. To be able to do intelligence in a device that can operate on a battery, like a small coin cell battery, for quite some time. And if you do a kind of back of the envelope estimation or calculation, that gives you this, like, milliwatt type of range of power.
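Evgeni's back-of-the-envelope reasoning can be sketched numerically. The current draws and the coin cell capacity below are assumed round figures for illustration, not measurements of any particular device:

```python
def average_current_ma(active_ma, sleep_ma, duty_cycle):
    """Time-weighted average draw for a duty-cycled device."""
    return active_ma * duty_cycle + sleep_ma * (1 - duty_cycle)

def battery_life_days(capacity_mah, avg_current_ma):
    """Idealized lifetime; ignores self-discharge and voltage cutoff."""
    return capacity_mah / avg_current_ma / 24

COIN_CELL_MAH = 225  # nominal capacity of a typical CR2032 coin cell

# Always-on sensing right at the 1 mA budget discussed above:
always_on = battery_life_days(COIN_CELL_MAH, average_current_ma(1.0, 0.005, 1.0))
# Retail-shelf camera at a 0.1% duty cycle: 5 mA active, 5 uA sleep:
duty_cycled = battery_life_days(COIN_CELL_MAH, average_current_ma(5.0, 0.005, 0.001))
print(f"always-on: {always_on:.1f} days, duty-cycled: {duty_cycled:.0f} days")
```

The two scenarios differ by roughly two orders of magnitude, which is exactly the point: the same cell that lasts days when always on can last years when heavily duty-cycled, so any "tiny" power bar is use-case dependent.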
Starting point is 00:22:18 And then you constrain your system this way, so it can still be powered. But again, I should highlight, this is very use case dependent. There's no one size fits all. Okay, thank you. You already referred a few times to when you initiated this TinyML effort. So I was wondering, I know there is a foundation that you have started and you also do events, if you'd like to share a few words on the background: who started it, who has joined up to now, and the activities you are undertaking. Sure. So Pete, do you want to rewind back?
Starting point is 00:23:08 Yeah, because one of the first people I met when I started getting interested in, hey, what else can we do with TinyML, with these tiny models, other than doing these wake words, was Evgeni, with the Glance project from Qualcomm, which still is able to run a really low power image sensor plus image processing computer vision algorithms in kind of an all-in-one package with like one milliwatt. And it really impressed me, Evgeni's vision. You know, it was really inspiring to see this. And so we kept in touch and decided to put on a small conference together. And we rapidly discovered that more people than we could accommodate in the large Google meeting room that we'd organized for the first conference wanted to come. And so we had, when was that first TinyML conference, Evgeni?
Starting point is 00:24:50 Yeah, I think you and I met in July of 2018 on the Google campus. And then we said, we need to do something about this TinyML thing. It's going to be huge. So we need to start building a small ecosystem around this. And then we called a couple of colleagues of ours, friends, and shared, socialized this idea with them. They said, yeah, it's a great idea, but we are skeptical that we
Starting point is 00:25:10 are going to get like 30 people in the room if you have this type of conference. And they kind of pooh-poohed it. I said, okay, let's give it a try. And then in like three or four months, we were over capacity and people were trying to get on a kind of wait list and other things, and we said, wow, this thing is going to be big. And thanks to Google and Pete's friends, we were able to host this event on the Google campus. That was in February of 2019, I believe, the first one. We had about like 200 people, and that's how the whole thing started, basically. And from the very beginning, Pete and I kind of put a little bit of groundwork, a foundation, for this. And what we decided fundamentally is that what we are going to be
Starting point is 00:25:58 doing is going to be a nonprofit organization, because this technology is really for the benefit of all. TinyML also, fundamentally, is end to end. It's open to everyone. It should be diverse, because we are really very strong believers that any ecosystem is as strong as its diversity. You need to have every single player in this ecosystem. There is no insignificant member; all of them are significant. So that was the fundamental principle of this foundation: to be really
Starting point is 00:26:32 open, non-profit, and a global community of people who do things together and drive this field together. So we are kind of two years old, but with huge momentum. We started with the summit. We've done like three summits since then. The last one we had in March of this year, and it was a big success. We had over like 5,000 people register. So you can see the progression from two people to a group of people, to about 100, 150 people at the first event, and then 500 people at the second event, and now we have 5,000. And we see a lot of momentum there. And I think there are some fundamental reasons for this.
Starting point is 00:27:18 It's not just Pete and me and other members of the community pushing it. There is like a huge pull force. The pull force comes fundamentally from several angles, from several directions. One, this technology is really affordable in a financial way. To be able to develop your own TinyML application, you can get a board, let's say, for example, the recent one from Raspberry Pi. The whole board with silicon and everything costs $4. You don't really need to spend thousands of dollars on the cloud and other types of things, or GPUs.
Starting point is 00:27:53 So that's one reason. So it's affordable. Second, it's really the low power and the battery-operated stuff. And the third one is the variety of use cases. I mean, every person we talk to, they say, I can use this to solve this problem, I can use it for that problem. But like one recent example, we talked to our people, friends in Kenya.
Starting point is 00:28:16 So they developed this technology to monitor beehives in Kenya, because apparently, I didn't know this, the honey business is a big part of their agriculture, and by monitoring in real time and making it accessible, they can save 20% of their revenues. It's a big number. But my point there is that you can give this technology to people and they can start to develop applications to solve their problems.
Starting point is 00:28:45 That's kind of what resonates really well. So it's the affordability, it's the low-power component of this, and the variety of these use cases for all. So we basically created this platform for people to collaborate and to work together, and we see it's really coming to huge fruition now. And George, if I may, since I'm kind of, I guess you might say, the new kid on the block, considering, you know, Pete and Evgeni have obviously done an exceptional job in not only, let's say, birthing the movement of TinyML,
Starting point is 00:29:22 but also being the tip of the spear and creating a platform for everyone to, you know, bring their thoughts, bring their ideas, bring their innovations to this particular community. And certainly without them leading the way, we probably, at least from a Neuton perspective, wouldn't really be having the opportunity to be here today. So one of the things that I do kind of want to add as it relates to the community, and I'm going to touch a little bit on your prior question as well, is that, you know, Pete and Evgeni have done an exceptional job. I think, as he mentioned, maybe initially starting out, they were kind of thinking, you know, hey, can we get, you know, five or 10 people in the room? And now, you know, there's 5,000 people that are, you know, actively engaged in the community. And at least for ourselves, you know,
Starting point is 00:30:27 they've created the opportunity to really allow for us to begin to put on display some of the things that we've been able to build, especially as it relates to TinyML. Now, in one of your previous questions, you mentioned, you know, where do you begin to delineate or really define TinyML, right? And historically, it's really kind of started, I guess you might say, from a hardware perspective, and then everyone has kind of taken the step of, can you now begin to build a model that can, you know, integrate into the devices that are out there today, that is enabling TinyML? So, Pete and Evgeni have kind of opened the door to this. And one of the things that we kind of like to think, at least for ourselves, as we move forward, is as they've opened the door, you know, we've kind of taken the approach as to, you know,
Starting point is 00:31:16 how can we now kick down the door and kind of ask people to begin to think about things in a different way. So whether it's starting with your hardware and then building a model to integrate with your hardware, we've kind of taken a completely different approach, where we've flipped things on their head and said, well, now we're going to begin to look at how you can build a model that is fit for purpose for your hardware. So instead of starting with your hardware and then building the model to integrate with your hardware, how can you now build models that are fit for purpose for your particular use case? Because I think Evgeni has kind of mentioned there's a variety of different use cases that span across multiple different verticals. And in order to help TinyML continue to grow, our approach is we're building models
Starting point is 00:32:08 that are fit for purpose for every single use case. Instead of the other way around, where you have particular hardware and then you're needing to take particular approaches in order to get your model to integrate with that hardware, we're taking a completely different approach where we're saying, hey, let's build your model fit for purpose for your individual use case and for your hardware along the way. So we're certainly appreciative that both Pete and Evgeni have, you know, started this movement. And, you know, obviously, we're becoming more and more active in the community, and we're hoping to continue to share our innovations as the technology and the community grow. Thank you. And yes, that was actually going to be my next question. So, fine, now that we have the history and definitions
Starting point is 00:32:59 part figured out, let's say, what about the practicalities? And also, all of you mentioned some of the opportunities that bringing TinyML to fruition can bring. But what about the challenges? And actually, what about some practical scenarios, let's say? So, you have a machine learning model, whether it's, I don't know, vision or whatever else. What are the challenges that you're facing if you want to scale it down to be able to deploy it on low-powered devices on the edge? And is there a clear path that people can take to make that happen? Yeah, I think Pete mentioned one before, which is both a challenge and an opportunity.
Starting point is 00:33:49 As for all machine learning types of problems, it's data. You really need to have data. That's kind of a universal problem. And that also creates some interesting questions on the business side, like who owns the data, who is going to monetize the data. But really, you need to have data to develop your models. And that's kind of one of the challenges, which is universal for machine learning and TinyML as well. And I see this more like an opportunity, and there are some companies who build their business models now around data, collecting data and doing data crowdsourcing, and developing models and so on.
Starting point is 00:34:23 and develop models and so on. The other one, I would say, is more on the timescale, which is the adoption, because we are still at the very beginning. Many people, especially end users, are not aware of the capabilities that TinyML can offer. That's why in the first year and the second year, when we started the foundation, the mission was really to build awareness, not just in the technical community, but move it up the stack and show people that these types of applications are possible. Because for us, engineers and technologists, we all have the tendency to over-engineer
Starting point is 00:35:01 things. But when you talk to end users, you really need to, like what George mentioned, Blair mentioned too, you really need to understand what is good enough. Once you understand this, you can start to develop products, and that goes to the practicality. You really need to connect the technology capabilities to what people need, what is good enough, and then boom, it goes. But to me, I would say it's the data and how you use it. You turn on your imagination and develop something to find out. And that's what I see more as an opportunity, not as a challenge. So Pete, from your angle, what are the big ones you see on your radar screen? Yeah, I mean, for me, it all comes back to finding the big use cases and finding, you
Starting point is 00:35:53 know, it still feels like in a lot of ways we're in that sort of space in the late 1970s with microcomputers where a whole bunch of nerds are really, really excited about the possibilities of these devices, but it's still unclear what the actual killer applications are going to be for this technology that we can see is coming. It's really obvious that it totally makes sense from all of these technical trends coming together. And we can see that it can be useful in all of these different ways. But it's that long process of product and customer discovery that has to happen before we have a large number of cast iron customer use cases.
Starting point is 00:37:02 That's the part where I feel like we're in this really interesting stage of those emerging. Yeah, go ahead, I'm sorry. And just to kind of add on to what they both mentioned, I think in order for TinyML to really take off, we ultimately kind of have to complete that life cycle. And for me, what I mean by life cycle is, and I think they both touched on it, really the implementation of use cases, or enabling those use cases, so that we can begin to drive adoption. And when I begin to think about it, you know, we need to get those business users. We need to get, you know, those individuals that, in their mind, have those use cases.
Starting point is 00:37:50 How can we get them into the game and enable them to test out those hypotheses, implement those use cases, which will further drive adoption? And when I begin to think about that, from a challenges perspective, I think there are still some of the fundamental challenges that are out there, whether you're talking about AI or ML or TinyML: how do you enable those organizations, or how do you enable those individuals who have those use cases? In their mind, they're saying, hey, TinyML can really bring value, but do I have that data scientist on board? And then even if I do have that data scientist on board, from an operational or tactical perspective, can I actually produce a solution that I can then actually test and validate? And then if we really want it to take off, how do we then make sure that once something is in production, the organizations can then drive value out of it? And that's one of the things that we've kind of continued to take a look at: how can we enable, let's say, the entire lifecycle?
Starting point is 00:38:58 How can we enable organizations that may have those use cases, but may not have that data scientist, right? Then let's say we do enable those organizations, and how we're doing that is from an AutoML perspective. Now, how can we enable them to implement those use cases? And in this particular case, let's say producing models that can actually integrate into those microcontrollers so that they can test out their use cases. And then lastly, we want to actually then get those use cases into production, because once they're in production, then other organizations, because this is obviously a me-too world, once they begin to see that other organizations are implementing TinyML from a production perspective, then adoption will begin to accelerate from there. And that's another area that we're also focused on, an area where we say, hey, you know, once something is in
Starting point is 00:39:52 production from a TinyML perspective, you know, how can you validate that what you're actually predicting is actually accurate? How can you validate that you can actually keep that solution in a production state? So I think for me, when we begin to think about what are some of the barriers and challenges, we really need to begin to look at how we can enable the entire lifecycle of TinyML: starting first with enabling organizations, and then, once we're able to get them into production, ensuring that they actually can have that transparency and realize the value of the solution. I want to thank you all. If I may, because I think Pete brought up a very important question
Starting point is 00:40:39 of the killer app, but just to calibrate, TinyML is not science fiction anymore. It's real and it's in production. So TinyML is being shipped into tens of millions of devices now in all kinds of verticals, in audio, vision-based technologies, in industrial IoT, predictive maintenance, many, many examples. But I think what Pete pointed out is that the killer app is not there yet. I mean, the killer app is the one that is going to be everywhere, just like a smartphone. And again, just to calibrate, if you look at the smartphone,
Starting point is 00:41:10 technology-wise, what is a smartphone? A smartphone is a processor, a touchscreen, and a display, right? So processors were known for quite a while. The touchscreen had been known since the mid-'80s. You can find it in your laptop, and then the displays were there. It was just the brilliance of people that brought all three technologies together, and boom, it became a big thing. Same thing with Uber. What is Uber? It's a smartphone plus GPS, from the technology perspective.
Starting point is 00:41:38 It's super easy, but then it took like 10 years, from the smartphone to Uber in the 2010s, for people to make this happen. And it became a killer app. So I'm quite optimistic and positive this is going to happen; it's just a matter of time, because TinyML is fundamentally a game-changing technology. In addition to power and cost, I think we didn't mention one more very huge differentiator of TinyML. It's privacy. TinyML technology allows you to do analytics on a device without transmitting data to the cloud, without sending any kind of raw data. All you get is metadata. You put in audio-type sensors for something like what Pete mentioned, keyword detection.
Starting point is 00:42:23 You're not recording your room like what Alexa and other type of devices do. All you do, you're just always on in the room. You just listen. If something happens, you send a trigger like, hey, somebody's in the room. Or there is a glass breakage. Or somebody is using a saw to cut trees in the forest. The same also with the vision. People are scared of cameras for a reason.
Starting point is 00:42:52 And TinyML vision allows you to do analytics without sending images. And this is a huge value proposition. Again, I'm very positive and very optimistic that the killer use case is going to come. But what we see today, we see a lot of smaller business cases and products bubbling up. And I think it's just, again, a matter of time until we have a critical mass, and then we'll see this explosion to a trillion devices, probably within a decade or so. Thank you. I was just going to say earlier that I think you all approached it very correctly, actually, from a broad perspective, referring to things such as universal data science and machine learning challenges, such as finding the right data or finding the right expertise in your organization, and finding use cases, actually, which you just referred to, which are all totally valid points. And I have to add that, as an engineer myself, I really appreciate the fact that you stepped out of your engineer shoes and approached it in a different way.
Starting point is 00:43:58 However, now I would like to ask you to actually geek out a little bit, because what I had in mind was a kind of more specific scenario. So suppose you have all of those challenges somehow figured out in an organization, right? So you have your expertise, you have your data scientists, you have your data, and you have actually trained a model. It has converged
Starting point is 00:44:25 and it works perfectly for your criteria. I was wondering what would the specific challenge be, and if there is a clear pathway to take that model, which is potentially a very large one, and deploy it on a device on the edge, which may have trouble accommodating it in terms of size, in terms of power consumption. So is there some way to cut that model down and make it deployable on a device on the edge? Is there a clear path to doing that? Are there some techniques you're investigating to make that happen? Pete is a world expert. Pete, just to give a little introduction here, Pete is the
Starting point is 00:45:11 first guy who inspired the whole global community into these techniques. I remember it was 2018, I think, or 2019, when Pete showed that, and he's probably going to talk about this too, you can quantize your model and you can run a 32-bit model on 8 bits
Starting point is 00:45:30 without compromising the accuracy much. And it was like an eye-opening experience for the whole community. Like, hey, you can train your model on the regular floating point 32 and then you do integer 8-bit, and you're not losing much. And it was like, wow, is it possible? It was again a big kind of mindset shift, and I think since then a lot of people have invested in this. I'm just kind of setting the stage for Pete, but I'm saying his contribution there was really very instrumental, because it really kind of changes switches in your brain,
Starting point is 00:46:05 because people tend to have a tendency to think in a certain way, and saying, no, I'm going to challenge this, let's look at this differently. And since then people have developed even binary models. But again, I just don't want to take... Yeah, go ahead, Pete. Yeah, yeah. You know, the funny thing is as well, a lot of that came from what I'd seen inside of Google from people like Raziel Alvarez, who was one of the first people I saw doing full 8-bit calculations, again, for the sort of wake word applications. And so it's really helped me look good when I've been able to sort of help share
Starting point is 00:46:49 all of this work that's been happening inside of Google by these engineering teams and help popularize some of these techniques that I've sort of seen working internally, like the 8-bit work, which they were doing back in sort of, you know, 2014 for the wake word, but it wasn't something that was generally, you know, widely shared. So, Evgeni's very kind, but a lot of my contribution has been more sort of, you know, helping publicize and document a bunch of these engineering practices that have emerged as people dive in and start sort of creating real products. Go ahead. I was going to say, now it kind of becomes obvious,
Starting point is 00:47:52 but back like three, four years ago, it wasn't. The techniques that people use today, like quantization (do you really need to have 32 bits?), became industry standard. Or compression, data compression, or pruning: because of the way you design your networks, you have a bunch of connections there in your networks, and quite a few of them are redundant and useless, and people started to cut them one by one. And you can do it in a smart way. So those techniques are all out there, and you can
Starting point is 00:48:19 definitely, I mean, Pete mentioned this, what used to be a tens-of-megabytes model, you can design them in a way that now they are in the tens of kilobytes. And that's actually a game changer. And on top of this, it kind of went beyond research. Now you have tools you can use to do this type of thing. You can develop your model, like your big model using big data, and you can use these tools to trim them down to those sizes.
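The two shrinking techniques described here, quantization and pruning, can be sketched in a few lines of plain Python. This is a toy illustration under simplified assumptions (per-tensor affine quantization over a flat list of weights), not the actual TensorFlow Lite tooling the speakers are referring to:

```python
# Toy sketch of two standard model-shrinking techniques discussed above.
# Illustrative pure Python, not a real converter or training framework.

def quantize_int8(weights):
    """Affine quantization: map float32 weights onto int8 via scale/zero-point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0          # spread the float range over 256 levels
    zero_point = round(-128 - lo / scale)     # int8 value that represents float 0.0
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the int8 representation."""
    return [(qi - zero_point) * scale for qi in q]

def prune_by_magnitude(weights, fraction):
    """Zero out the smallest-magnitude weights (the 'redundant connections')."""
    k = int(len(weights) * fraction)
    threshold = sorted(abs(w) for w in weights)[k] if k else 0.0
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.9, -0.02, 0.45, 0.001, -0.77, 0.3, -0.005, 0.12]
q, s, z = quantize_int8(weights)       # 8 bits per weight instead of 32
restored = dequantize(q, s, z)         # each weight off by at most half a step
pruned = prune_by_magnitude(weights, 0.5)  # half the connections removed
```

Real toolchains apply this per layer and often fine-tune afterwards to recover accuracy; the point of the sketch is just that an 8-bit grid with a scale and zero point loses at most half a quantization step per weight, which is why the accuracy cost turned out to be so small.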
Starting point is 00:48:43 It's again kind of a collective work of the whole ecosystem: innovations, tools, software people, application people. But I would say now it's actually pretty mature. There is more development, obviously, to make them even smaller and better, but again, some of the groundwork that Pete and Google people did, and then a lot of academic work on binary networks, for example. People show now that you can run your neural nets as binary networks, like only ones and zeros, right? And that's like, wow, that type of thing. So a lot of innovations coming from the industry, but also from academia, and from companies. I think, I know that Blair and Neuton are also doing quite a bit of research there and developing these networks in a different way to make them small by design.
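The binary networks mentioned above take the idea to the extreme: if weights and activations are all +1 or -1, a dot product collapses to a bitwise XNOR plus a popcount, which is why "only ones and zeros" is so attractive on tiny hardware. A minimal sketch of that identity (real binary nets such as XNOR-Net also keep per-layer scaling factors, omitted here):

```python
# Sketch of the binary-network trick: for vectors of +1/-1 values, the dot
# product equals 2 * (number of agreeing positions) - length, which hardware
# can compute with XNOR + popcount instead of multiply-accumulates.

def binarize(v):
    """Collapse full-precision values to +1/-1 by sign."""
    return [1 if x >= 0 else -1 for x in v]

def dot(a, b):
    """Ordinary dot product, kept for comparison."""
    return sum(x * y for x, y in zip(a, b))

def xnor_popcount_dot(a, b):
    """Binary dot product: XNOR marks agreements, popcount tallies them."""
    n = len(a)
    # Pack +1 -> bit 1, -1 -> bit 0, then XNOR the two bit patterns.
    pack = lambda v: sum(1 << i for i, x in enumerate(v) if x == 1)
    xnor = ~(pack(a) ^ pack(b)) & ((1 << n) - 1)
    agreements = bin(xnor).count("1")   # popcount
    return 2 * agreements - n

a = binarize([0.7, -1.2, 0.1, -0.3, 2.0])
b = binarize([-0.5, -0.8, 0.9, 0.4, 1.1])
```

Here `xnor_popcount_dot(a, b)` and `dot(a, b)` agree exactly, with 32 weights packed into a single machine word and no multiplier needed, which is the whole appeal for microcontrollers.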
Starting point is 00:49:27 Yeah, and I was going to say that, you know, Pete inspired us, right? So, I mean, I think you guys have done a great job of really describing the approach today, I mean, the approach that kind of got us to where we are today. And, of course, historically, we've taken that approach where you typically build a model and then from there you begin to apply some of these techniques that Evgeni has mentioned, whether it's quantization, whether it's pruning, et cetera, to get your model to the point where it can really align with your use case or the hardware device that you're using. And typically, you're kind of taking that approach where, what steps can I take so that I minimize the loss of accuracy as much as possible, right? And really, a lot of the techniques that you guys have just discussed, and that of course Pete spearheaded, have really kind of inspired us to say, OK, now how can we take it to the next level?
Starting point is 00:50:33 What's the next generation approach that we've accomplished where now instead of maybe taking that top down approach where you have a model and then you optimize it. We've decided to take a bottom up approach where we build each model neuron by neuron. And then once the model is built, you don't need to perform any compression techniques, whether it's quantization, pruning, or whatever the case may be. So now we feel like we've kind of taken the baton from some of the techniques that Pete has spearheaded, and now we're beginning to look forward to demonstrate a different approach where you do take that bottoms-up approach, building everything fit for use, neuron by neuron, so that you don't have to take those additional steps. So as soon as the model is built, it's ready for production.
Starting point is 00:51:32 So that's, again, one of the things that we're really appreciative of in this community, to kind of give us the opportunity to really highlight the methods and approaches that we're taking. And we're seeing, oftentimes, that by taking this more bottom-up approach, we're able to realize that our models are sometimes, you know, 100 times smaller than what you would typically see by taking that top-down approach. So we're pretty excited about that, and we're looking to continue to demonstrate and share some of the approaches that we're taking at the TinyML conference that's coming up in about a week and a half, where we'll have an opportunity to share this new approach, a fairly disruptive approach, as it relates to building models for TinyML implementations. Yeah, I know that. And actually, I cheated a little bit with Ina's help, so I had a look at the draft for your keynote and you posed some interesting questions there.
Starting point is 00:52:48 And hopefully we've managed to at least scratch the surface, because I don't think it's really possible to address those to their full extent in the time that we have. But, excuse me. So I wanted to ask you something based on what you just said. So would I be correct in saying that, well, the kind of technique you're using
Starting point is 00:53:08 basically results in having very, very small footprint models regardless of where you're going to deploy them? So do you use the same models for deployment on the edge and in the data center and anywhere? Yeah, that's certainly our expectation. So we're leveraging this technique, let's just say, for standard ML implementations
Starting point is 00:53:35 as well as tiny ML implementations. So really, we've kind of really redefined how you build models moving forward, right? So we don't start out as an example with a predefined structure, not at all. We don't leverage some of the typical techniques that are out there, whether it's back propagation, et cetera. We truly take an approach where we are building each model per use case, neuron by neuron. And one of the things that we do is we allow for our customers, once they define their requirements, let's say
Starting point is 00:54:19 even from a hardware perspective, to be able to build a model, stop the building of the model based upon their particular requirements. And one of the things that we do is, as we build the model, we perform cross-validation at each step as we build the model. So if a customer has a specific set of requirements and then they stop building the model, it's already ready for production use. And that's kind of our new theme. When you begin to talk about automation, that's why we say build fast. When we begin to talk about leveraging our technique as it relates to building neuron by neuron from the bottom up, we take the approach of build once, instead of build a model and then compress it to your specific use case. And then when we begin to talk about not compromising, this is where we begin to talk
Starting point is 00:55:10 about explainability, being able to understand what's driving the predictions behind your model, being able to understand the quality that's behind your model, but then also being able to understand, hey, maybe when your model is decaying, you may need to retrain. So we're kind of positioning ourselves so that, as the adoption continues and those killer apps begin to enter the market, we're very well positioned for that. Okay, well, thank you. I think one other thing there, just to add: whether it's bottom-up or top-down, people are developing some sort of AutoML tools and also some compilers. So if you, for example, develop a model using TensorFlow or TensorFlow Lite, which is what Pete's team is doing,
Starting point is 00:55:51 companies and people who do sort of the middle layer there, they can translate your model into the microcontroller type of size and dimension. And from the end-user perspective, you don't really need to worry about this. And if you kind of fast forward to what is going to happen in a year or two, you're going to have several companies and several tools that are going to be commercially available for people to just use the conventional models you develop in TensorFlow or PyTorch or other types of formats, and then translate them down to microcontroller code. So that is happening already.
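As a toy illustration of that last "translate down" step: microcontroller toolchains typically embed the converted model in firmware as a C byte array (TensorFlow Lite Micro traditionally generates one with `xxd -i`). The helper below is a hypothetical stand-in for that emit stage, not part of any real converter:

```python
# Toy version of the final "translate down to microcontroller code" step.
# Microcontrollers usually have no filesystem, so toolchains such as
# TensorFlow Lite Micro compile the converted model into the firmware as a
# C byte array. This hypothetical helper emits that kind of array from an
# already-converted, already-quantized model blob.

def to_c_array(blob, name="g_model"):
    """Render raw model bytes as a C source snippet for firmware builds."""
    body = ",\n  ".join(
        ", ".join(f"0x{b:02x}" for b in blob[i:i + 8])
        for i in range(0, len(blob), 8)   # 8 bytes per source line
    )
    return (
        f"const unsigned char {name}[] = {{\n  {body}\n}};\n"
        f"const unsigned int {name}_len = {len(blob)};\n"
    )

# e.g. a few bytes standing in for a converter's output
model_bytes = bytes([0x1c, 0x00, 0x7f, 0x81, 0x05, 0xf3, 0x2a, 0x40, 0x9d])
snippet = to_c_array(model_bytes)
```

The emitted snippet compiles straight into the firmware image, and the on-device interpreter reads the weights out of flash, which is how a model trained in Python ends up running on a microcontroller without the end user ever touching this layer.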
Starting point is 00:56:30 Good. That was a very forward-looking position that you took, because that was going to be my question. So, fine. We only have about five minutes to wrap up. So let's use that time among you, and let's quickly go over what you see as the next steps in the evolution of TinyML. Well, both the concept and, if you also want to refer to that, well, the organization and the events as well. So I guess, Pete, what is your piece of advice? Everybody's looking to Pete. Pete, what do you think? I really come back to what's happening on the product side. We almost have an overabundance of technology and hardware and software and models and modeling techniques. And we're starting to see, like Evgeni said, tens of millions of devices shipping with
Starting point is 00:57:40 these TinyML approaches. And if, you know, you count things like wake words on phones, there are, you know, billions of devices out there from all sorts of, you know, manufacturers. But I still feel like we're looking for that, you know, coming back to that killer application, and that process of uncovering the really world-shaking use cases that I think we're all feeling in our bones are going to emerge. We don't know what they're going to be yet. So that is the path I'm really interested in following: working with teams who have really interesting problems that we might be able to solve, that can have a really big impact. And I think the TinyML events, the meetups, as well as the main conference, have been really good at connecting
Starting point is 00:58:47 people who have solutions and people who have problems. And I'll go, and I'll let Evgeni wrap up, because I think you have a good umbrella view of everything that's going on. I'll just briefly say, at least for ourselves, we believe, you know, the future is now to some degree. I think both Pete and Evgeni have kind of highlighted, I mean, over the last couple of years, TinyML has kind of, maybe to some degree, been in this infancy stage, right? And typically technology, every couple of years, takes this quantum leap, takes this jump, so that you can get to the next level. And I think this next level is really the beginning of implementing all of these various use cases
Starting point is 00:59:39 that are out there so that we can get to the killer app and then subsequently everything can accelerate. And we see it when I say the future is now, I'm really referring to enablement, right? So how can we enable organizations, individuals to participate in this community, right? Because really, when you think about the tiny ML community, it really truly stretches across a number of different disciplines. And in some cases, we are seeing individuals come from one end of the discipline or the other end of the discipline, but now we need to be able to try to bridge that gap. And that's one of the things that we're really focused on is how can we bridge that
Starting point is 01:00:26 gap so that we can really enable the community to really be knee-deep into this ecosystem so that we can get more and more use cases and we can get to that killer app and we can really drive that adoption into the millions and billions of devices that are enabled from an intelligence perspective. And that is truly the area that we're focused on. And so we really kind of see ourselves as now taking that baton and really accelerating TinyML into the community, whether it's through automation, excuse me, whether it's through enabling organizations to be able to implement machine learning, as well as being able to get them into a production state. So we kind of see ourselves as
Starting point is 01:01:20 the enablement piece now, so that we can achieve some of the goals that Pete and Evgeni set out when they started this movement. Thank you, and I am too excited and passionate about TinyML to complete this in one minute, but I'll do my best. So the way I see what is going to be happening in the future, there will be two things. One is simplification, and the other is growth and impact. So I think there will be several directions of this.
Starting point is 01:01:51 On the tool side of things, there will be, again, simplification and standardization happening, making it easier for people to use these types of techniques without kind of knowing the deep science behind them. At the ecosystem level, we are going to see a lot of growth and consolidation. And I think we already see this. I mean, in the past months, we saw several M&As and several VC investments in this space. So there will be a lot of things happening there. And on the growth side, we also see a huge momentum on the education side, academic education, going actually all the way to high school
Starting point is 01:02:25 education. So we see a lot of talent coming to the workforce to make things happen, and that's actually a big part of the ecosystem there. On the technology side, we'll continue seeing more hardware technologies coming, more algorithms, more software tools. But I think what will be important to see is co-designs. We already see a big trend there with NAS tool, neural architecture search tools. Basically, those tools design your hardware and software in one fashion, in a co-design fashion. But what is more important, I think we are going to see the impact of these technologies,
Starting point is 01:03:02 both the business impact and also the social impact. And that's one of the areas I'm super passionate about, TinyML for good, because I'm a technologist at heart, but I'm a human being. My DNA is human, right? We want to make sure that these technologies do good for people. And at the foundation level, we're starting several initiatives in TinyML for good. We are going to continue them at the European event, the Europe, Middle East and Africa event, coming in 10 days,
Starting point is 01:03:28 like what Blair mentioned. So I think we are going to see a lot of impact happening in this field. So the future is really bright, but as Blair said, the future is really now. So I think at the foundation level, just to wrap it up, I think we started two years ago and the mission was the awareness, building awareness. So this mission has been accomplished. So now we have this huge network of experts, enthusiasts, beginners, like thousands of people there. So what we're trying to do at the foundation level now is to connect people together.
Starting point is 01:04:00 So software people, hardware people. And we are moving more from event-based things to project-based ones. We are doing, for example, the TinyML Vision Challenge. We are asking people to use technology to solve problems. So it's really kind of moving
Starting point is 01:04:14 from the awareness phase to doing things together, making a difference type of thing. And events are still going to continue. We have the summit. We have the European Middle East, African event, the Asian event, basically having all this kind of local events.
Starting point is 01:04:27 But really the key is to connect this whole community and start doing real things. In addition to what companies like Google, Qualcomm, and others are doing in-house, it's really about promoting these collaborations and partnerships, and solving big problems, or problems around you, using this technology. Because the technology is there, it's good enough.
Starting point is 01:04:46 And I think the next step for us is really to create this huge momentum in solving problems around us. And that will create a lot of opportunities, both on the business side, but also on the social impact side. And that's what keeps us actually excited about this. Good. Thank you. Thank you all for contributing to a very interesting discussion for me because as an outsider in the field I learned quite a few things from this and
Starting point is 01:05:15 thank you for wrapping up and summarizing your goals which I think are quite broad and I think they make sense in the way you describe them. I hope you enjoyed the podcast. If you like my work, you can follow Link Data Orchestration on Twitter, LinkedIn, and Facebook.
