Orchestrate all the Things - Machine learning at the edge: TinyML is getting big. Featuring Qualcomm Senior Director Evgeni Gousev, Neuton CTO Blair Newman and Google Staff Research Engineer Pete Warden
Episode Date: June 7, 2021. Being able to deploy machine learning applications at the edge is the key to unlocking a multi-billion dollar market. TinyML is the art and science of producing machine learning models frugal enough to work at the edge, and it's seeing rapid growth. Edge computing is booming. Although the definition of what constitutes edge computing is a bit fuzzy, the idea is simple. It's about taking compute out of the data center, and bringing it as close to where the action is as possible. Whether it's stand-alone IoT sensors, devices of all kinds, drones, or autonomous vehicles, there's one thing in common. Increasingly, data generated on the edge are used to feed applications powered by machine learning models. There's just one problem: machine learning models were never designed to be deployed on the edge. Not until now, at least. Enter TinyML. Tiny machine learning (TinyML) is broadly defined as a fast-growing field of machine learning technologies and applications, including hardware, algorithms and software capable of performing on-device sensor data analytics at extremely low power, typically in the mW range and below, and hence enabling a variety of always-on use cases and targeting battery-operated devices. Article published on ZDNet
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Being able to deploy machine learning applications at the edge
is the key to unlocking a multi-billion dollar market.
TinyML is the art and science of producing machine learning models
frugal enough to work at the edge and it's seeing rapid growth.
Edge computing is booming.
Although the definition of what constitutes edge computing is a bit fuzzy,
the idea is simple.
It's about taking compute out of the data center
and bringing it as close to where the action is as possible.
Whether it's stand-alone IoT sensors, devices of all kinds,
drones, or autonomous vehicles, there's one thing in common.
Increasingly, data generated on the edge are used to feed
applications powered by machine learning models.
There's just one problem: machine learning models were never
designed to be deployed on the edge. Not until now, at least.
Enter TinyML. Tiny machine learning is broadly defined as
a fast-growing field of machine learning technologies and applications,
including hardware, algorithms and software,
capable of performing on-device sensor data analytics at extremely low power,
typically in the mW range and below,
and hence enabling a variety of always-on use cases
and targeting battery-operated devices. I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
My name is Ina.
I'm with Neuton.
And let me introduce you to each other.
So I'll start with my colleague, Blair Newman.
He's the CTO of Neuton.
We also have Evgeni Gousev, Senior Director of Engineering with Qualcomm Research.
We have Pete Warden, Technical Lead of TensorFlow Lite Micro at Google.
And George Anadiotis from ZDNet will moderate our discussion. So,
I'll pass the word to George and I hope that everyone will enjoy our session.
Well, thank you very much, Ina. Well, first of all, good to see you, good to connect actually in person as it were.
And thanks also for doing part of my job actually,
which I usually am the one who does the introduction.
So thanks for that.
And well, thank you everyone for making the time to connect today.
And well, let me start by saying that the topic we're here to address is TinyML, which I'd very briefly describe as miniature ML algorithms to do inference, basically, on the edge.
I was very familiar with the idea.
I was not familiar with the actual term of tiny ML.
So, as a kind of introduction, let's say, I would also mention that while doing a little bit of background
research on TinyML, the organization and the events, which you know much, much better
than I do, and hopefully you can introduce us to that, I also noticed what seems to me
like an obvious point in TinyML: that basically it refers to inference. Training doesn't seem to
be referenced at all. So I wonder if this is obvious to you as well, and this is why
you omitted it, or there is some other reason that you don't mention it. So anyone who
would like to start? Feel free. I guess this question has come up a fair amount. For me,
people ask about, hey, what about doing training on the edge in general, you know, not just for
tiny ML, but also running machine learning on phones or, you know, other less tiny devices.
And one of the big challenges is that we still generally need labeled data to do a
lot of training and you don't get much labeled data.
You don't get many labels coming out of
the typical sorts of sensors that you have at
the edge.
So there is some work around doing training on the edge, for example, with federated learning.
And Google actually uses that for the keyboard to learn new words as they kind of emerge in the language without, you know,
having to send all the data to the data center.
But most use cases don't have anything like that level of labeling.
So inference is definitely the most common use case for this.
But yeah, it's a good thing to call out.
And just to add to this, I think another reason is compute and memory resources because to be able to
do a training, you still need to have quite a bit of memory to store your data.
As Pete said, the data has to be labeled, so you really need to have quite a bit of
memory.
There are some approaches that people are exploring.
I mean, it's still in the research labs, like federated learning and aggregated learning.
So basically, let's say you have about a thousand devices collecting data, and maybe five percent of them have enough compute and memory power to be
able to do training. And then you can share these models among all of those other tiny nodes as
well. But I think it's probably also a matter of time. I mean, we are still at the very beginning
of TinyML. Obviously, inference
is the first step to start.
And then as these devices are getting smarter,
as algorithms are getting smarter,
new approaches are coming,
new memory technologies
are coming. So it's probably
going to be within, I would say,
five years or so, we are going to see some
examples of training
at the edge, not just inference.
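Since federated learning comes up here as the main path to on-device training, a toy sketch may help. This is a minimal illustration of the federated averaging idea in Python, with synthetic data and a plain logistic-regression model, all hypothetical rather than anything the speakers describe: each device trains locally, and only the weights travel back to be averaged.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One device's local step: logistic-regression gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (preds - y) / len(y)
    return w

def federated_round(global_w, clients):
    """Server side of FedAvg: average locally trained weights, size-weighted."""
    updates = [local_update(global_w, X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return np.average(updates, axis=0, weights=sizes)

# Synthetic "devices", each holding a small private labeled dataset.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(30, 2))
    clients.append((X, (X @ true_w > 0).astype(float)))

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, clients)
print("learned weights:", w)  # weights left the devices; raw data never did
```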
Okay, yeah, thanks. And thanks for actually giving me more than what I asked for, because
my original starting point was, well, it's probably not even possible to do training on the edge.
So I kind of assumed like, okay, we're only talking about inference.
Training is really out of the question, at least at this point. But thanks for providing a timeline
because, well, also depending on your definition of the edge, actually, you could argue that,
well, you obviously can't do training on tiny devices, but maybe on,
you know, small data centers close to the edge, which you may call edge
or not. So, yeah. And actually, talking about definitions, that's a very interesting
and very important question. I think when we started TinyML and the foundation,
Peter and I and the whole committee, we spent quite a bit of time defining what
actually Tiny is, because it is really dependent on many factors like
what is the use case, what kind of device you are using, what kind of battery you are
using.
So at the end of the day, what matters is really how these devices can be deployed in
the field and how they're going to perform.
And the bar for tiny is a little bit fuzzy in a way that it can be 1 milliwatt, it can be 10 milliwatt, it can be 100 milliwatt.
And I think what we decided to do is just to push the limit of the technology, really define tiny as like in the milliwatt or below type of range.
And that basically gives enough battery lifetime for devices to operate in
real life. Because you can make some demos and show you can do like object detection
at 100 milliwatt, but to be able to have it like in a battery operated device that is
going to last, let's say, six months or a year, you really need to be in the milliwatt
or milliamps range. That's kind of the definition of all of this.
And that gives you the whole continuum of machine learning.
So you start from tiny and then there is an edge and there is an endpoint,
an access point, and then you go to the cloud.
That's kind of how people look at this thing now.
And the boundary is not very clear
between all of those unless you go to the cloud.
Okay, well, thanks.
You already touched upon, well,
two of the topics that I would like to expand on.
So one, definitions,
and two, kind of the organizational background
and history of the organization.
But actually, before we delve into those,
I think it may be a good idea
if we do like a kind of first round with everyone here.
We already had Ina give us some
introductions, but I was wondering if everybody would like to say a few words
basically about their motivation for being active
in this space and what are the use cases they see
for their respective organizations.
So if you'd like to take turns addressing that.
I guess maybe I'll start.
So I think maybe George, we had an opportunity to speak before, maybe it was another topic
or event, but it's a pleasure to meet you.
Thank you for inviting me, as well as Pete and Evgeni.
When it comes to myself, when it comes to TinyML,
one of the things that I've always kind of lived by as it relates to not only, let's say, working with TinyML, but in general,
is I really kind of operated under a premise that in order to be successful,
you need to be able to bring your services to the fingertips of your customers.
And if you're capable of bringing services to the fingertips of your customers,
then you have a real chance of being successful.
And when you begin to think about TinyML, at least for myself, it's really one fundamental
thing that we're looking to accomplish, right?
We're looking to bridge the physical world, let's say, with the digital world, and in
essence, bringing that technology extremely close to the customer.
And that's something, at least for myself personally, that I'm extremely fascinated
about. From an organizational perspective, we've really approached TinyML in a couple of different
ways. I know that I'm sure you had an opportunity to kind of take a look at what are some of the
objectives around TinyML. And one of the premier objectives or primary objectives is really to proliferate
TinyML throughout the industry. We're looking to be able to accomplish or have billions of
intelligent devices that are out there, right? And in order to accomplish this, this means that
we need to expand the footprint of machine learning just beyond the data scientists.
And if we don't accomplish this, if we're not able to proliferate, you know,
this intelligence across all of the various ecosystems that we're targeting,
then we'll have very limited success. So one area we're targeting is we're looking to say, you know,
how can we make machine learning available to everyone in a real practical sense?
So that's one of the areas that we're focused on.
And then the second area, which is maybe from an industry perspective, something that we've seen before is rather cyclical, where we begin to see that, you know, hardware begins to outpace and accelerate beyond software.
So we're seeing that hardware is becoming more and more optimized and to some degree commoditized.
And we're seeing that, let's say, from a machine learning perspective, it's kind of really trailing.
And what this means is there's a lot of different techniques that we have to take in order to be able to enable some of these smaller devices to really take advantage of the intelligence that is out there. And compared to some of
the approaches currently being taken today to enable those devices, we began
to take a little bit of a different approach, where we're building our models fit for purpose.
So instead of, let's say, taking the approach of putting a square peg into a round hole
in order to, let's say, ensure that a model can fit into a hardware device,
we're taking the approach where we're building all of our models specific to a given use case
or specific to a given device or piece of hardware,
again, enabling every device to be able to take advantage of machine learning. So, just to kind
of give you just a little bit of a high level for me personally, I really enjoy bringing that
intelligence to the customer's fingertips. And then our mission as an organization really kind
of overlays this where we're enabling really everyone to be able to take advantage of TinyML.
Okay. Thank you.
Anyone else want to go?
Just to follow on what Blair said, I think he mentioned several very interesting points.
One is the hardware component.
And because I'm from Qualcomm, I'm representing the hardware part of the equation.
I think the hardware is as important as the software.
I think that's the key differentiation and the key value proposition of TinyML as an
organization.
We offer an end-to-end solution to our customers. It starts from hardware and goes to algorithms, software,
and use cases and the whole scale-up deployment.
So if you go back to the hardware question,
since I'm, again, representing the hardware company,
and that's kind of what my team and I do for a living,
you can think about this.
If you compare what you can accomplish in
silicon today compared to, let's say, 20 years ago, one or two millimeters of silicon now
is equivalent to a Pentium computer 20 years ago. You can think about this. A big desktop
became one millimeter of silicon. It's an enormous amount of compute, and that's
all due to Moore's Law
and other types of scaling.
So now you can think, like, you have
so much compute in this very
little and low-power device.
What can you do with this? And that
becomes really interesting. The
algorithmic part, the software part,
you put all the things together,
and it becomes a really cool thing.
And then I think Blair touched upon another point,
is the customer angle.
When you start sharing this type of technology to customers,
you see really their eyes open wide like,
wow, you can do so much cool stuff
with this little silicon, low cost, low power.
And that's very inspirational.
That basically closes the whole feedback loop.
It becomes so positive in a way: you create cool technology, but then you see this technology
having impact on customers, and on customers' customers.
So that basically makes it so innovative
on the technology side, but also so impactful on the customer
end.
So that's kind of what keeps us going there.
It's a very exciting field.
Cool.
Pete, you want to weigh in as well?
Yeah.
I mean, for me, it all goes back to a moment in 2014 when I first joined Google.
And we had a startup that was acquired.
And I was quite proud that we were able to fit models in like two megabytes.
And I was feeling pretty good about that.
And then I talked to the team behind OK... Google, and I'm just pausing there so that nobody's devices
go off. We actually end up calling it OKG when we're working with it in meetings just
to avoid that. And they had a 13 kilobyte model that they were using to recognize that wake word, running on the little always-on DSPs
that exist on Android phones
so that the main CPU wasn't burning battery
listening out for that wake word.
And that really blew my mind.
The fact that you could do something
actually really useful in that smaller model. And it really got me thinking about all of the
other applications that might be possible if we can run, especially all these new machine learning, deep learning approaches,
convolutional networks and things in a footprint that small on these tiny, cheap, low power devices.
And really, since then, it's been following that thread
and talking to product teams,
getting inspiration for people who want to do really interesting stuff and trying to figure out how we can actually make that happen.
Okay, well, thanks because you actually all guys make it quite easy for me
because you just mentioned something that I intended to ask you about.
So you mentioned model size and I want to tie that in to what was mentioned
before about power basically. I mean, I do realize that whatever kind of criterion you
use is going to be like constantly moving goalposts because of the hardware changes
and the power requirements change and the model keeps
changing all the time. But my question is, how would you define, and I know it's a fuzzy one,
but how would you define what actually fits into the tiny ML definition? Is it below a certain
power threshold? Is it below a certain model size? Or do you have a kind of way of figuring out what falls into this
category?
Yeah, I can speak to the Qualcomm way of doing things. When we started this TinyML project
at Qualcomm in 2014, about the same time that Pete mentioned, we looked at vision, because
for us vision was one of the most
challenging use cases because for vision you typically need to process a lot of data, images
of big size, and then the big models to do detection and so on.
And the camera itself consumes a lot of power. Typically in those days,
to be able to do like a face detection,
for example,
it required like maybe half a watt of power end-to-end:
the sensor and the processor,
CPU and algorithms and everything.
And we started to think like,
why was that?
Why is it so high?
I mean, you had always-on touch technologies back then, you had always-on
audio technologies, you had always-on inertia sensors on your phone already. Why was it
so challenging? And we looked at the whole thing holistically and kind of started to
do it from the algorithms and software and everything, and we got to these low power numbers.
But back to the question, back in those days, we debated quite a bit, a lot internally,
like where the bar should be for this tiny, for the low power.
And I think from the Qualcomm perspective, we adopted this number one milliamp.
And the reason for this was quite simple, because when we talked to smartphone users back then, they allocated about like
one milliamp to all sensors for them to qualify as an always-on type of operation.
It can include all sensors on the phone. So it's an inertia sensor, light sensor, audio type of
sensor, touch sensors, all of them combined should not consume more than like one milliamp of power.
And we basically put the bar there: let it be one milliwatt. But again, this bar is somewhat
artificial. It really depends on the use case because for some devices, your duty cycle,
you have to use it 100% of the time. It's always on. For other devices, let's say you have a tiny
camera in a retail store.
You need to take images maybe only once an hour to process what is the state of the shelf. So in this case, the duty cycle is like 0.1%.
So the battery lifetime is going to be different.
But basically, again, to answer the question,
we thought that milliamp, milliwatt type of range gives you this all-in-one functionality.
Because at the end of the day, it's the customer experience, consumer experience. You don't want to replace your batteries every
week. You want to make sure that the final product can last for some time, like a year at least,
or six months at least. So that's kind of where this number came from. To be able to do intelligence
in a device that can operate on a battery, like a small coin cell battery, for quite some time.
And if you do a kind of back-of-the-envelope estimation or calculation, that gives you this milliwatt type of range of power.
And then you constrain your system this way, so it can still be powered.
But I should highlight again:
this is very use case dependent. There is no one-size-fits-all.
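To make that back-of-the-envelope concrete, here is a minimal sketch. The battery capacity, power draws and duty cycles below are assumed, illustrative numbers only, not figures from the conversation:

```python
def battery_life_days(capacity_mah, active_mw, sleep_mw, duty_cycle, voltage=3.0):
    """Rough battery life for a duty-cycled, always-on device."""
    avg_mw = duty_cycle * active_mw + (1.0 - duty_cycle) * sleep_mw
    avg_ma = avg_mw / voltage          # average current draw in milliamps
    return capacity_mah / avg_ma / 24  # hours of runtime converted to days

# A hypothetical 1000 mAh battery:
# always-on inference at 1 mW lasts on the order of months...
print(battery_life_days(1000, active_mw=1, sleep_mw=0.01, duty_cycle=1.0))
# ...while a 100 mW camera that wakes up only 0.1% of the time lasts years.
print(battery_life_days(1000, active_mw=100, sleep_mw=0.01, duty_cycle=0.001))
```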
Okay, thank you. You already
referred a few times to when you initiated
this TinyML effort. So I was wondering, and I know there is an institute that you have founded
and you also do events. So I wonder if you'd like to share a few words on the background. So who
started it, who has joined up to now, and the activities you are undertaking. Sure. So Pete, do you want to rewind the tape?
Yeah, because one of the first people I met when I started getting interested in,
hey, what else can we do with TinyML, with these tiny models, other than doing these wake words, was Evgeni, with the Glance project
from Qualcomm, which still is able to run a really low-power image sensor plus image processing and computer vision algorithms in kind of an all-in-one
package at like one milliwatt. And it really impressed me, Evgeni's vision, you know, it was really inspiring to see this. And so we kept in touch and decided to put on a small
conference together. And we rapidly discovered that more people than we could accommodate in the large Google meeting room
that we'd organized for the first conference wanted to come.
And so we had... when was that first TinyML conference, Evgeni?
Yeah, I think you and I met in July of 2018
on the Google campus.
And then we said,
we need to do something about this TinyML thing.
It's going to be huge.
So we need to start building a small ecosystem around this.
And then we called a couple of colleagues of ours,
friends, and shared, socialized this idea with them. They said, yeah a couple of colleagues of ours, friends, and shared,
socialized this idea with them. They said, yeah, it's a great idea, but we are skeptical that we
are going to get like 30 people in the room if you have this type of conference. And they kind
of peed on it. I said, okay, let's give it a try. And then in like three or four months,
we were over capacity and people were trying to get on a kind of wait list and other things, and we said, wow, this thing is going to be big. And thanks
to Google and Pete's friends, we were able to host this event on the Google
campus. That was in February of '19, I believe, the first one. We had about
200 people, and that's how the whole thing started, basically.
And from the very beginning, Pete and I, we kind of put a little bit of groundwork,
a foundation for this. And what we decided fundamentally, that what we are going to be
doing is going to be a nonprofit organization, because this technology is really for the benefit of all.
And also fundamentally, it's end to end.
It's open to everyone.
It should be diverse because we are really
very strong believers that any ecosystem is as strong as diversity is.
You need to have like every single player in this ecosystem.
There is no insignificant member; all of them are
significant. So that was the fundamental principle of this foundation: to be really
open, non-profit, and a global community of people who do things together and drive this
field together. So we are kind of two years old, but with huge momentum. We started with the
summit. We've done like three summits since then. The last one we had in March of this year,
it was a big success. We had over like 5,000 people register. So you can see the progression
from two people to a group of people to about 100 people, 150 people at the first event,
and then 500 people at the second event, and now we have 5,000.
And we see a lot of momentum there.
And I think there are some fundamental reasons for this.
It's not just Pete and me and other members of the community pushing it.
There is like a huge pull force.
The pull force comes fundamentally from several angles, from several directions. One, this technology is really affordable in a financial way. To be able to develop your own
tiny ML application, you can get a board. Let's say, for example, the recent one from Raspberry
Pi. The whole board with silicon and everything
costs $4.
You don't really need to spend thousands of dollars on the cloud and other types of things
or GPUs.
So that's one reason.
So it's affordable.
Second, it's really the low power and the battery-operated stuff.
And the third one is the variety of use cases.
Every person we talk to, they say,
I can use this to solve this problem,
or I can use it for that problem.
One recent example: we talked to people, friends of ours, in Kenya.
So they developed this technology to monitor beehives in Kenya
because apparently, I didn't know this,
that the honey business is a big part of their agriculture,
and by monitoring in real time and
making it accessible, they can save 20% of their revenues.
It's a big number. But my point there is that you can
give this technology to people and they can start to develop applications to solve their problems.
That's kind of what resonates really well.
So it's the affordability, it's the low-power component of this, and the variety of these
use cases for all.
So we basically created this platform for people to collaborate and to work together,
and we see it's really coming to huge fruition now.
And George, if I may, since I'm kind of, I guess you might say, the new kid on the block, considering,
you know, Pete and Evgeni have
obviously done an exceptional job in not only, let's say, birthing the movement of TinyML,
but also being the tip of the spear and creating a platform for everyone to, you know, bring their thoughts, bring their ideas, bring their innovations to this particular community.
And certainly without them leading the way, at least from a Neuton perspective, we wouldn't really have the opportunity to be
here today. So one of the things that I do kind of want to add as it relates to the community,
and I'm going to touch a little bit on your prior question as well, is that, you know,
Pete and Evgeni have done an exceptional job, I think, as he mentioned, maybe initially starting
out, we're kind of thinking, you know, hey, can we get,
you know, five or 10 people in the room? And now, you know, there's 5,000 people that are,
you know, actively engaged in the community. And at least for ourselves, you know,
they've created the opportunity to really allow for us to begin to kind of really put on display some of the things that we've been able to build upon, especially as it relates to TinyML. Now, one of your previous
questions, or your prior question, you mentioned, you know, where do you kind of begin to delineate
or really define TinyML, right? And historically, it's really kind of started, I guess you might say,
you know, from a hardware perspective, and then everyone has kind of taken a step of,
can you now begin to build a model that can, you know, integrate into the devices that are
out there today that is enabling TinyML? So, Pete and Evgeni have kind of opened the door to this.
And one of the things that we kind of like to think, at least for ourselves, is as we move
forward is as they've opened the door, you know, we've kind of taken the approach as to, you know,
how can we now kick down the door and kind of ask people to begin to kind of think about things
in a different way. So whether it's starting with your hardware and then building a model to integrate with your hardware,
we've kind of taken a completely different approach where we kind of flip things on its head and said,
well, now we're going to begin to look at how can you build a model that is fit for purpose for your hardware?
So instead of starting with your hardware and then building the model to integrate with your hardware, how can you now build models
that are fit for purpose for your particular use case? Because I think Evgeni kind of mentioned
there's a variety of different use cases that span across multiple different verticals.
And in order to help TinyML continue to grow, our approach is we're building models
that are fit for purpose for every single use case. Instead of the other way around,
where you're kind of having a hardware, then you're needing to take particular approaches
in order to get your model to integrate with that hardware, we're taking a completely different
approach where we're saying, hey, let's build your model fit for purpose for your individual use case and for your hardware
along the way. So we're certainly appreciative that both Pete and Evgeni have, you know, started
this movement. And, you know, obviously, we're becoming more and more active in the
community. And we're hoping to continue to share our innovations as the technology and the community grow. Thank you, and yes, that was actually something that was
going to be my next question. So, fine, now that we have the history and definitions
part figured out, let's say, what about the practicalities? And also, all of you touched
on some of the opportunities that bringing TinyML to fruition can create. But what about
the challenges? And actually, what about some practical scenarios, let's say? So, you have
a machine learning model, whether it's, I don't know, vision or whatever else.
What are the challenges that you're facing if you want to scale it down
to be able to deploy it on low-powered devices on the edge?
And is there a clear path that people can take to make that happen?
Yeah, I think Pete mentioned one before, which is both a challenge and an opportunity.
As for all machine learning type of problems, it's data. You really need to have data. That's
kind of a universal problem. And that also creates some interesting questions on the
business side, like who owns the data, who is going to monetize the data. But really, you need data to develop your models.
And that's kind of one of the challenges,
which is universal for machine learning and TinyML as well.
And I see this more like an opportunity,
and there are some companies who build their business models now
around data and collecting data and doing data crowdsourcing
and develop models and so on.
The other one, I would say, is more on the timescale, which is the adoption because we
are still at the very beginning. Many people, especially end users, are not aware of the
capabilities that TinyML can offer. That's why in the first year and the second year,
when we started
the foundation, the mission was really to build awareness, not just in the technical
community, but move it up the stack and show people that these types of applications are
possible. Because for us, engineers and technologists, we all have the tendency to over-engineer
things. But when you talk to end users, you really need to, like what
George mentioned, Blair mentioned too, you really need to understand what is good enough. Once you
understand this, you can start to develop products and that goes to the practicality. You really need
to connect the technology capabilities to what people need, what is good enough, and then boom,
it goes. But to me, I would say it's the data and
how you use it. You turn on your imagination and develop something to find out. And that's
something I see more as an opportunity, not as a challenge. So Pete, from your angle, what are the big ones
you see on your radar screen? Yeah, I mean, for me, it all comes back to finding the big use cases. And, you know,
it still feels like in a lot of ways we're in that sort of space in the late 1970s with microcomputers,
where a whole bunch of nerds are really, really excited
about the possibilities of these devices,
but it's still unclear what the actual killer applications are going to be for this technology that we can see is coming.
It's really obvious that it totally makes sense from all of these technical trends coming together.
And we can see that it can be useful in all of these different ways.
But it's that long process of product and customer discovery that has to happen before
we have a large number of cast iron customer use cases.
That's the part where I feel like we're in this really interesting stage
of those emerging. Yeah, go ahead, I'm sorry. And just to kind of add on to what they
both mentioned, I think in order for TinyML to really take off, we ultimately kind
of have to complete that life cycle. And for me, what I mean by life cycle is, and I think they both touched on it, really
the implementation of use cases, or enabling those use cases, so that we can begin to drive
adoption.
And when I begin to think about it, you know, we need to get those business users.
We need to get, you know, those individuals that, in their mind, have those use cases.
How can we get them into the game and enable them to test out those hypotheses, implement those use cases, which will further drive adoption? And when I begin to think about that, from a challenges perspective, I think there are still some of the fundamental challenges that are out there,
whether you're talking about AI or ML or tiny ML, is how do you enable those organizations or how
do you enable those individuals who have those use cases? In their mind, they're saying, hey,
TinyML can really bring value, but do I have that data scientist on board?
And then even if I do have that data scientist on board from an operational or tactical perspective, can I actually produce a solution that I can then actually test and validate?
And then if we really wanted to take off, how do we then make sure that once something is in production,
that the organizations can then drive value out of it? And that's one of the things that we've
kind of continued to take a look at is how can we enable, let's say, the entire lifecycle?
How can we enable organizations that may have those use cases, but may not have that data scientist, right? Then let's say we do enable those organizations, and how we're doing that is from an AutoML perspective.
Now, how can we enable them to implement those use cases?
And in this particular case, let's say producing models that can actually integrate into those microcontrollers so that they can test out their use cases.
And then lastly, we want to actually then get those use cases into production because then once they're into
production, then other organizations, because this is obviously a me too world, once they begin to
see that other organizations are implementing TinyML from a production perspective, then adoption
will begin to accelerate from there. And that's
another area that we're also focused on, where we say, hey, you know, once something is in
production from a TinyML perspective, you know, how can you validate that what you're actually
predicting is actually accurate? How can you validate that you can actually keep that solution
into or in a production state?
So I think for me, when we begin to think about what are some of the barriers and challenges is that we really need to begin to look at how can we enable the entire lifecycle of TinyML?
Starting first with enabling organizations and then once we're able to get them into production, ensuring that they actually can have that transparency and realize
the value of the solution.
I want to thank you all.
If I may, because I think Pete brought up a very important question
of the killer app, but just to calibrate, TinyML is not
science fiction anymore. It's real
and it's in production. TinyML is being shipped in tens of millions of devices now in all
kinds of verticals, in audio, vision-based technologies, in industrial IoT, predictive
maintenance, many, many examples. But I think what Pete pointed out is that the killer app
is not there yet. I mean, the killer app is the one that is going to be everywhere,
just like a smartphone.
And again, just to calibrate, if you look at the smartphone,
technology-wise, what is a smartphone?
A smartphone is a processor and a touchscreen display, right?
So processors were known for quite a while.
Touchscreens were known since the mid-'80s.
You could find them in your laptop, and then the displays were there.
It was just the brilliance of
people that brought all the three technologies together and boom, it became a big thing.
Same thing with Uber. What is Uber? It's a smartphone and GPS, from the technology perspective.
It's super easy, but then it took like 10 years, from the smartphone to Uber in the mid-2010s, for people to make this happen.
And it became a killer app.
So I'm quite optimistic and positive this is going to happen in just a matter of time
because TinyML is fundamentally a game-changing technology.
In addition to power and cost, I think we didn't mention one more very huge
differentiation of TinyML. It's privacy. TinyML technology allows you to do analytics on a device
without transmitting data to the cloud, without sending any kind of raw data. All you get is
metadata. You put in audio type of sensors like what Pete mentioned, keyword detection.
You're not recording your room like what Alexa and other type of devices do.
All you do, you're just always on in the room.
You just listen.
If something happens, you send a trigger like, hey, somebody's in the room.
Or there is a glass breakage.
Or somebody is using a saw to cut trees in the forest.
The same also with the vision.
People are scared of cameras for a reason.
And TinyML vision allows you to do analytics without sending images. And this is a huge value proposition. Again, I'm very positive and very optimistic that the killer use case is going
to come. But what we see today, we see a lot of smaller business cases and products
bubbling up. And I think it's just again a matter of time when we have a critical mass and then
we'll see this explosion to a trillion devices, probably within a decade or so.
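The privacy pattern described above, where raw data stays on the device and only a small metadata event is transmitted, can be sketched in a few lines. The sensor and model stubs are hypothetical stand-ins, not any real product's API:

```python
import random

class MicStub:
    """Stand-in for a real microphone driver; raw samples never leave the device."""
    def read(self):
        return [random.random() for _ in range(16000)]

class ModelStub:
    """Stand-in for an on-device classifier."""
    LABELS = ["background", "glass_break", "chainsaw"]
    def classify(self, window):
        return random.choice(self.LABELS), random.random()

def monitor(sensor, model, send_event, threshold=0.9, steps=1000):
    """Always-on loop: only a tiny metadata event is ever transmitted."""
    for _ in range(steps):
        window = sensor.read()                 # raw audio stays local
        label, conf = model.classify(window)
        if label != "background" and conf > threshold:
            send_event({"event": label, "confidence": round(conf, 2)})

monitor(MicStub(), ModelStub(), send_event=print)
```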
Thank you. I was just going to say earlier that I think you all approached it very correctly, actually, from a broad perspective, referring to things such as universal data science and machine learning challenges, such as finding the right data or finding the right expertise in your organization, and finding use cases, which you just referred to. Those are all totally valid points.
And I have to add that as an engineer myself,
I really appreciate the fact that you went out of your engineer shoes
and approached it in a different way.
However, now I would like to ask you to actually geek out a little bit
because what I had in mind was a kind of more specific scenario.
So suppose you have all of those challenges
somehow figured out in an organization, right?
So you have your expertise, you have your data scientists,
you have your data, and you have actually trained a model.
It has converged
and it works perfectly for your criteria. I was wondering what would the specific challenge
be and if there is like a clear pathway to take that model, which is potentially a very
large one, and deploy it on a device on the edge, which may have trouble accommodating
that in terms of size, in terms of power consumption.
So is there some way to cut that model down and make it deployable on a device on the edge?
Is there a clear path in doing that? Is there some techniques you're
investigating to make that happen? Pete is a world expert here. Actually, just to give a little
introduction here, Pete is the
first guy who inspired the whole global
community into these
techniques. I remember it was
2018, I think, or 2019
when Pete showed that, and he'll probably talk about this too,
you can quantize your model, and you can run a 32-bit model on 8 bits
without compromising the accuracy much.
And it was like an eye-opening experience for the whole community.
Like, hey, you can train your model on the regular floating point 32
and then you do integer 8-bit and you're not losing much. And it was like,
wow, is it possible? It was again a big kind of mindset shift, and I think since then a lot of
people invested into this. I'm just kind of setting the stage for Pete, but I'm saying
his contribution there was really very instrumental, because it really kind of
flips switches in your brain.
Because people tend to have a tendency to think in a certain way, and then someone says, no,
I'm going to challenge this, let's look at this differently. And since then people have developed
even binary models. But again, I just don't want to take over. Yeah, go ahead, Pete. Yeah, yeah. You know, the funny thing is as well,
a lot of that came from what I'd seen inside of Google, from people like Raziel Alvarez,
who was one of the first people I saw
doing full 8-bit calculations,
again, for the sort of wake word applications.
And so it's really helped me look good when I've been able to sort of help share
all of this work that's been happening inside of Google by these engineering teams
and help popularize some of these techniques that I've sort of seen working internally,
like the 8-bit work, which they were doing back in sort of, you know,
2014 for the wake word, but it wasn't something that was generally,
you know, widely shared.
So, Evgeni's very kind, but a lot of my contribution has been more sort of, you know, helping publicize and document a bunch of these engineering practices that have emerged as people dive in and start creating real products.
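For the concrete version of the float-32-to-integer-8 workflow discussed above, here is a minimal sketch using TensorFlow's post-training quantization. The `model` and `calibration_samples` names are assumptions: any trained Keras model and a handful of representative inputs.

```python
import numpy as np
import tensorflow as tf

def to_int8_tflite(model, calibration_samples):
    """Convert a trained float32 Keras model to a fully int8 TFLite model."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    def representative_data_gen():
        # The converter calibrates activation ranges from these examples.
        for sample in calibration_samples:
            yield [np.expand_dims(sample, 0).astype(np.float32)]

    converter.representative_dataset = representative_data_gen
    # Force integer-only kernels so the model can run on int8-only hardware.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()  # a flatbuffer (bytes), roughly 4x smaller
```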
Go ahead.
I was going to say, now it kind of becomes obvious,
but back like three, four years ago, it wasn't.
But the techniques that people use today, like quantization
(do you really need to have 32 bits?),
became industry standards. Or compression, data compression,
or pruning: because of the way you design your
networks, you have a bunch of connections there in your networks, and quite a few of
them are redundant and useless, and people started to cut them one by one.
And you can do it in a smart way. So those techniques are all out there, and,
I mean, Pete mentioned this, what used to be a tens-of-megabytes
model, you can design
in a way that it is now in the tens of kilobytes.
And that's actually a game changer.
And on top of this, it kind of went beyond research.
Now you have tools you can use to do this type of things.
You can develop your model, like your big model using big data, and you can use these
tools to trim them down to those sizes.
It's again kind of a collective work of the whole ecosystem, innovations, tools, software people, application people. But
I would say now it's actually pretty mature. There is more development obviously to make them even
smaller and better, but again some of the groundwork that Pete and Google people did and then
a lot of academic work on binary networks, for example. People show now that
you can run your neural nets as binary networks, like only ones and zeros, right? And that's,
like, wow, that type of thing. So a lot of innovations coming from the industry, but
also from academia and from companies. I think, I know that Blair and Neuton are also doing
quite a bit of research there and developing these networks in a different way, to make them small by design.
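The pruning Evgeni describes, cutting redundant connections one by one, is also available off the shelf. A minimal sketch with the TensorFlow Model Optimization toolkit, assuming a trained Keras `model` and training data `x_train`, `y_train`; the sparsity target and step counts are illustrative:

```python
import tensorflow_model_optimization as tfmot

# Gradually zero out 80% of the weights over 2000 training steps.
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.8, begin_step=0, end_step=2000)
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=schedule)

pruned.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
pruned.fit(x_train, y_train, epochs=2,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Drop the pruning wrappers; the zeroed weights then compress very well.
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```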
Yeah, and I was going to say that, you know, Pete inspired us, right?
So, I mean, I think you guys have done a great job as to kind of, you know,
really describing the approach today.
I mean, the approach that kind of got us to where we are today.
And, of course, historically, we've taken that approach where you typically build a model and then from there you begin to apply some of these techniques that Evgeni has mentioned, whether it's quantization, whether it's pruning, etc., to get your model to the point where it can really align with your use case or the hardware device
that you're using. And typically, you're kind of taking that approach where, what steps can I take
so that I minimize the loss of accuracy as much as possible, right? And really, a lot of the
techniques that you guys have just discussed, and of course, Pete spearheaded, has really kind of inspired us to say, OK, now how can we take it to the next level?
What's the next generation approach that we've accomplished, where now, instead of maybe taking that top-down approach where you have a model and then you optimize it,
We've decided to take a bottom up approach where we build each model neuron by neuron.
And then once the model is built, you don't need to perform any compression techniques, whether it's quantization, pruning, or whatever the case may be.
So now we feel like we've kind of taken the baton from some of the techniques that Pete has
spearheaded, and now we're beginning to look forward to demonstrate a different approach
where you do take that bottoms-up approach, building everything fit for use, neuron by neuron,
so that you don't have to take those additional steps.
So as soon as the model is built, it's ready for production.
So that's, again, one of the things that we're really appreciative of, of this community,
to kind of give us the opportunity to really highlight the methods and approaches that we're taking. And we're seeing
oftentimes that now taking this approach more of a bottoms-up approach, that we're able to realize
that our models are sometimes, you know, 100 times smaller than what you would typically see by taking
that top-down approach. So we're pretty excited about that, and we're looking to continue to demonstrate and share some of the approaches that we're taking at the TinyML conference that's coming up in about a week and a half, where we'll have an opportunity
to share this new approach, a fairly disruptive approach, as it relates to building models
for TinyML implementations. Yeah, I know that. And actually, I cheated a little bit with
Ina's help, so I had a look at the draft for your keynote and you posed some interesting questions there.
And hopefully we've managed to at least scratch the surface
because I don't think it's really possible
to address those to their full extent
in the time that we have.
But, excuse me.
So I wanted to ask you something based on what you just said.
So would I be correct in saying that,
well, the kind of technique you're using
basically results in having
very, very small footprint models
regardless of where you're going to deploy them?
So do you use the same models for deployment
on the edge and in the data center and anywhere?
Yeah, that's certainly our expectation.
So we're leveraging this technique,
let's just say, for standard ML implementations
as well as TinyML implementations.
So really, we've kind of really redefined
how you build models moving forward, right?
So we don't start out as an example with a predefined structure, not at all.
We don't leverage some of the typical techniques that are out there,
whether it's back propagation, et cetera.
We truly take an approach where we are building each model per use case, neuron by neuron. And one of the
things that we do is we allow for our customers, once they define their requirements, let's say
even from a hardware perspective, to be able to build a model, stop the building of the model based upon their
particular requirements. And one of the things that we do is as we build the model, we perform
cross-validation each step as we build the model. So if a customer has a specific set of requirements
and then they stop building the model, it's already ready for production use. And that's kind of our new theme.
When you begin to talk about automation, that's why we say build fast.
When we begin to talk about leveraging our technique as it relates to building neuron by neuron from the bottom up,
we take the approach of build once instead of build a model and then compress it to your specific use
case. And then when we begin to talk about not compromising, this is where we begin to talk
about explainability, being able to understand what's driving the predictions behind your model,
being able to understand the quality that's behind your model, but then also being able to understand,
hey, maybe when your model is decaying, you may need to retrain. So we're
kind of positioning ourselves so as the adoption continues and those killer apps begin to enter the
market, that we're very well positioned for that. Okay, well, thank you.
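Neuton's actual neuron-by-neuron algorithm is proprietary and not detailed in the conversation, but the general idea Blair describes, growing a model and cross-validating at each step until the requirements are met, can be illustrated generically. A toy sketch with scikit-learn, illustrative only:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def grow_until_good_enough(X, y, target_score=0.95, max_neurons=64):
    """Toy bottom-up build: add hidden neurons one at a time and
    cross-validate at each step, stopping once the target is met."""
    for n in range(1, max_neurons + 1):
        candidate = MLPClassifier(hidden_layer_sizes=(n,),
                                  max_iter=2000, random_state=0)
        score = cross_val_score(candidate, X, y, cv=3).mean()
        if score >= target_score:
            break  # smallest model that satisfies the requirement
    return candidate.fit(X, y), n, score
```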
I think another way there, just to add, either your bottom-up or top-down, people are developing
some sort of AutoML tools and also some compilers. So if you, for example, develop a model
using TensorFlow or TensorFlow Lite, which is what Pete's team is doing,
companies and people who do sort of like the middle layer there,
they can translate your model into the microcontroller type of
size and dimension. And from the end-user perspective, you don't
really need to worry about this. And if you kind of fast forward what is going to happen in a year or two,
you're going to have several companies and several tools that are going to be commercially available
for people just to use the conventional models you develop in the TensorFlow or PyTorch, other type of
formats, and then translate them down to the microcontroller code.
So that is happening already.
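The last step of that pipeline, turning a converted model into something a microcontroller build can consume, often amounts to embedding the flatbuffer as a C array in the firmware, which is what the `xxd -i` tool produces. A minimal sketch; the file names and variable name are placeholders:

```python
def tflite_to_c_array(tflite_path, out_path, var_name="g_model_data"):
    """Write a .tflite flatbuffer out as a C byte array for firmware builds."""
    data = open(tflite_path, "rb").read()
    with open(out_path, "w") as f:
        f.write(f"const unsigned char {var_name}[] = {{\n")
        for i in range(0, len(data), 12):
            row = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
            f.write(f"  {row},\n")
        f.write("};\n")
        f.write(f"const unsigned int {var_name}_len = {len(data)};\n")

# Usage (hypothetical paths): tflite_to_c_array("model.tflite", "model_data.cc")
```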
Good. That was a very forward-looking position that you took, because my
question was going to be just that. So, fine. We only have about five minutes to wrap up, so let's use that time among you, and let's quickly give
what you see as the next steps in the evolution of TinyML. Well, both the concept and, if you also
want to refer to that, well, the organization and the events as well. So I guess, Pete, what is your piece of advice?
Everybody's looking to Pete.
Pete, what do you think?
I really come back to what's happening on the product side. We almost have an overabundance of technology and hardware and software and models and modeling
techniques. And we're starting to see, like Evgeny said, tens of millions of devices shipping with
these TinyML approaches. And, you know, if you count things like
wake words on phones, there are, you know, billions of devices out there from all sorts of,
you know, manufacturers. But I still feel like we're looking for, you know, coming back to that killer application, and that process of uncovering the really world-shaking use cases that I think we're all feeling in our bones are going
to emerge. We don't know what they're going to be yet. So that is the path I'm really interested in following,
is working with teams who have really interesting problems
that we might be able to solve that can have a really big impact.
And I think the TinyML events, the meetups,
as well as the main conference have been really good at connecting
people who have solutions and people who have problems.
And I'll go next, and I'll let Evgeni wrap up, because I think you have a good umbrella view of everything
that's going on. I'll just briefly say, at least for ourselves, we believe, you know, the future is now to some degree.
I think both Pete and Evgeni have kind of highlighted, I mean, over the last couple of years, TinyML has maybe to some degree been in this infancy stage, right? And typically technology, every couple of years,
takes this quantum leap, takes this jump
so that you can get to the next level.
And I think this next level is really the beginning
of implementing all of these various use cases
that are out there so that we can get to the killer app
and then subsequently everything can accelerate.
And when I say the future is now, I'm really referring to enablement, right?
So how can we enable organizations, individuals to participate in this community, right?
Because really, when you think about the tiny ML community,
it really truly stretches across a number of different disciplines. And in some cases,
we are seeing individuals come from one end of the discipline or the other end of the discipline,
but now we need to be able to try to bridge that gap. And that's one of the things that we're really focused on is how can we bridge that
gap so that we can really enable the community to really be knee-deep into this ecosystem so that
we can get more and more use cases and we can get to that killer app and we can really drive that
adoption into the millions and billions of devices
that are enabled from an intelligence perspective. And that is truly the area that we're focused on.
And so we really kind of see ourselves as now taking that baton and really accelerating TinyML
into the community, whether it's through automation,
excuse me, whether it's through enabling organizations to be able to implement machine
learning, as well as being able to get them into a production state. So we kind of see ourselves as
the enablement piece now so that we can achieve some of the goals
that Pete and Evgeni set out when they started this movement.
Thank you, and I am too excited and passionate about TinyML to complete this
in one minute, but I'll do my best.
So the way I see what is going to be
happening in the future, there will be two things.
One is simplification, and the other is growth and impact.
So I think there will be several directions of this.
On the tool side of things, it will be, again, simplification and standardization happening,
like making it easier for people to use these types of techniques without knowing the deep science there.
At the ecosystem level, we are going to see a lot of growth and consolidation.
And I think we already see this.
I mean, like in the past months, we saw several M&As, several VC investments into this space.
So there will be a lot of things happening there.
And on the growth side, we also see a huge momentum on education side, academic education,
going actually all the way to high school
education. So we see a lot of talent coming to the workforce to make things happen, and
that's actually a big part of the ecosystem there. On the technology side, we'll continue
seeing more hardware technologies coming, more algorithms, more software tools. But
I think what will be important to see is co-designs.
We already see a big trend there with NAS tools, neural architecture search tools.
Basically, those tools design your hardware and software in one fashion, in a co-design
fashion.
But what is more important, I think we are going to see the impact of these technologies,
both the business impact and also the social impact.
And that's one of the areas I'm super passionate about, TinyML for good, because I'm a technologist
at heart, but I'm a human being.
My DNA is human, right?
We want to make sure that these technologies do good for people.
And at the foundation level, we're starting several initiatives in TinyML for good.
We are going to continue them at the European event,
the Europe, Middle East and Africa event, coming in 10 days,
like what Blair mentioned.
So I think we are going to see a lot of impact happening in this field.
So the future is really bright, but as Blair said, the future is really now.
So I think at the foundation level, just to wrap it up,
I think we started two years ago and the mission was the awareness, building awareness.
So this mission has been accomplished.
So now we have this huge network of experts, enthusiasts, beginners, like thousands of people there.
So what we're trying to do at the foundation level now is to connect people together.
So software people, hardware people.
And we are moving more from event-based things
to project-based.
We are doing, for example,
TinyML Vision Challenge.
We are asking people to use technology
to solve problems.
So it's really kind of moving
from the awareness phase
to doing things together,
making a difference type of thing.
And events are still going to continue.
We have the summit.
We have the Europe, Middle East and
Africa event, the Asia event,
basically having all these kinds of local events.
But really the key is to connect all this community
and start doing real things.
In addition to what companies like Google, Qualcomm,
and others are doing in-house,
we really want to promote these collaborations, partnerships,
and solving big problems or problems around you
using this technology.
Because the technology is there, it's good enough.
And I think the next step for us is really to create this huge momentum
in solving problems around us.
And that will create a lot of opportunities,
both on the business side, but also on the social impact side.
And that's what keeps us actually excited about this.
Good. Thank you.
Thank you all for contributing to a very interesting discussion
for me, because as an outsider in the field, I learned quite a few things from this. And
thank you for wrapping up and summarizing your goals, which I think are quite broad, and
I think they make sense in the way you describe them.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.