Orchestrate All the Things - Useful Sensors launches AI in a Box, aiming to establish a different paradigm for edge computing and TinyML. Featuring Pete Warden, Useful Sensors CEO / Founder
Episode Date: September 21, 2023
Would you leave a Google Staff Research Engineer role just because you want your TV to automatically pause when you get up to get a cup of tea? Actually, how is that even relevant, you might ask... Let's see what Pete Warden, former Google Staff Research Engineer and now CEO and Founder of Useful Sensors, has to say about that. Although naturally much of what he did was based on things others were already working on, Warden is sometimes credited as having kickstarted the TinyML subdomain of machine learning. Either way, TinyML is getting big, and Warden is a big part of it. Useful Sensors is Warden's latest venture. They just launched a product called AI in a Box, which they dub an "offline, private, open source LLM for conversations and more". Even though it's not the first product Useful Sensors has created, it's the first one that's officially launched. That was a good opportunity to catch up with Warden and talk about what they are working on. Article published on Linked Data Orchestration.
Transcript
Welcome to Orchestrate All the Things.
I'm George Anadiotis and we'll be connecting the dots together.
Stories about technology, data, AI and media
and how they flow into each other, shaping our lives.
Would you leave a Google Staff Research Engineer role
just because you want your TV to automatically pause
when you get up to get a cup of tea?
Actually, how is that even relevant, you might ask.
Let's see what Pete Warden, former Google Staff Research Engineer
and now CEO and founder of Useful Sensors, has to say about that.
Although naturally much of what he did was based on things others were already working on,
Warden is sometimes credited as having kick-started the TinyML subdomain of machine learning.
Either way, TinyML is getting big, and Warden is a big part of it.
Useful Sensors is Warden's latest venture.
They just launched a product called AI in a Box,
which they dub an offline, private, open-source
large language model for conversations and more.
Although it's not the first product Useful Sensors has created, it's the first one
that's officially launched. That was a good opportunity to catch up with Warden and talk
about what they're working on. I hope you will enjoy the podcast. If you like my work, you can
follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.

Yeah, I'm Pete Warden. I'm currently CEO of Useful Sensors. I was on TensorFlow when we last spoke at Google, working on TinyML and generally running machine learning on devices outside of the cloud.
All right, cool. So obviously you're not doing that anymore, and you have a new, well,
as we call it, venture, which you just mentioned: Useful Sensors.
So what was the motivation? What drove you to leave this otherwise, I guess for many
people, enviable position you had at Google and just start this new thing?

So there were a couple of reasons. One of them was, as you know, I was working on TensorFlow, an open source machine learning library from Google.
And specifically, I'd been focused on TensorFlow Lite Micro, which is aimed at running machine learning on embedded devices.
And I was working on that because I really wanted to see machine learning in the everyday objects in our world.
I wanted to be able to look at a lamp and say "on"
and have the lamp come on.
Or I wanted my TV to automatically pause
when I got up to get a cup of tea
without me having to find the remote every time.
And when I went to talk to companies that made light switches or TVs,
and I'd tell them all about this wonderful open source code
that they could get for free
and all the conferences and documentation
and examples and books,
they would hear me out.
And then at the end of it,
they'd usually say something like:
"That's great, but we barely have a software engineering team.
We definitely do not have a machine learning team.
So can you just give us something that gives us a voice interface, or tells us when somebody sits down in front of the TV?"

That was really the starting point for Useful Sensors: okay, can we actually build something, or some things, that just provide these machine learning capabilities to all of these consumer electronics and appliance and other companies who could benefit from it, but without them having to worry about the fact
there's machine learning happening under the hood.
Okay, I see.
Yeah, I mean, that's a valid motivation, but I have to say,
I also had the chance, you know, just doing a little bit of basic research,
let's say, before the conversation, to hear you speak in a video that's recorded
and available on the Useful Sensors website,
in which you also cite something else.
So you share an anecdote there, when you were talking to some person you know,
and he, or she, I don't know, actually,
mentioned what I usually refer to as the Google creepiness factor. It's
something that actually keeps coming back, not specifically for Google, I have to say,
but Google is one of the perpetrators, let's say, of this. So, you know, the usual thing:
okay, I had this conversation about, I don't know, XYZ,
and I had my mobile on me. And then the next day, or, you know, for the next week or month or
number of months, actually, I get bombarded with ads about this XYZ thing that I was talking about.
And this person asks you, like many other people ask: how is that possible? That obviously must mean that, you know, they're actually tapping into my conversation.
And well, I had the impression
that that was also a deciding factor,
let's say, that sort of motivated you to do that.
Because the thing that you just mentioned,
being able to offer a very, very simple interface
that people will be able to use,
I'm pretty sure you could have just as well done that wearing your Google hat as well.

Well, that's definitely true. That
was my second motivation: really trying to, as you say, tackle the creepiness head on. And what I was able to say about
that common perception, that big tech companies are spying on our conversations, is:
at least I know for Google, when I was there, I worked on that code. We weren't. But I couldn't prove it, because everything is internal inside Google.
So the second big motivation was to build systems that were simple enough that they could actually be verified and checked by a third party.
Somebody that, you know, consumers can trust. You know, I would love to get
the EU data regulators involved in this. I would love to get, in the US, people like Consumer Reports
or the FTC actually, you know, trying to check this, have nutrition labels.

So that, at least in part, seems to imply open
source, so that the software is something that people, and third-party auditors in general, are able to inspect
and then review.
But before we get to that part, I'd like to ask you: all right, I estimate that Useful Sensors must be about a year or a year and a half old by now, more or less.
Yeah.
Okay.
So where are you right now?
So what's your current plan and what's your status in the execution of this plan?
Do you already have products?
Do you have some people on board?
What are you working towards?
Yeah, so this might actually be a good time
to sort of show you this demo of speech-to-text.
And hopefully you can see that it's actually providing live captions
of this conversation.

Yeah.

So this is our latest product, which we're going to be launching on September 19th through CrowdSupply.com. Prior to this, we've also
launched the Person Sensor, which is a small board, which you can kind of see here, that provides an indication of whether there's a
person nearby, running entirely locally. At the moment, we're retailing this for $10.
And we've also launched a tiny QR code reader that you can get for $7. So we've actually been launching products to back
up our vision of running machine learning locally, and really being able to do it in a private and a checkable way.
Okay.
So, well, based on what I've just seen,
then it would seem that, well,
what you actually mean by products is actually chips, because that's what you just showed me.
So I guess in that sense, well, let's just frame it another way. So who is your audience?
Is it going to be people who buy those chips and then have the skills, the capacity, and also the
infrastructure, like the boards and the ability to connect those chips to those boards? And actually,
I guess you're also going to be needing sensors to make those products actually viable.
So, for example, if you're going to be able to detect a person's presence,
I don't know, you need a camera or, I don't know, something else
that you're able to leverage in order to make that happen.
So it sounds to me like they're actually more geared towards people
who have the ability, the skill set, and the rest of the equipment
to sort of integrate them into something bigger.
So not products in the, I don't know, more traditional sense,
like self-contained.
Yeah, and that's a good question.
So these modules, the Person Sensor and the tiny QR code reader, they are self-contained.
So they contain a small camera, as well as all of the pre-loaded software needed to
make it so that all you have to worry about with this is: you plug it in, and then you get, for example,
one pin that goes high whenever there's a person around.
So we're trying to take all of the capabilities of machine learning,
but wrap them up in a package that's no different than a temperature sensor or a pressure sensor,
and provide very, very simple interfaces.
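To make that single-pin interface concrete, here is a minimal sketch of how a maker might read such a detection pin from a Raspberry Pi. This is an illustrative assumption rather than Useful Sensors' official sample code; the GPIO pin number and wiring are hypothetical, so the actual product documentation should be consulted for the real pinout.

```python
# Minimal sketch: react to a sensor output pin that goes high
# while a person is detected, as described above.
# Hypothetical wiring: detection pin connected to GPIO 17.
from signal import pause

from gpiozero import DigitalInputDevice

person_pin = DigitalInputDevice(17)  # hypothetical pin choice

person_pin.when_activated = lambda: print("Person detected")
person_pin.when_deactivated = lambda: print("Person gone")

pause()  # keep the script alive so the callbacks can fire
```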
Now, you're correct that these are very much aimed at both makers and, at the
largest scale, consumer electronics companies, appliance companies. We've worked with people who make audio equipment, TVs, coffee machines,
vacuum cleaners. But the new AI in a Box, which we're launching, which I showed you doing
the captions, that's actually aimed at makers.
It's something that you can go in, you can sort of change the code and things like that.
But it does a bunch of useful stuff just kind of out of the box as well.
So it gives you live captions, as you've seen.
It lets you talk to a large language model that's actually running locally as well.
And we're going to be implementing live translation as well. So that is still something that
is, you know, maker friendly, but also has some immediate use cases out of the box.
Okay, so, you know, wearing my, I don't know, startup consultant hat, then I would probably say something
like: well, it sounds like your primary audience then, in terms of what's the biggest
market that you can reach, would actually not be end users, so consumers, but
rather electronics makers and companies that may actually have the need to consume what you
package as products en masse, and then will use it for their own purposes and resell it as part of something bigger.

Yes, I mean, that's really where we are. We're a venture-backed
business, and, you know, to justify that investment we need to grow and we need to have, you know,
volumes of units that we can sell. And as well, like I said at the start,
I really want to see these capabilities in everyday objects.
And so it's a really great opportunity to try and engage
with some of these large manufacturers
and see if we can actually improve these everyday objects.
Indeed. And I was actually going to ask you about your venture capital backing anyway, so
I will return to that point in a bit. But before we do, there's something else that sort of popped
up while listening to your line of thought about your potential audiences.
So, in my mind, there are a few of those,
let's say, big potential customers that you aspire to reach out to, that could actually also have the impact of reaching out to end users,
to consumers.
So, well, mobile device makers come to mind or, you know, companies that in whatever other
way make end products that are meant to be sold to consumers.
So, do you have access to any of those?
And I understand that you may not be at liberty to disclose things that you have not
entirely sealed off yet, so you don't have to mention names. But, you know, just broadly speaking,
are you in conversations with any of those types of companies?

We are. We actually have
evaluation agreements with several large consumer electronics companies, for example, to
check out, you know, the first stage of trying to get these into
their products. And I am hoping, you know, we're just launching this speech-to-text work. If you have a Pixel phone and you actually use it, you'll find that you can get automatic captions running on-device for any video or any other content that you actually play on the phone.
But that's not widely available in the sort of Android ecosystem.
So one of the applications I'd love to see is, you know, make everyone's Android phone able to have that capability too.
Yeah, indeed.
I mean, not a great number of people are Pixel owners.
And so if you somehow manage to get that kind of capability
in just, you know, every other phone out there,
I'm pretty sure that people
will make use of that. And it's also going to be easier to reach a greater number of people
that way, rather than, you know, starting to sell something standalone, like: oh, buy this
device X so you can have this thing. Because everybody already does have a mobile phone, it's much, much easier to reach people that way.

Yeah, that's very true.
So I guess that in order to make that happen, well, obviously the fact that you are a
well-known figure in the community, and, you know, you have been with Google and you have this
body of work behind you,
obviously helps in opening doors. But I'm guessing that, well, obviously you can't do everything
yourself, and opening the door is not enough. You have to be persistent, and you have to reach
the right people, and, you know, these things can take lots of time and energy and all of that. So
obviously you're not alone doing that. So what kind of team do you have working with you
at Useful Sensors?
Well, I'm very lucky to have my co-founder, Manjunath,
who was also one of the founders of TensorFlow.
He also helped found the CUDA compiler team at NVIDIA.
And he worked at Cerebras, the large chip company, for several years, starting their compiler team.
So he's been an amazing producer of technological miracles, I'll say.
He's really helped us, for example,
run these modern transformer models much faster
than other people have managed
especially on this particular device
we're using for the caption box.
And then I have a fantastic team of people that either me or Manjunath have worked with in the past,
from places like Google and Cerebras.
So there's just eight of us.
So we're still a pretty small team.
But I've been really blown away by how much they've been able to accomplish.
Okay, so that's a good segue to return to that venture funding topic. So even for a team of eight,
and even if these people are very much self-motivated
and able to work independently and all that,
well, they still have to pay bills at the end of the day.
And I also guess that you don't have many sales at this point because, well, you're an early-stage startup, which sort of directly implies that you must have some sort of funding.
Well, maybe you were able to bootstrap initially, but that can only take you so far.
So obviously, you must have venture capital backing, and I was wondering if you're able to disclose
who your backers are, and whether you've had a seed round or pre-seed?
Yeah, we had a seed round back in May last year, where we raised roughly $5 million. And our lead investor was Mike Dauber at Amplify Partners.
And we've also had investment alongside that
from James Cham at Bloomberg Beta,
Eva Ho at Fika Ventures,
and Anthony Goldbloom, who you might know from Kaggle, actually led an investment from AIX Ventures, a pretty new firm from him and Chris Manning and Richard Socher. So most of those people I've known for
over a decade, so it's actually been really great to work with them.
Yeah, well, I'll be honest, I don't know all of the names that you mentioned, but some I do know, and judging from them, I'm able to tell that,
well, it sounds like a sort of insider round, if you will. So people who are actually able to
understand the kind of work that you do, and therefore were motivated to be early
investors in that. And I'm guessing, from the sound of it all,
that you'll probably reach a point
where you will actually raise
a Series A. And if you
had like a $5 million
seed round, then, well,
I don't imagine you can have anything less
than 10 for your Series A.
Yeah, for the
Series A, that's probably going to
be about right.
We're still figuring out the timing of that. But, you know, it's been really good seeing the
reaction that we've had to the AI in a Box. I think that's going to be a flagship product for us as we sort of go into the next fundraising round.
Well, definitely in terms of awareness, because that's probably the one thing among your products that has the potential to reach more people directly.
Indirectly, I still maintain that the rest of your products
are eventually probably going to be used by more people,
but of the things that you can directly sell to end users,
AI in a Box is probably it.
Yeah, I think you're right.
Which brings me to, well, something completely different.
And, well, I'm going to actually ask you to geek out freely on this one if you want to,
because we didn't do that so far.
So even from the start, when you mentioned this AI in a box
and how you are able to somehow pack a large language model in a presumably
not so powerful chip, I got curious.
So I wanted to ask you if, again, you're able to share which model exactly it is that you're
using.
I'm guessing that probably it's something open source with a permissive license that actually enables you
to do this sort of application.
And I also wonder if you have tweaked it in any way,
whether by using something like LoRA,
which many people use these days,
or, I don't know, some kind of custom training
or fine-tuning or whatnot?
Yeah, so interestingly enough, a lot of what we focused on
is the speech-to-text, trying to get that real-time. And for running large language models locally, there's a very active community of people doing
that. So once we were able to get the real-time speech-to-text working,
we actually had a lot of choices around which large language models we could actually run locally. And
I'm actually not certain exactly which variation we are using in the current one,
because we've gone through, I think, things like Vicuna, Orca, a lot of stuff based on Llama 2. And we have been looking at doing some of our own
fine-tuning, but we've been able to get a long way with just providing prompt contexts for the interactions. So that has been, you know, very interesting: sort of asking the
large language model to be a bit of a comedian, or to, you know, try and give short answers,
or, you know, these other ways that you can actually steer it. So, yeah, anybody who's familiar with the large language model
work would sort of look at what we're doing and be like: okay, yeah, that's
using known models, and it kind of makes sense. And what's really new is that you're actually able to talk to it
in natural language and actually have it talk back.
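As an aside, here is a rough idea of what that kind of prompt-context steering can look like, sketched with the llama-cpp-python bindings running a local Llama-family model. The model file name and the persona text are assumptions for illustration, not the product's actual configuration.

```python
# Steering a locally running LLM with a prompt context ("system" message),
# in the spirit of the "be a bit of a comedian, give short answers" example.
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf")  # hypothetical local model file

steering = (
    "You are a bit of a comedian. Keep every answer short, "
    "two sentences at most."
)

reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": steering},  # the prompt context
        {"role": "user", "content": "Tell me about TinyML."},
    ]
)
print(reply["choices"][0]["message"]["content"])
```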
Okay, so this part, the speech-to-text, is a part that's,
well, not really proprietary? Because I was under the impression
that at least some part of your work was proprietary,
but I'm not so sure anymore, actually. So, well, something that you built in-house, let's say?

Yes, exactly. And I think
I shared a link on email earlier to the Useful Transformers library, which is a way of taking advantage of the neural processing unit that's actually available on these Rockchip
SoCs that we're using. They're mostly Arm Cortex-A, you know, sort of, they'd be familiar to anybody
who's used a phone SoC, except that they have this accelerator for neural networks on it. And we've actually been able to take advantage
of that to run our speech-to-text several times faster than would otherwise be possible.
Yeah, well, thank you. I was actually also going to get to that part, the hardware.

So it may be a little bit more expensive, but it also has more processing power. And it looks very much like a Raspberry Pi or phone SoC.
The biggest difference is that, you know, we picked this one because it does have this
neural network accelerator on it.

Yeah, yeah, makes sense. And well, since we sort of touched upon the open source slash
community issue, indeed, you did share that link to your GitHub, which I hadn't seen before, and
that sort of triggered me to look around again. And I have a very good explanation
for why I missed it: because it's nowhere on your site. So I wonder...

Yeah, no, I mean, that's true. That's partly because we have not launched the AI in a Box yet.
So, you may notice,
we don't have that up on the site yet either,
because that's going to be launching on September 19th.
And that's where we normally put the links
to the open source repos: in the product description.
So that's why it doesn't show up on the website yet;
we're talking about this sort of pre-release.
But you can find it if you go to github.com/usefulsensors.
You'll be able to see everything that we've open sourced there.
So presumably at this point in time,
the only contributors are going to be,
well, you and your team, basically.
Yes.
So I'm wondering if sort of community building,
let's say, and evangelizing and so on
is actually part of your plan as well.
And if yes, how do you intend to go about it? Or,
you know, maybe you're just going to say: oh, you know, we're just going to release
the product, and then we'll just sit back and wait, let people come to us. But I don't know.

Well, I think one thing to do is look at our projects on Hackster, both the ones we've done ourselves, in terms of sample code for things like the Person Sensor and the Tiny Code Reader, and the projects community members have built using these, you know, using our examples as a starting point, but actually,
you know, building their own systems on top of this stuff. So it's not so much that we're expecting a lot of engagement with the open source repos that we've made
available. That's sort of almost like the foundation layer. What we want people to be
able to do is, for example, take the speech-to-text that we do through our AI in a Box.
And one of the things you can do is use it like a USB keyboard.
So it just sends the text that it's hearing to whatever device it's plugged into,
as if it were somebody typing at a keyboard.
So that means that it's actually quite
a nice way to integrate with something like a Raspberry Pi, where the Pi can actually then,
you know, control a robot, or a sculpture, or something like that, using voice, you know, speech-to-text, speech recognition.
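Since the box simply types what it hears, integrating it needs no special driver. Here is a minimal sketch of the receiving side, assuming the transcript arrives as ordinary keyboard input on the Pi: read lines from stdin and match keywords. The command words and motor functions below are hypothetical placeholders, not part of the product.

```python
# Minimal sketch of acting on text "typed" by the AI in a Box over USB.
# Each recognized utterance arrives as ordinary keyboard input, so a
# console program can just read it line by line from stdin.
import sys

def move_forward():
    print("robot: moving forward")  # placeholder for real motor control

def stop():
    print("robot: stopping")        # placeholder for real motor control

COMMANDS = {"forward": move_forward, "stop": stop}  # hypothetical vocabulary

for line in sys.stdin:
    for word, action in COMMANDS.items():
        if word in line.lower():
            action()
```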
So we're hoping that these things we're doing
will be useful building blocks for people to build their own projects around.
And that's where we've been seeing a lot of community engagement already
with our existing products and where we're hoping to see more.
Yeah, well, I would add to what you just said that I can also picture this kind of scenario,
but I think an important building block to make that happen is actually something you also already have:
having this large language model somewhere
in the middle. Because that can be your interface to whatever sort of API or programming language
your device in the back end can communicate with. So you can say something in
natural language,
like, I don't know, turn on the lights, or whatever. And, well, obviously the
speech-to-text part will get you the textual command.
And then you can pass that
to the large language model,
which may be able to translate that to an API call,
which can actually operate on the physical switch and
do what you wish it to do.

Yeah, exactly. You know, that's a big reason why we've included
the large language model. It's fun to chat to and ask it to tell you jokes, and,
you know, you can ask questions, and sometimes it will make stuff up,
but a lot of the time it will tell you something factual.
But it gets really interesting once you start to control things through the soup of natural language that we use
when we're having conversations with each other.
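To illustrate the pipeline just described, here is a loose sketch: the speech-to-text output is handed to a local LLM that maps it to a structured action, which then drives a smart-light HTTP API. The model file, the JSON schema, and the endpoint are all hypothetical assumptions for the example, not anything Useful Sensors has described.

```python
# Sketch: natural-language command -> local LLM -> structured API call.
import json
import urllib.request

from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf")  # hypothetical local model file

def to_action(utterance: str) -> dict:
    """Ask the model to translate a spoken request into a JSON action."""
    prompt = (
        "Translate the user's request into JSON with keys 'device' and "
        f"'state' ('on' or 'off'). Reply with JSON only.\nRequest: {utterance}\nJSON: "
    )
    out = llm(prompt, max_tokens=64)
    return json.loads(out["choices"][0]["text"])

action = to_action("turn on the lights")  # e.g. {"device": "lights", "state": "on"}

req = urllib.request.Request(
    "http://lights.local/api/state",  # hypothetical endpoint
    data=json.dumps(action).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)  # operate the physical switch
```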
Yeah, indeed.
So overall, it sounds like your positioning, let's say, would be that of an ecosystem creator.
So maybe you have sort of started out with a couple of demo applications, or maybe even not so demo, like actual applications.
But it sounds like you're positioning Useful Sensors as an intermediary, as someone who provides the infrastructure
that other people and other companies
will be able to use to create applications and products.
So, well, first of all,
I wonder if my understanding is indeed
what you're thinking of.
And then if it is,
what are some favorite current applications
of yours, and what are some applications that you would like people to come up with?
Yeah, and I think that's a good way of describing our positioning: we want to enable other people to take advantage of all of these ML capabilities
without having to kind of go down the machine learning route themselves. One of my favorite applications of the Person Sensor is actually by Thomas Burns.
He created a robot called Alexatron, and its eyes are actually controlled using a Person Sensor to kind of track your face. So it's just a little
detail, but it's the sort of thing that's always fun to see.

And on the commercial side,
Blues Wireless have actually been doing some great work around: hey, is there a person here or not, for remote sensing. So just being able to tell,
you know, for safety reasons, if somebody is
in an area where, you know, there could be danger. And we've
had people just using... we actually had one person out in New York who's been looking at using the Person
Sensor. He's an actor, and he has to pay somebody to train a spotlight on him when he's
doing solo performances. So using the Person Sensor, he's actually able to, or he's trying to,
sort of use that to automatically control the spotlight, which is not something I would have ever thought of,
but it's the sort of thing that is really exciting
once this stuff gets out into the world.
Yeah.
Well, that's also a good trigger for me
to ask you another sort of closing question, let's say.
Well, it's a good example of,
well, what happens when you get new technological capabilities and innovation. I mean,
yes, on the one hand, it's sort of cool that this person is able to do that. On the other
hand, somebody might say: well, you know, what about the person who used to get paid to shine that spotlight on him when he was performing?
So what's your take on that?
I mean, obviously, it can go either way.
And it's a much, much broader question than the specifics of what you do.
But still, I wonder, what are your thoughts on that?

Yeah, and I think, you know, the whole question of:
hey, is AI going to, you know, get rid of jobs?
From my perspective, I do see, you know,
some jobs may go away. It's hard to deny that. You know, even when we're looking at things
like driverless cars, that's likely to have a big impact there. But
I tend to be fairly optimistic that this opens up the door to people actually being able to
take other jobs. And I'm really trying to be quite thoughtful about, you know, the stuff
we're doing, to see if we can actually make people's jobs easier, versus trying to completely replace them.
But yeah, you're right.
That's a really big question with anything around innovation.
If we're kind of, quote unquote, making things more efficient,
what are the societal impacts of that?
Obviously, it's not something either you or myself or anyone on their own
can tackle, but it's always something to keep in mind, at least.

Yeah, and a big part
of what I'm trying to do is actually get these technologies into people's hands, so that it's not just a bunch of engineers who are making
these decisions about what we should do. You know, people can actually try these models for
themselves and see, for example, how useful but also flawed the current generation of large
language models are. You know,
if you ask them about somebody they don't know about, they'll happily tell
you: oh yeah, that person's a criminal. And so it takes a little... well, it gives people a much better sense of it. You know, I'm hoping, I don't want us, you know,
technocrats to be the ones making these decisions.
I want a well-informed public who are actually able to say: hey,
this is what we want.
Yeah. Well, that's, you know,
that's an opener for another, I don't know,
45-minute conversation on its own.
So let's leave it at that for the time being.
Well, thanks for dropping in, really,
and for letting me and anyone who may be listening know about what you're up to.
It sounds really interesting.
And, well, good luck with everything.
No, thank you so much, George.
Thanks for sticking around.
For more stories like this, check the link in bio and follow Linked Data Orchestration.