Big Technology Podcast - How Amazon Rebuilt Alexa From The Ground Up — With Panos Panay and Daniel Rausch

Episode Date: March 5, 2025

Panos Panay is the senior vice president of Devices & Services at Amazon. Daniel Rausch is the Vice President of Alexa at Amazon. The two join Big Technology Podcast to discuss how the company rearchitected Alexa, blending a deterministic system with the latest generative AI technology to create something that can both turn your lights off and speak with you about philosophy. We also discuss how all big tech companies seem to be converging on the same contextually aware, general AI assistant, and why Amazon believes Alexa has a chance to win. Tune in for a front row perspective on one of the tech industry's biggest AI projects. --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. For weekly updates on the show, sign up for the pod newsletter on LinkedIn: https://www.linkedin.com/newsletters/6901970121829801984/ Want a discount for Big Technology on Substack? Here’s 40% off for the first year: https://tinyurl.com/bigtechnology Questions? Feedback? Write to: bigtechnologypodcast@gmail.com

Transcript
Starting point is 00:00:00 The Amazon leaders who spearheaded the new Alexa are here in studio to talk about what it took to rebuild the pioneering AI and where voice AI is headed in the age of large language models. That's coming up right after this. Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. We are joined today by Panos Panay, Amazon Senior Vice President of Devices and Services, and Daniel Rausch, Amazon's Vice President of Alexa, for a fascinating conversation about what it took to rebuild Alexa, effectively from the ground up. Gentlemen, so great to see you. Welcome to the show. Thanks, man. So great to be here. It sounded kind of fun.
Starting point is 00:00:41 You both must be relieved to have this out. Yeah. I mean, excited. Relieved is a tricky word on this one. You know, we're finishing the product now. It's coming out next month. So we're pumped that we're through the event. And yeah, there's some relief, I would say. Would you agree? You feel a little bit of relief? But the truth is, like, it's all about getting it in a customer's hands as fast as possible. So the team's still feeling that urgency right now. Yeah, that's the big moment for the team, right? You get that first customer response.
Starting point is 00:01:09 So we still feel like we're building towards it. But yesterday was great. Okay, so I have three Echo devices in my house. We have three rooms. Yeah, what are they? The house is generous. But in my apartment, there's one in the bedroom. There's one in the kitchen slash dining room and there's one in the office.
Starting point is 00:01:24 Yeah. So they are first generation. I'm really looking forward to getting these updates working, hopefully, within these devices and getting a chance to use a new and improved Alexa. I've been hanging on to the Echos for a long time in the hope that something like this would happen. So we're here. And I was at your event where you were announcing it. I'll give listeners a little bit of an understanding of what I saw. And then we're going to go into some questions about what it was like to build this. So this new Alexa, it's called Alexa Plus. It is conversational.
Starting point is 00:01:54 So it understands natural language. It understands your context. And you don't have to say Alexa every time, it'll sort of have a back and forth with you. It is, I think you could call it agentic. It allows you to take action like book a table, call an Uber. It will go out in the world and help monitor ticket prices for you, for instance. And it's also deeply integrated into Amazon services, and namely Prime. It's going to be free for Prime members, $19.99 a month if you're not a Prime member. And the coolest thing I saw in the demo was that you, I think one of you asked for the song with
Starting point is 00:02:29 what was it, Bradley Cooper and Lady Gaga? I didn't say Lady Gaga. I just said Bradley Cooper. Bradley Cooper in that, what was the movie called?
Starting point is 00:02:37 Star Is Born. Star Is Born. A Star Is Born, great movie. And then it called up, played the song, and then you said,
Starting point is 00:02:45 now let me see it in the movie, and it connects to Prime Video and you could see it in the movie. So, very cool product, definitely, I think, what a lot
Starting point is 00:02:54 of us have been hoping to see from the Alexa team and from Amazon on Alexa. We're going to talk a little bit about what it took to build it and then the strategy here. So I think the first question I need to ask you both is what did take so long? Because I think that for all of us who, you know, I think there's 500 or 600 million Alexa-enabled devices out there, we've been wondering, as the OpenAIs of the world and other companies have made these big advances on voice AI, when Amazon was going to make its move.
Starting point is 00:03:24 And you have made the move. But what was the process that made it take as long as it has, Panos? I think the easiest way to say it is when you have hundreds of millions of customers that are active right now, I mean, we talked a little bit about it yesterday, but every one of them matters. How do we make sure they all get the great experience they need, meaning you can't start from zero and ignore it. And if you could, it could be much faster, although it's not that easy to hook up the thousands of APIs and all the partners that we're bringing together and all the experts. It's not, it takes time. But the first thing is, and there's two parts to it, but the first thing is you got
Starting point is 00:04:04 hundreds of millions of customers. They love certain things that they do on Alexa today. They might not love everything, but they love certain things for sure. You can't, you can't leave that behind. Can't wake up one day and whatever you use Alexa for, whether it's timers or music, you can't not make it better and great. And so you don't feel like something was taken away from you. When you take something away from a customer, you've just missed. You've missed.
Starting point is 00:04:31 And so that's one. It takes time to make sure you can get it all done. So everything on what you would call Alexa, not Alexa Plus, works on Alexa Plus, but better. And that was just the first part of the vision, can't leave anyone behind, which was important. We can talk about devices and so forth, but customers who love their products that are in and they need them, we can't take that away. That was one. Second piece is you're re-architecting from the ground up. So you've got first the weight of keeping hundreds of millions of customers, and then you're re-architecting from the ground up. If we started from zero customers, I think this is a different story. You can move a lot faster. We can solve problems and then just add features as we go,
Starting point is 00:05:12 if that makes sense. So maybe we just add a conversationalist, a pretty cool one, then we can add personalization, then we can add memory, then we could add the experts, and people would just get updates along the way and maybe learn and be great. However, on day one, we need to support everything people love and know about Alexa, day one. And so a little bit of patience there and it takes a little bit longer. And the vision was, the vision was clear, like we're going to go bring a conversational agent forward, an assistant for everyone that is smart, has memory, can personalize to you, and then ultimately be incredibly useful. And so when we had that laid out, we're like, okay, great, but we can't leave any customers behind. And right at that point, you
Starting point is 00:05:58 kind of step back. Once you put the vision together, you realize you need a full re-architecture, but you're not going to leave your customers out. So you're re-architecting pretty much two stacks at that point. One, what is classically known as Alexa, to be awesome and come into this conversational world. And the other is everything new that it has to do. Yeah, and I want to go a level deeper with Daniel on this one, because, Panos, what you're talking about, a re-architecture is sort of what I've heard has been the holdup here with Alexa for all these years, which is that, and Daniel, tell me if I'm wrong. But basically what folks have told me is that the old version or the original version of Alexa
Starting point is 00:06:34 was built with a lot of, like, if-then commands, right? So, you know, it will understand some structured commands, turn on the lights, okay, then it will take that and almost like deterministically say, okay, I understand this command, this is what I'm going to do, turn the switch. With large language models, it's a completely different ballgame because you have to make room for uncertainty. So actually, the fact that you've been able to introduce an Alexa with large language models, which I think will be able to keep that functionality, is an engineering feat. That's my perspective from the outside.
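To make the "if-then" description above concrete, here is a minimal sketch of deterministic command matching. It is purely illustrative: the rule table, the action names, and the `match_command` function are hypothetical and say nothing about how Alexa was actually implemented.

```python
import re
from typing import Optional

# Hypothetical rule table: each fixed phrasing maps to exactly one action.
RULES = [
    (re.compile(r"^turn (?P<state>on|off) the (?P<device>[a-z ]+)$"), "set_power"),
    (re.compile(r"^set an alarm for (?P<time>[0-9apm: .]+)$"), "set_alarm"),
]


def match_command(utterance: str) -> Optional[dict]:
    """Return a structured action if the utterance fits a known pattern, else None."""
    text = utterance.lower().strip()
    for pattern, action in RULES:
        found = pattern.match(text)
        if found:
            return {"action": action, **found.groupdict()}
    # No rule matched: a purely rule-based system has nothing to fall back on.
    return None


print(match_command("Turn on the lights"))
print(match_command("I need to wake up tomorrow at 8"))
```

The first call returns a structured action, while the second returns `None` even though a person would read it as a request for an alarm, which is the gap the conversation goes on to discuss.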
Starting point is 00:07:07 What is it actually like on the inside and how close is that assessment to the challenge? Well, the team would love to hear you say engineering feat, because I do think that's... I don't think there's no lack of feat. It is real. That is the size of the task, for sure. I think you're on to it for sure. You know, large language models, the one thing I'd add just in terms of thinking through the technical architecture to what Panos said is that it's really just the latest generations of large language models that can even do the things that Alexa needs to be able to do. So you're talking about our Nova models, right, which we announced within the last few months and starting to get into customers' hands.
Starting point is 00:07:45 That's super exciting, you know, partnership that we have with Anthropic. Like, you really need very state-of-the-art technology at the base of the architecture in those large language models. And in large part, because of what you said, we need them to behave in ways that we can predict and are certain. Someone says, lock my door or, you know, play that song. You want it to happen, right? Some are higher consequence than others, and you really need to get it right. But you also want all the elegance and nuance and understanding and non-deterministic behaviors of large language models themselves, right? So we would call that a stochastic system that, you know, it's literally at runtime that
Starting point is 00:08:24 you're making those determinations. So if you want to integrate tens of thousands of services on day one, day one out of the box, take advantage of everything that Alexa has always been able to do, as Panos was saying, and introduce all of this new, unbelievable behavior that you can get out of large language models. That is a big engineering feat. So how does it know when the user is saying turn the lights on versus like something more esoteric. Like, is there something built within the technology that's kind of like a switcher that determines first your intent and then decides which part of the model to send it out to? The way to think about it is, you know, at the base level, you have large language models and
Starting point is 00:09:03 you have this model agnostic system that's even itself going to choose the right model for the job. And the models play different roles in there. What's already happened is, even honestly, sort of in the way you asked a few of the questions, is that people assume the large language model is the product. A product like Alexa is so much more than, quote, unquote, just a large language model. So you have models playing many different roles in the system overall, even models helping us decide which model, and models themselves deciding if they're the best, you know, tool for the job, so to speak. So then you have a system that progressively decides how to get something done. I wouldn't think about it like a switch or something in classic computer science that is a, you know, it's
Starting point is 00:09:48 a gate. That's not how the system works. It's a collection of model behaviors and systems downstream of that that complete specific tasks. And that's where we introduce this term expert to try to help coalesce around the system behavior and explain it better. The large language models are interacting with these experts that do things like get you the sports score, play a song, play a video, know where you are in the song so that you can go to the video, like all the things that you saw yesterday at the event. And so, Panos, this is a mixture-of-experts model. It is.
Starting point is 00:10:20 If you think about it, a mixture of experts model, but each expert theoretically has its own model as well. So you're building on top of it. Each expert is smarter. When you think experts, it's like it's a weird term, yeah, but think photos, smart home, entertainment, whether that's music or video, local info, all the partners that connect.
Starting point is 00:10:41 You have a communication expert, you have an artifact expert, you have a memory expert, you have a personalization expert. Each of them plays a role, and they kind of arbitrate with each other at all times. So like the model is just lighting up when it determines that that's what you want to do. That's right. Daniel kind of said it well, like, because the LLM at the bottom of that stack is, it's deterministic. It's choosing which model to use, then the experts come into play on top of it. It's a pretty phenomenal way to, you know, it's a pretty interesting way to think about it. This is a mixture of experts model for those at home.
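As a rough sketch of the idea of experts that each judge whether they are the right tool for a request and then arbitrate, the following toy example may help. Everything in it is an assumption made for illustration: the `Expert` class, the keyword-based `keyword_score` stand-in for a model's confidence, and the `threshold` value are hypothetical, and the real system is described only at a high level in the conversation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Expert:
    name: str
    score: Callable[[str], float]   # stand-in for a model's self-assessed fit
    handle: Callable[[str], str]    # stand-in for actually doing the task


def keyword_score(*keywords: str) -> Callable[[str], float]:
    """Toy confidence: the fraction of the expert's keywords found in the utterance."""
    def score(utterance: str) -> float:
        words = utterance.lower().split()
        return sum(1 for k in keywords if k in words) / len(keywords)
    return score


EXPERTS: List[Expert] = [
    Expert("music", keyword_score("play", "song", "music"), lambda u: "music: " + u),
    Expert("smart_home", keyword_score("lights", "lock", "door"), lambda u: "smart_home: " + u),
    Expert("timers", keyword_score("timer", "alarm", "minutes"), lambda u: "timers: " + u),
]


def arbitrate(utterance: str, experts: List[Expert] = EXPERTS, threshold: float = 0.3) -> str:
    """Let every expert score the request; the most confident one handles it."""
    best = max(experts, key=lambda e: e.score(utterance))
    if best.score(utterance) < threshold:
        # Nobody is confident enough, so ask a follow-up instead of guessing.
        return "clarify: could you say more about what you'd like?"
    return best.handle(utterance)


print(arbitrate("play that song"))
print(arbitrate("lock my door"))
```

In this sketch there is no single up-front switch: each expert reports its own fit, and a low top score leads to a clarifying question rather than a guess.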
Starting point is 00:11:13 It's been part of what DeepSeek has used to become much more efficient in its reasoning, for instance, because instead of lighting up the entire large language model, it's deciding to light up certain areas that might be, I mean, it's not a DeepSeek innovation, but they've just kind of used it to an extreme extent. Has using that architecture helped you build this in a way that's, for instance, like reducing latency or sort of lightening the compute burden that you otherwise might add? If you want something incredibly fast, stable, even secure, like the paths on data, right, where you're really taking care of customers, this is the fundamental approach, I think,
Starting point is 00:11:54 that is state of the art. And accurate. And for sure. Don't forget accurate. So important. Yeah. But on that note, I mean, are the new Alexa, is there going to be some sacrifice to having those Alexa commands, those standard, turn the lights on, set the alarm, in order to enable all the LLMs to work the way that they're going to?
Starting point is 00:12:15 I think you just called out the sacrifice and it's time, like how long it's taken us to get to where we are. That's why it's my favorite question. Like, why is it taking you so long? Like, if I told you where we were four months ago on, somebody said, lock that door. And then we had to determine what that meant versus in the past, lock my front door. And you had to know it was the front door and you had to say front door. It's pretty phenomenal, but, you know, six months ago, it took longer than anyone would wait to lock a door. And, you know, our customers need immediate response and we won't make that tradeoff. So to be that
Starting point is 00:12:56 accurate with the latency that's needed, with the speed sub two seconds at the end of the day, you end up needing a little bit more time or finding the expert so the expert can be quicker and the model can pick the right model quicker, and the smaller model can be trained to make sure it knows where the door is. He gave an example earlier, which I thought it's a nuance, but let me just share it with you. Previously in Alexa, you couldn't say
Starting point is 00:13:21 play that song. It would look for a song called That, right? It was that simple. Now the model has to reason and say that song. I wonder what he's asking, I wonder what she's asking, I wonder what the person's asking. That's what's happening in the system.
Starting point is 00:13:38 Then the expert shows up, looks at the history, the personalization, what conversation were we having, play that song. Oh, he's talking about the conversation we just had about Bradley Cooper and Lady Gaga. Shallow, play. That all happens in, you know, sub two thousand, you know, how many milliseconds are we talking? We count in single milliseconds now in system components. So now all that is going on and the stack's working through it, versus today, which is play Shallow, and that's the only way you're going to play Shallow. Yeah, that's it. And so I think it's just understanding that nuance in where natural language comes in, where you can talk to Alexa without being precise, just like you can talk to me and I'll use some micro tells
Starting point is 00:14:24 to get, you know, are you asking me a rude question, a great question, a nice question, are you leading me? And then from those micro tells, I can then move to the words and then determine where you're taking me, and you don't have to write it down, type it and read it exactly. All that is happening now in the machine, which is pretty powerful. There was a cool scene in your demo at the launch event where I think Panos, it was you where you said, don't play the music in the baby's room. Yeah, so it's really, I didn't say that. So that's very explicit too, right? Don't play the music in the baby's room. The model will come up, the expert will show up, the music expert, this is where it's super powerful and go, got it, play it everywhere else. Or you can, you can
Starting point is 00:15:05 just say, don't wake the baby, play the music everywhere. Then the model will go, don't play it in the baby's room. I know what they're asking. So this is where just that small model in the expert does its job. And the fact that you can just naturally move it around, in that demo, I don't know if you noticed, by the way, nerve-wracking. Yeah, so for listeners, Panos did this entire demo live. I mean, we're going to talk about Apple Intelligence in a second.
Starting point is 00:15:30 But Apple Intelligence, I was at the WWDC launch event, and it was all a vision. And what we saw at this Alexa launch event was a working demo. Now, look, I mean, we know what to reserve, us commentators know to reserve complete judgment until it's in our hands. Yeah, you have to. You have to, for sure. But it was real. It was all real. Real and working.
Starting point is 00:15:50 Yeah. But what makes you nervous in an event like that, you're not worried about the product working. I mean, six months ago, I would have worried about the product working. And I would have showed you more vision demos. like videos, but the product's working. The challenge is the infrastructure, the thousands of Wi-Fi signals that are pinging around that room.
Starting point is 00:16:12 Like, it's just an unusual. These live environments are very unusual. Turns out tech reporters like tech, and they're using a lot of it. We're all on the Wi-Fi. Well, more, more. I mean, the signals that are being pulled from Bluetooth to Wi-Fi to, I mean, who knows what's in pockets.
Starting point is 00:16:28 And one of my favorite tech demo moments is Steve Jobs just losing his shit on stage because all the reporters are connected to Wi-Fi. And he's like, you could either be connected to Wi-Fi or you can have a demo. You pick. Totally. I mean, I'm not, you know, I think bloggers have a right to blog. But if we want to see the demos, we're not going to be able to do it unless we turn off all these Wi-Fi base stations and laptops, set them on the floor. Yes. We didn't have to have that situation.
Starting point is 00:17:00 And then you got, you know, the servers have to be lit up and you're, you know, you're worried about latency and what's happening in the room. So you got all that going on and now you're going live and this is your baby, right? I mean, you love what you're about to show. You love it. And if it doesn't go off, like, I don't want to tell you what the backup plan was, you know. What was the backup plan? We're not going to talk about it. For real?
Starting point is 00:17:21 Let's not talk about the backup plan. Let me just say. You can't tease the backup plan and then not share the backup plan. They were really good. Really good. It was not a great. It was not a great backup. No, it was. They were great. They weren't going to work, but they were great plans, I would say. I'm looking over here at some of the team that was helping yesterday. But during that moment, you may have heard, it was very nuanced. At one point I said, move the music, bring the music here.
Starting point is 00:17:53 I want to hear the music over there. And the reason I use different sentences, I know what the model's going to reason over and do, but I wanted to make it clear. Like, you don't have to think about what you want to happen. You just have to talk. Like, I want the music over there. Okay. And if the model doesn't know, or if Alexa doesn't know,
Starting point is 00:18:14 she'll ask you, do you mean in the living room? Yeah, so are we going to have a speed tradeoff here from the traditional Alexa tasks? Just quickly, Daniel, I'm curious. Like, is it, is the stuff I was doing beforehand, like, or doing, I'm doing now set an alarm. Is it going to take a little longer because of this process or it'll be the same amount of time? No. I mean, this is where we have such a high bar before we're willing to put it out.
Starting point is 00:18:38 And deterministic systems are incredibly fast. Right. It is straightforward computer science in this day and age with an AWS cloud and the great connectivity that everyone has in their homes to make a deterministic system fast on something exactly like you said. making a nondeterministic system fast that can respond in any way, gathers all the context, figures out legions of different things, which experts to invoke making that system fast on something as simple as an instruction or, you know, is hard. So what technological breakthroughs or innovations did you rely on to get it from a place
Starting point is 00:19:16 where you were dissatisfied with latency to a point now where you're happy? I think it's another version of using the right tool for the job and building a system that's frankly just more complex overall to get the simple things done. So it's a bit, you know, there's like an irony in that, but you need a system that creates very fast paths for simple things. Even though you started with an incredibly complex system already, you're adding these kinds of complexity to get simple things done. So that, I mean, I won't go into the specific technical details here, but that's the upshot. You need to be able to figure out you're trying to do something simple so that you can do it fast. And it gets tricky.
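The "fast paths for simple things" idea can be sketched as a cheap check that runs before any expensive reasoning. This is a hedged illustration only: the `ALARM` pattern, `fast_path`, `respond`, and the `slow_path` callable are hypothetical names, and the speakers explicitly decline to give the real technical details.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern for one very common, very simple request.
ALARM = re.compile(r"^set an alarm for (\d{1,2})(?::(\d{2}))? ?(am|pm)$")


def fast_path(utterance: str) -> Optional[str]:
    """Cheap deterministic check; returns None when the request is not obviously simple."""
    found = ALARM.match(utterance.lower().replace(".", "").strip())
    if not found:
        return None
    hour, minute, half = int(found.group(1)), int(found.group(2) or 0), found.group(3)
    if not 1 <= hour <= 12 or minute > 59:
        return None
    if half == "pm" and hour != 12:
        hour += 12
    if half == "am" and hour == 12:
        hour = 0
    return f"alarm set for {hour:02d}:{minute:02d}"


def respond(utterance: str, slow_path: Callable[[str], str]) -> str:
    """Try the fast path first; fall back to the slower, more general path."""
    quick = fast_path(utterance)
    return quick if quick is not None else slow_path(utterance)


# slow_path stands in for the full language-model pipeline.
print(respond("Set an alarm for 8 a.m.", lambda u: "handled by the slow path"))
print(respond("I need to wake up tomorrow at 8", lambda u: "handled by the slow path"))
```

The design trade-off matches the irony mentioned above: the system as a whole gains an extra component so that the simplest requests can skip the heavy machinery.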
Starting point is 00:19:56 You know, people understand how to speak to Alexa today. I think our new customers, we want to, you know, and current customers, we want to open their minds on what they can ask for and how to get something done. Take the simple tasks that we have, timers, alarms. There's a different way to think about them. And then in the non-deterministic world, how to translate what's being said into what's being asked for, which is different. An example: you said how quick we'll be setting an alarm. It'll be lightning fast, and you'll likely set it the way you always have. I need an alarm. Set an alarm for 8 a.m. I think that's the classic way to set an alarm. Or you can say, Alexa, I need to wake up tomorrow at 8. Okay, and now that's non-deterministic, and now it's going, I think you need an alarm, and then it'll offer you an alarm or just set it. Same with the timer. Set me a timer. By the way, how long do you want the timer for? You say the time. You can move that from set me a timer to I'm cooking my steak medium rare, and then she'll say, I'm setting you a timer for six minutes. Okay. And so you understand, like, when you get into that natural language, non-deterministic, what's happening, what are you asking for? You're cooking your steak. Okay, I'll get you six minutes on each side, or tell me how thick it is, and then the answer is, you know, two inch thick, whatever. Or I want a ramen egg, right? That's eight minutes. I got you. Tell me when you start. I'm starting. Eight minute timer started for you.
Starting point is 00:21:27 And so the world just changes from even these most simple tasks. It just changes. In the spirit of, by the way, I never knew how long it took to cook a ramen egg, so I'd always have to go to TikTok, open it, spend 20 seconds watching somebody make ramen eggs, and then eventually it says put it in the water for eight minutes. Like, got all you see on. And then I would say, and then I would say, yeah, it's very true. By the way, don't search ramen eggs. You get hammered with ramen eggs. But I think, and then all of a sudden you're like, got it, eight minutes. Set a timer for eight minutes. Now, just change it. Just ask for a ramen egg.
Starting point is 00:21:58 Now, just change it. Just to ask for a ramen egg. And Alexa will just determine what you're looking for and give you an eight minute timer. Okay. So just to wrap this section on the technical side, my note that I wrote to myself that said they spent too much time building the Alexa microwave and the Alexa alarm clock and not focusing on the technology. Maybe I underestimated the technological lift here a little bit. I don't know. We can't determine what you were thinking for sure, but I think there's a lift here. You said it's a feat of engineering. That's where you started. We have one of the best teams on the planet working on this. A lot of it has 10 years of history in it. There's so many people that work on Alexa today that have been there since its inception. You've got a lot of passion around that in the engineering team and the product, you know, just the product team all up. We call product makers when you put them all in a collection. And yeah, it's a feat. It's it. It's okay, though.
Starting point is 00:22:52 It doesn't matter if somebody thinks it should be easier or it's not easier or whatever. It doesn't matter. Actually, if it feels like it's easy, that sounds pretty good to me. Right. I mean, I don't mind. It means the customer's happy. Like, this must have been easy. Like, yeah, okay.
Starting point is 00:23:06 I don't care. Do you like it? Like, do you love it? Right. And I think that's where we go. So I want to talk about the vision of this product because, and the strategy that you're going to put into play here. Because, again, I was sitting in the audience and I talked about Apple Intelligence before. I guess this segment of our conversation is, I have headlined, it's Apple Intelligence, but it works.
Starting point is 00:23:28 And, you know, it's a little facetious. I tried not to read anything you posted coming in today because I was like, oh, no, I don't want to defend or have a preconceived notion. So that's interesting. Do you have to keep sharing? We've been talking on the show a lot about how, you know, and yeah, just we talked a lot about the buildup to WWDC, the reveal. And it seems like every big tech company has almost the same vision, and tell me if I'm wrong here, but like the Apple Intelligence demo was like you talk to Siri and ask when your flight is and you're switching flights and it's helping you pick your kids up and
Starting point is 00:24:04 that demo looked a lot like the Google Assistant demo that I've seen like almost every year at Google I/O, and then I saw your demo and I was also just like, this is a similar idea, which is that it's a contextually aware, smart AI assistant that helps you get things done and makes your life easier. So I'm curious if you both see the competitive landscape in the same way I do, if there's something different about Alexa than the others, and how you plan to win given the landscape is developing the way it is. You want to jump in?
Starting point is 00:24:47 So I got a long one here. So why don't you just, no, you start, and then I'll go. I mean, look, the vision for Alexa has been super consistent, actually. For 10 years, I think Panos, it made it into your final deck, I believe, yesterday. You know, we have always wanted to just make lives easier and better, simpler, and be the world's best personal assistant. That's been the vision for Alexa from the beginning. And so now we just have a technical leap that lets us get closer to that vision. But nothing, you know, that's been the vision since, for all 10 years that Alexa's been out there.
Starting point is 00:25:25 We have a much more capable AI assistant that's conversational, that is personal and personalized now, that can get an incredible amount of things done for you. But the vision is consistent. Okay, I want to go to Panos in a second, but I need to follow up on that. Because, you know, the reaction to this reveal has been, this is great, it's personalized, it has your data to help you figure things out. But then you look at a company like Apple, which has so much personal data that people have trusted Apple with because it has this security messaging, or Google, which, you know, has your, you know, maybe your Gmail, your Google Calendar, Google Maps. These are the services that you use to get around the world and interact with people. So if you're going to be this personalized assistant, like, you are coming up against these companies that basically have already been deeply integrated into people's daily routines. So what is the play
Starting point is 00:26:21 there? I mean, the phone, you're basically asking about the role of the phone. Not just the phone, because Google has plenty of services on the desktop. I mean, I'm on an Apple machine, I got Gmail open, maps to figure out how to get your calendar. And so it's almost the operating system for your life. I mean, look, you told us you have Echos in every room in your home. And that's great. That's also true. I'm starting to think that I have too many Echos.
Starting point is 00:26:51 And we might. Look at your job. I mean, come on. If you didn't, this would be a problem. I'm just saying customers, you know, we do so much for customers in the home today. And of course, we're Amazon. So that's not just thank you, by the way, for having Echos in every room in your home. That's awesome.
Starting point is 00:27:05 But also, we probably put some packages on your doorstep and probably stream you some content. And we've got great deep relationships with our customers. Prime is an incredibly valuable program, for example. And, you know, hundreds of millions of customers literally take value in that and love it and use it all the time. So we love our relationship with our customers, too, and think that we can deeply integrate any services customers want as well. We work with Gmail. We have the Outlook calendar. We integrate Apple calendar.
Starting point is 00:27:37 I think it's a very powerful point. You have to take that and understand. Like we're both kind of a, we have this, if you will, you have music, shopping, movies. This is real things that people love doing in the home. I mean, these are personal at every level. Photos. But also, we're such an open platform with thousands of partners. It's hard to say it's a platform.
Starting point is 00:28:02 So I'd be careful with the word. At the end of the day, every single integration point across Alexa gives us so many of those insights as well. But the key, Daniel hit it, when he asks you a question, it might have been rhetorical at some level. I don't think there's anyone close to being able to understand your home as Amazon, as Alexa. It's a super important element for us, Alex, the idea that smart home is connected to your music, to your entertainment, to your life, the fact that we're now bringing in memory to Alexa
Starting point is 00:28:42 and you can have that conversation it'll hold the context for you I think I don't think there's anything else like it because then it's connected to all your services in a natural way too I don't think it replaces the centerpiece of the phone I think it just adds value to your life in a very different way
Starting point is 00:28:58 and I think there might be a little bit of opportunity, and this is me understating it, but the ambient devices in your house right now and the ones that you can buy from us and some of the beautiful products that we're both making now and have released recently, they're in your home. And you don't have to think. You don't have to open anything. You don't have to log in to anything. You just have to be there and speak. And it's a powerful concept when natural language shows up. Yeah, I was speaking with Jamil Ghani, the head of Prime, at your event yesterday, and he was talking about how the family calendar is on his Alexa device, and it is a Google calendar. So the fact that there is that
Starting point is 00:29:41 interoperability, I think, where you don't have a phone, that actually might, maybe that's an advantage. I'm just trying to. It is an advantage. Just think of it this way. Like, we're not asking you to start something that you knew that you don't already do. Right. We just want to make it simpler for you. So Google Calendar is a great example. Okay, just attach all four of your family's calendar. We'll make it a family calendar and put it front and center for you. And then when you decide if you're going to dinner on Friday night, we'll rationalize it. And, you know, that concept that there's a communal device in your house that everyone can see, you know, it's something that people have been asking for for a long time. But now that you have so much intelligence in the product and
Starting point is 00:30:24 it can do the rationalization for you, you know, I feel like we stand alone there. I do think, I would, I think this calendar example is one that helps flip the question a little bit in my mind because it really is like, how often do you say, well, it was just on my calendar? I didn't know to meet you there. Why? It was on my work calendar. I say that to my wife Tully, you know, all the time. She's like, we missed the restaurant. We missed the reservation. So anyway, having one spot that can be communal and personal, pretty powerful. I want to press a little bit on this because the phone seems to be the place where people, like, it's all about, like, where do people interact with these assistants? Yep.
Starting point is 00:31:09 The phone seems like it's going to be a pretty important place. It will be. So if you don't have a phone, I mean, again, there's some advantage in that, like, you can bring any service in. But, like, if people are, like, on an Android and they're summoning a Google assistant, whatever the name is that week, or they're on an iPhone and they're summoning Apple Intelligence or Siri, where does Alexa fit in on that? Are you going to have to look at deeper integrations with these phone makers? Will they even allow you to do that? I think people use different assistants. I don't think there's any question about it. I don't think there's one.
Starting point is 00:31:43 Although if you lean into Alexa, we have the Alexa app on the phone. And with one touch of the button on your iPhone, you're having the same conversation. You're actually carrying the conversation from your home to your phone, to your car, to your PC with Alexa.com. We thought that through because we needed that thread for sure. So, you know, as she becomes more personal to you and then, you know, more needed, you want to have her with you everywhere. That app is doing a crazy cool job right now. I know we haven't released the new Alexa app yet. It's coming with if you get Alexa Plus, you get the Alexa Plus app, the Alexa Plus app as well as Alexa.com.
Starting point is 00:32:18 Right, that's going to be a web version of this. There is. And usually see the more traditional long-form work that you do with any AI browser at this point. It's the easiest way to say it. But you also get all the personalization. You also get the context of carryover. If you had a conversation in your kitchen, it'll just remind you what conversations you've had lately. If you've booked a reservation, whatever you've done, it'll collect it there. So it'll be on your PC and your phone as well. So I think we just want to provide that for our customers so they have the opportunity
Starting point is 00:32:42 to see it. I want my single assistant with me everywhere. You might use your phone for different things. You might use a different AI assistant on your phone. I think that's a fair, you know, fair proxy, I don't, I wouldn't disagree. It just depends on what's the best path to get something done. I think Alexa will provide a lot of that best path. Okay, I want to take a quick break and then talk a little bit about the agentic elements in your new Alexa release, where agents might be going, and then maybe we dream a little bit about where this technology is going to lead. We'll be back right after this. Hey, everyone. Let me tell you about the Hustle Daily Show, a podcast filled with business, tech news, and original stories to keep you in the loop on what's
Starting point is 00:33:22 trending. More than 2 million professionals read The Hustle's daily email for its irreverent and informative takes on business and tech news. Now, they have a daily podcast called The Hustle Daily Show, where their team of writers break down the biggest business headlines in 15 minutes or less and explain why you should care about them. So, search for The Hustle Daily Show in your favorite podcast app, like the one you're using right now. And we're back here on Big Technology Podcast with two Amazon executives responsible for the new Alexa. We have Panos Panay here, Amazon's senior vice president of Devices and Services. And Daniel Rausch is Amazon's vice president of Alexa and Fire TV.
Starting point is 00:34:02 So it's interesting that this agentic buzzword is now starting to be translated into things that we're seeing in product. And it's kind of interesting because Alexa's had skills for a while. Call me an Uber. And now you can use Alexa to call you an Uber. So is this actually like a really a new moment for agentic AI or is this rebranding of some stuff that works a little better than it has? Panos, what do you think?
Starting point is 00:34:32 I can't get it to work anywhere else. I mean, I think this is a, at the end of the day, it's incredibly new. But it's also solving so many different things at the same time. First, you have to always go back to. how much understanding is in an utterance, just in natural language, being able to translate it. We've talked about this already.
Starting point is 00:34:54 Getting down to calling a service, calling the right API, making the right partnership so that API is called to make it as simple as possible. It's, I don't think it's been accomplished. I don't think you're seeing it out there anywhere connected to an assistant right now.
Starting point is 00:35:09 I think there's a lot of, maybe you've seen it. You've got to share with me where it is, but I don't think you have. I have not. What, agents? Fundamentally, like using, you know, a core LLM with an agent, non-deterministic, calling the right API, calling that service,
Starting point is 00:35:25 booking that service, bringing it back and tying it back into all your other services. It's a demo we've all seen a thousand times, but haven't been able to use, I think, as consumers. Yeah. Okay. I think, yeah, maybe that's the case. I haven't seen those demos myself, but I do, I believe it. I believe it. I maybe didn't need to watch closer.
Starting point is 00:35:45 But I do think it's new. I think it's new what we've created and what we're doing and building it up. I think it is. I also think we mean, we might mean different things by agent. And so I'm just curious, Alex, what do you, what? Yeah, just to make sure we're grounded in your definition. For sure, there's a grounding difference between us. Just in passing, I mentioned yesterday in my own part of our event, you know, that, boy, everyone just uses this term agent.
Starting point is 00:36:09 And I do think people use it in different ways. What does it mean to you? Yeah, it's such a great question because I do think that in some ways that agent has been used to rebrand automation. We've been seeing automation demos forever. I mean, even, so just to give you one example upon, I wasn't trying to shade the Amazon demo. I was just to give you one example.
Starting point is 00:36:28 Yeah. We were all, I mean, a lot of folks watching the tech world were at Google I/O when they demoed a voice assistant that will call a restaurant for you and book you a table. And like they did the actual conversation and the assistant has like human utterance, goes, well, maybe we could have a table for
Starting point is 00:36:46 and then it would actually go and book you the restaurant. I just don't remember using it. But correct. So again, there's the demo, there's the demo, and then there's real life. And, but I think it was also just like, you gave a tech command and it would go out and do that for you. But a lot of this stuff,
Starting point is 00:37:02 like I said, we've seen demos, we haven't seen it actually work. My definition for agent is something that can go out and accomplish for you. So, you know, you had a good demo that I enjoyed watching about trying to go see a Red Sox Yankee game. By the way, for folks listening, the reveal event was in New York. Daniel's apparently a Red Sox fan.
Starting point is 00:37:27 He trolled the entire audience, including the guy sitting directly in front of me. It was almost like he planned it. I kept saying, are you sure you want to do this Red Sox bit? He's like, for sure, for sure. It goes through the entire off-season acquisitions, which Alexis, I mean, as a Mets fan, I will say, you were fine. You were fine. By the way, you saw the info expert in action right there. That's what it was.
Starting point is 00:37:49 Yeah, because you're now, and it was not deterministic. And then, of course, it's a different answer every time, Alex, every time Daniel did the demo. At the end of the day, I mean, it was Alexa's decision to talk about Alex Bregman. It wasn't Daniels. Like, you couldn't lead that. You can't plan that. And so a bit of a risky demo, because if Alexa decided not to talk about Bregman, I don't know where you would have taken the rest of the car. I do know a lot about the Red Sox, so I figured, you know, maybe eventually we get to buy some tickets is what I was thinking.
Starting point is 00:38:18 But it was to set an example of that kind of agentic capability and sort of set the baseline of what we mean, which is, hey, I just, I actually was just having a chat about the Red Sox. Could I get some tickets? Actually, that's a tough game to get. Oh, they're expensive. Can you watch for tickets for me? I mean, that was where we ended up with the demo. Could have ended up in a lot of different places, but being able to set an agent off, if you want to call it an agent in that case, we think about it a little bit differently,
Starting point is 00:38:45 but in that case, that agentic capability to say, first of all, I could buy you these tickets right now. Second of all, you don't like the price. I'll watch for you. Infinite patience never runs out of gas. If those tickets do drop below a certain price, I'm notified and can buy them. That's a hugely useful thing for a customer. Yeah, and you could buy it with a command. Yeah.
Starting point is 00:39:08 Because you're integrated with Ticketmaster. Exactly. Yep. So, yeah, to me, I would say that's agentic behavior. Great. I would say it qualifies. We had some questions in a, we have a big technology discord. I was like sharing notes with the crew as the event was going on.
Starting point is 00:39:25 And we had some notes from people about what they want sort of beyond those simple use cases. I mean, I call it simple, but, you know, obviously there's a tech, there's a tech lift to get it done. So one of our listeners said, is it going to, is Alexa still going to be reactive to requests or can it be proactive and suggest at the start of the day some smart ideas based on the context that Amazon has? For instance, I would say, you know, do I need to order any birthday gifts? And it would then go out and say, well, look, on your calendar, there are, you know, five birthdays coming up, these are the dates and these are our suggestions. So is it going to, because that's, I think, a step further. I think you're stepping in, you said you want to talk a little bit about the future
Starting point is 00:40:13 and how proactive Alexa can be. Like, there's a balance. One, we think Alexa can be incredibly proactive, like, to the point of when you wake up in the morning, you walk into the kitchen, it's like, Alex, you didn't sleep well, you know. And then you can imagine integration with some partners that he's like, okay, let's have the conversation. Also say, hey, your day looks pretty packed today. You should probably find some time. That proactivity is there.
Starting point is 00:40:38 It's in the system. We're using it in a very different way. We don't want to be intrusive with it. We've got to learn from our customers first. Like, how much proactivity do you want? I think it's very, very important to, you know, you don't want to jump to that future. You've got to be right. So, yeah, it's a good example.
Starting point is 00:40:52 I'll wake up in the morning. And if I need to buy a birthday gift, can you just remind me? We can create reminders. We can create a conversational piece. But I don't think a lot of people want Alexa just to wake up and start. talking to you. No, I do think that, yeah. Don't want to be intrusive.
Starting point is 00:41:05 You've got to be really careful. We've got to be so smart about, you know, we have 10 years of lessons. This is what's so awesome about it. And, you know, how much privacy matters. And when you want to invoke Alexa to be part of the conversation versus when you how proactive you want, you want it to be. And, you know, we have a balance on it. But I think it's a good push.
Starting point is 00:41:28 She's already proactive in the spirit of, um, she has a way to, if I, if I went out there and said, hey, I've been looking for this, I watched this movie last week. What was that song that was playing in that movie? Okay, give it that little information. Check Prime Video. What was he watching? Okay, I got it. I think you're watching this movie. It was this song. Proactivity also includes, do you want me to play that song or you just want the name of it? And a lot of times, Alexa will say, do you want me to play it for you? That's a subtle proactive. It's not intrusive. It's using context, you know, contextual information, some memory, some of your history. And in the past, you've asked me to play it every time. So why don't I just ask you to play it? I think those are different forms of proactivity, but our vision includes Alexa being proactive. It has to be. We believe the next step customers will ask for is, I want her more, not less. Right. And so instead of me thinking, oh, I should ask Alexa, is there a point where Alexa will know to ask me?
Starting point is 00:42:23 I think that's a real question. I don't think that's today. I think that is the future. And I think, you know, back to where, you know, we're pretty well positioned for that. If that's what customers want, I think we can do it for them? But I think what this listener was asking is, can I just, with natural language, say, can't, you know, take a look at my calendar and tell me something. Okay, so that's different. That's different.
Starting point is 00:42:45 Sorry, I went all the way to my vision, but here's what I'll pitch back. That already happens. Okay. So when you wake up in the morning, whoever that listener is, here's the answer, yes. With Alexa or with Alexa Plus? Okay. Sorry, not with Alexa. Right.
Starting point is 00:42:56 So this is new. There's no way it's going to happen with Alexa. Okay. It's not. But with Alexa Plus, 100%. Wake up in the morning, get your daily brief, tell me what's going on. And, you know, Alexa knows what time you start work. We'll warn you of the traffic. You should probably leave by 8:20 if you've got to be there by 9 today. Like that level of proactivity, that's in the system, but you have to engage first. Okay. This idea of Alexa being proactive. Like it is, it's definitely, I see where your caution is coming from because there are these proactive notifications that you get with Alexa, I've had to turn them off. Yeah, yeah, we learn from that. Yeah, so okay, that's good that there's learning there. I could go with some other Alexa product feedback, but I feel let's use our time. Let's stick with Alexa Plus for a minute.
Starting point is 00:43:43 But if you wanna talk about Alexa, and we can tell you if Alexa Plus has fixed your frustration, which we'll do that too. Well, the one thing I'll say is I use it to play alarms and there have been moments where it will play the ad before it will play a song in the work. So, um, but, but that kind of goes to a question that we did also get in the Discord where people talked about, um, they talked about, whose assistant do you trust?
Starting point is 00:44:09 And in the back of some people's head, there will be this perspective, um, Amazon is just going to try to sell me something. Like, for instance, that example of you didn't sleep very well. Like, all right. It's like a suggestion for sleeping pills coming up. I don't know exactly what it is. But like, how do you get past this perception of like, I'm going to get, because you do with an assistant, you trust it with a lot of data. So how do you get to the point where people are comfortable sharing this data and feeling good about the fact that it won't be used to lead to purchases? Well, I mean, first, I think even before you get to that part of the question, it's just how do you manage a customer's data? How do they see transparently what you're doing, what they've
Starting point is 00:44:51 told the system? Do they have control over their data? So all of, that's so paramount that you have to start there, actually. It's like one question earlier than that, which is, do you trust Alexa? And the answer has to be yes. So we've been building on a foundation of transparency and control. There's the Alexa privacy dashboard, which is one great place to see everything in terms of system settings and your data, et cetera. I just want to make clear all of that carries forward to Alexa Plus. I think that's sort of the important point to make at the top.
Starting point is 00:45:23 And then if the question is, you know, is the question, boy, should I be, you know, should I be offered a product in a given case where a system thinks I need it? I find that great when it's great. It is great when it's great. Like, I found a pair of shoes. I don't even think it was on Amazon recently through something I was reading online. And I've got orthotics. And, you know, it's great when it's great, basically. I was referred something.
Starting point is 00:45:54 They're awesome. Altras. They have a wide toe box. I'm not going to sell Altras on your show. I'm just telling you that I found them. Altra, if you're listening. We need sponsors. Is this the camera?
Starting point is 00:46:05 That's the Altra sponsor. Yeah. Let's give them a heads out of it. It's an art. We need sponsors. Alex needs sponsors. It's an arcane example, but the bottom line is like, it's great when it's great. And why is it great?
Starting point is 00:46:17 It's contextual. It's relevant. It's offering me something that I actually need. and so building systems where you can do that elegantly, like customers actually love that. We get feedback that that's great. It's not what's terrible is when you get inundated with things that are irrelevant to you.
Starting point is 00:46:34 And so we're building a system that doesn't do that. Does Alexa need to have a screen? I mean, for this to you, you're the head of devices at Amazon. A lot of the demos that you did at your launch event were with Alexa with a screen. Again, I have like first or second generation Echos in my house. It might be time to upgrade. You should upgrade.
Starting point is 00:46:52 Like, there's a couple of things you're missing. One, you're missing speed that you could have, that you don't have. And I think speed is time for me. Okay. It's comfort, you know, to confidence. Like, there's so much. Like, first, yeah, I would always encourage, not just because I want to sell the next device. That's not why.
Starting point is 00:47:10 I just having something modern. If your device is nine years old, you're missing eight years of tech. Okay. So I'm judging you, given what you do, you know? And so your feedback is like, half heard at this point. But I would say, I'd say that, you know, jokingly, but I go, look, you need a more, it's better. The product's just better as it, you know, generationally. Generation over generation, always got better. Now it's, does it need a screen. Incredible. Yeah, it does. Okay. It doesn't have to have a screen. It's a better experience with a screen. Okay. It really is. Now, let me, let me qualify it because you have a screen in your pocket that works with Alexa. You have a screen on your desktop that works with Alexa. The screen in your home, you should have one. It's very powerful.
Starting point is 00:47:56 It's nuanced. It's not intrusive. The new design is elegant. It's soft, if that makes sense. It's what you want in the home, something softer. You can get the expression from Alexa from that screen, and she brings visual expressions as much as anything. But here's the trick.
Starting point is 00:48:13 It will come with you in your earbuds. It'll come on your Alexa frames. It'll be in your pocket. It will be in your car. you don't always need a screen, but in your home, I mean, the command and control, the information management, what you get off of it, it is powerful. Will it work without a screen? Absolutely.
Starting point is 00:48:34 Absolutely. And it'll be great. So need is a relative term. I want you to have a screen. Okay. Because the experience is that much better. And there's a nuance in it. Like when we start rolling out preview, the first customers to get preview will be our screen-based
Starting point is 00:48:49 customers because it's the best experience. Okay. That's simple. And so you'll be like, I want the preview and I'll say get a screen. Get a screen. All right. Maybe two. And then we'll light up all your, we'll light up all your Echos, but, but you need a screen. Okay. Maybe one in the kitchen, one in the office. You only need one. Yeah. I mean, it's up to you. But yeah, I mean, it's up to you. Keep the screen out the bedroom. At least, that's my perspective. Totally. Like, you know. The only screen I allow in the bedroom is the Kindle. It's a cool product. But I'm using mine here now. I'm just taking out. Listening to you, by the way, I got the alarm in the morning note. I get that bug filed. Like, I got you. But the, but the, but the idea that different devices work in different places is real. Right. But I think you need a central hub right now. I think Alexa Plus is so dynamic, um, and the more you can learn to do, the screen will teach you, like, hey, get after it. Yeah. You saw Daniel's Thumbtack demo, which is a little bit, even, was more agentic, if you will, for us than the Grubhub slash... Did we do Grubhub or OpenTable last night?
Starting point is 00:49:53 OpenTable, yeah. With Uber. Right. But the Thumbtack demo was, you know, conversation, let's, I need a repair person. Well, that agent goes out and starts booking it for you on the website. And then you need the screen to give you a status, like working on it, back in a bit. Don't worry about it. Okay.
Starting point is 00:50:10 I think that's what you want that ambience for in the background. So I think the can't be more clear, I don't think. I think it'd be great. Okay. I'm sold. I'm going to get one. All right. We're running up on time here.
Starting point is 00:50:20 I want to give you both a minute to answer this question and then we'll head out. But it's got to be a minute or your team here will have my head. We talked about how voice AI might be the future of AI or the catalyst for these large language models on the show a while back. OpenAI, for instance, debuted or introduced this advanced form of AI called with GPT-4o. And you can see the inflection point of ChatGPT that the second they announced that, bam. It goes from 100 million to 300 million users. Is voice AI the future of artificial intelligence? Start and I'll close us out.
Starting point is 00:50:56 I mean, we've believed for a long time that voice is the most natural interface. We're using it right now. We're using it with your listeners. We're using it with each other. It's incredibly expressive. You can load an unbelievable amount of context and power in it.
Starting point is 00:51:10 You can be definite. You can be vague. You can be nuanced. And it's just we're born with the knowledge of how to use it. And it's completely intuitive. So I think we do strongly believe that it's one of the best ways to get things done. It is not the only way to get things done.
Starting point is 00:51:27 But I do think it's pushing us. It's challenging us to get more and more human, more natural. And that's why it's always been one of the kind of centerpieces of our vision for Alexa. So yes, my answer is yes. And I think it's really pushing the envelope now. Okay, I'm in it to you, Panos. I think we're at that time where this is the inflection point. And I mentioned it yesterday, you know, that I believe the vision for Alexa is
Starting point is 00:51:49 incredibly ambitious. It centers around voice for sure. I don't think it ends at voice. I think the interaction model needs to be the one that's most natural to you. No doubt. If you need to touch the screen to complete a task, if you need to get to your computer and write the long form, I think it's a flow. And the thing you don't want to do is you don't want to block the customer from the interaction that they need to go get something done. It's why we're on the phone. It's why we're on the PC. It's why we're in your glasses. It's why we're in your ears. And ultimately, though, the anchoring point of all of it is the voice, because it is natural. It's innate to all of us. The trick is getting to natural conversation. The trick is trusting that you can just talk
Starting point is 00:52:26 and realize that as we talk to each other, it's pretty sure you can talk that way with Alexa, and you're going to find that. And I think that is the transformation that's coming. I think it finishes, you know, um, the next chapter, ends the first chapter and starts the next chapter and leads us to getting, so finishing is the wrong word there, but getting us to that next, that next leap over the next 10 years, this is that starting point. The technology is enabling it right now and that inflection is happening. Um, and it's compelling. So it was a longer way to say, yeah, it starts with voice, but I don't think it ends with voice. It never will. Like we, it is also innate to us. You always, we as humans, we're always going to find the best
Starting point is 00:53:13 and easiest path to get something done. And we think voice will lead to most of that. But not all of it. Like, we don't want to overstate it. Like, we will find the best, easiest, which means basically the fastest path to completion, which is why you need to upgrade your devices and get a screen.
Starting point is 00:53:32 Are you with me? I told you already I'm buying. All right, well, get on it, man. We sold at least one device here in New York. Good news. Thank you. While we're here. Our goal this week was not to sell devices,
Starting point is 00:53:41 but we'll do that soon. Very efficient and scaling. We're killing it now. We have a new sponsor. We sold a device. This is good. We're killing it. Well, look, Panos and Daniel, I want to just say while we're recording that I don't take
Starting point is 00:53:54 it for granted to be speaking on record with Amazon. It's always great for me to be able to hear what you're doing and be able to ask these questions. And I'm sure for listeners, it'll be great as well. So thank you both for being here. Thanks, man. It's been a joy. Thank you so much.
Starting point is 00:54:07 Yeah, it's been great. Awesome. Well, thank you everyone for listening. And we'll see you next time on Big Technology Podcast.
