Big Technology Podcast - How Amazon Rebuilt Alexa From The Ground Up — With Panos Panay and Daniel Rausch
Episode Date: March 5, 2025
Panos Panay is the senior vice president of Devices & Services at Amazon. Daniel Rausch is the Vice President of Alexa at Amazon. The two join Big Technology Podcast to discuss how the company rearchitected Alexa, blending a deterministic system with the latest generative AI technology to create something that can both turn your lights off and speak with you about philosophy. We also discuss how all big tech companies seem to be converging on the same contextually aware, general AI assistant, and why Amazon believes Alexa has a chance to win. Tune in for a front row perspective on one of the tech industry's biggest AI projects. --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. For weekly updates on the show, sign up for the pod newsletter on LinkedIn: https://www.linkedin.com/newsletters/6901970121829801984/ Want a discount for Big Technology on Substack? Here’s 40% off for the first year: https://tinyurl.com/bigtechnology Questions? Feedback? Write to: bigtechnologypodcast@gmail.com
Transcript
The Amazon leaders who spearheaded the new Alexa are here in studio to talk about what it took to rebuild the pioneering AI and where voice AI is headed in the age of large language models.
That's coming up right after this.
Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond.
We are joined today by Panos Panay, Amazon Senior Vice President of Devices and Services, and Daniel Rausch, Amazon's vice president of Alexa, for
a fascinating conversation about what it took to rebuild Alexa, effectively from the ground up.
Gentlemen, so great to see you. Welcome to the show.
Thanks, man.
So great to be here. It sounded kind of fun.
You both must be relieved to have this out.
Yeah. I mean, excited. Relieved is a tricky word on this one.
You know, we're finishing the product now. It's coming out next month. So we're pumped that we're through the event.
And yeah, there's some relief, I would say. Would you agree? You feel a little bit of relief?
But the truth is, like, it's all about getting it in a customer's hands as fast as possible.
So you still, the team's feeling that urgency right now.
Yeah, that's the big moment for the team, right?
You get that first customer response.
So we're, we still feel like we're building towards it.
But yesterday was great.
Okay, so I have three Echo devices in my house.
We have three rooms.
Yeah, what are they?
The house is generous.
But in my apartment, there's one in the bedroom.
There's one in the kitchen slash dining room and there's one in the office.
Yeah.
So they are first generation.
I'm really looking forward to getting these updates
working, hopefully, within these devices and getting a chance to use a new and improved Alexa.
I've been hanging on to the Echos for a long time in the hope that something like this would
happen. So we're here. And I was at your event where you were announcing it. I'll give listeners
a little bit of an understanding of what I saw. And then we're going to go into some questions about
what it was like to build this. So this new Alexa, it's called Alexa Plus. It is conversational.
So it understands natural language. It understands your context. And you don't have to say
Alexa every time; it will sort of have a back and forth with you. It is, I think you could call
it agentic. It allows you to take action, like book a table or call an Uber. It will go out into the
world and help monitor ticket prices for you, for instance. And it's also deeply integrated into
Amazon services, namely Prime. It's going to be free for Prime members, $19.99 a month if you're
not a Prime member. And the coolest thing I saw in the demo was that you, I think one of you asked for the song with, what was it, Bradley Cooper and Lady Gaga.
I didn't say Lady Gaga. I just said Bradley Cooper.
Bradley Cooper, in that, what was the movie called?
A Star Is Born.
A Star Is Born.
A Star Is Born, great movie.
And then it called up, played the song, and then you said, now let me see it in the movie, and it connects to Prime Video and you could see it in the movie. So, very cool product, definitely, I think, what a lot of us have been hoping to see from the Alexa team and from Amazon on Alexa.
We're going to talk a little bit about what it took to build it and then the strategy here.
So I think the first question I need to ask you both is what did take so long?
Because I think that for all of us who, you know, I think there's 500 or 600 million Alexa enabled devices out there,
we've been wondering, as the OpenAIs of the world and other companies have made these big advances on voice AI, when Amazon was going to make its move.
And you have made the move.
But what was the process that made it take as long as it has, Panos?
I think the easiest way to say it is when you have hundreds of millions of customers that are active right now, I mean, we talked a little bit about it yesterday, but every one of them matters.
How do we make sure they all get the great experience they need, meaning you can't start from zero and ignore it.
And if you could, it could be much faster, although it's not that easy to hook up the thousands
of APIs and all the partners that we're bringing together and all the experts.
It's not, it takes time.
But the first thing is, and there's two parts to it, but the first thing is you got
hundreds of millions of customers.
They love certain things that they do on Alexa today.
They might not love everything, but they love certain things for sure.
You can't, you can't leave that behind.
Can't wake up one day and whatever you use Alexa for, whether it's timers or music,
you can't not make it better and great.
And so you don't feel like something was taken away from
you. When you take something away from a customer, you've just missed. You've missed.
And so that's one. It takes time to make sure you can get it all done. So everything on what you
would call Alexa, not Alexa Plus, works on Alexa Plus, but better. And that was just the first
part of the vision, can't leave anyone behind, which was important. We can talk about devices
and so forth, but customers who love their products that are in and they need them, we can't
take that away. That was one. Second piece is you're re-architecting from the ground up.
So you've got first the weight of keeping hundreds of millions of customers, and then you're
re-architecting from the ground up. If we started from zero customers, I think this is a different
story. You can move a lot faster. We can solve problems and then just add features as we go,
if that makes sense. So maybe we just had a conversationalist, a pretty cool one, then we can
add personalization, then we can add memory, then we could add the experts, and people would just
get updates along the way and maybe learn and be great. However, on day one, we need to support
everything people love and know about Alexa, day one. And so a little bit of patience there
and it takes a little bit longer. And the vision was, the vision was clear, like we're going to go
bring a conversational agent forward, an assistant for everyone that is smart, has memory, can
personalize to you, and then ultimately be incredibly useful. And so when we had that laid
out, we're like, okay, great, but we can't leave any customers behind. And right at that point, you
kind of step back. Once you put the vision together, you realize you need a full re-architecture,
but you're not going to leave your customers out. So you're re-architecting pretty much two
stacks at that point. One, what is classically known as Alexa, to be awesome and come into this
conversational world. And the other is everything new that it has to do.
Yeah, and I want to go a level deeper with Daniel on this one, because, Panos, what you're talking about, a re-architecture,
is sort of what I've heard has been the holdup here with Alexa for all these years,
which is that, and Daniel, tell me if I'm wrong.
But basically what folks have told me is that the old version or the original version of Alexa
was built with a lot of, like, if-then commands, right?
So, you know, it will understand some structured commands, turn on the lights, okay, then it will take that
and almost, like, deterministically say, okay, I understand this command, this is what I'm going
to do, turn the switch. With large language models,
it's a completely different ballgame because you have to make room for uncertainty.
So actually, the fact that you've been able to introduce an Alexa with large language models,
which I think will be able to keep that functionality, is an engineering feat.
That's my perspective from the outside.
What is it actually like on the inside and how close is that assessment to the challenge?
Well, the team would love to hear you say engineering feat, because I do think that's—
I don't think there's any lack of feat.
It is real.
That is the size of the task, for sure.
I think you're on to it for sure.
You know, large language models, the one thing I'd add just in terms of thinking through the technical architecture to what Panos said is that it's really just the latest generations of large language models that can even do the things that Alexa needs to be able to do.
So you're talking about our Nova models, right, which we announced within the last few months and are starting to get into customers' hands.
That's a super exciting, you know, partnership that we have with Anthropic.
Like, you really need very state-of-the-art technology at the base of the architecture in those large language models.
And in large part, because of what you said, we need them to behave in ways that we can predict and are certain.
Someone says, lock my door or, you know, play that song.
You want it to happen, right?
Some are higher consequence than others, and you really need to get it right.
But you also want all the elegance and nuance and understanding and non-deterministic behaviors of large language models themselves,
right? So we would call that a stochastic system that, you know, it's literally at runtime that
you're making those determinations. So if you want to integrate tens of thousands of services
on day one, day one out of the box, take advantage of everything that Alexa has always been able
to do, as Panos was saying, and introduce all of this new, unbelievable behavior that you can
get out of large language models. That is a big engineering feat.
So how does it know when the user is saying turn the lights on versus like something more
esoteric. Like, is there something built within the technology that's kind of like a switcher that
determines first your intent and then decides which part of the model to send it out to?
The way to think about it is, you know, at the base level, you have large language models and
you have this model agnostic system that's even itself going to choose the right model for the
job. And the models play different roles in there. What's already happened is, even honestly,
sort of in the way you asked a few of the questions, is that people assume the large language model
is the product. A product like Alexa is so much more than, quote, unquote, just a large language
model. So you have models playing many different roles in the system overall, even models helping
us decide which model, and models themselves deciding if they're the best, you know, tool for the
job, so to speak. So then you have a system that progressively decides how to get something done. I wouldn't
think about it like a switch or something in classic computer science that is a, you know, it's
a gate. That's not how the system works. It's a collection of model behaviors and systems downstream
of that that complete specific tasks. And that's where we introduce this term expert to try to
help coalesce around the system behavior and explain it better. The large language models are
interacting with these experts that do things like get you the sports score, play a song, play a video,
know where you are in the song so that you can go to the video,
like all the things that you saw yesterday at the event.
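The idea of a request being handed to whichever expert is best suited can be illustrated with a small Python sketch. This is a toy under stated assumptions, not Amazon's implementation: the `Expert` class, the keyword-overlap `score` method (standing in for a model's judgment), the `route` function, and the threshold value are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Expert:
    """A hypothetical expert: a name, trigger keywords, and a handler."""
    name: str
    keywords: Tuple[str, ...]
    handle: Callable[[str], str]

    def score(self, utterance: str) -> float:
        # Toy stand-in for a model deciding whether it is the right tool:
        # the fraction of this expert's keywords present in the utterance.
        words = utterance.lower().split()
        hits = sum(1 for keyword in self.keywords if keyword in words)
        return hits / len(self.keywords)


def route(utterance: str, experts: List[Expert], threshold: float = 0.2) -> str:
    """Send the utterance to the best-scoring expert, or fall back to chat."""
    best = max(experts, key=lambda expert: expert.score(utterance))
    if best.score(utterance) < threshold:
        # No expert is confident enough: treat it as open-ended conversation.
        return "conversation: " + utterance
    return best.handle(utterance)


EXPERTS = [
    Expert("music", ("play", "song", "music"), lambda u: "music: " + u),
    Expert("smart_home", ("lights", "lock", "door"), lambda u: "smart_home: " + u),
    Expert("sports", ("score", "game"), lambda u: "sports: " + u),
]

print(route("play that song", EXPERTS))
print(route("lock my door", EXPERTS))
print(route("tell me about philosophy", EXPERTS))
```

In this sketch every expert scores the request and the highest score wins, which loosely mirrors the description of models deciding whether they are the best tool for the job; a real system would replace the keyword overlap with model calls.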
And so, Panos, this is a mixture-of-experts model?
It is.
If you think about it, a mixture of experts model,
but each expert theoretically has its own model as well.
So you're building on top of it.
Each expert is smarter.
When you think experts, it's like it's a weird term, yeah,
but think photos, smart home, entertainment,
whether that's music or video, local info,
all the partners that connect.
You have a communication expert, you have an artifact expert, you have a memory expert, you have a personalization expert.
Each of them plays a role, and they kind of arbitrate with each other at all times.
So like the model is just lighting up when it determines that that's what you want to do.
That's right.
Daniel kind of said it well, like, because the LLM at the bottom of that stack is, it's deterministic.
It's choosing which model to use, then the experts come into play on top of it.
It's a pretty phenomenal way to, you know, it's a pretty interesting way to think about it.
This is a mixture-of-experts model, for those at home.
It's been part of what DeepSeek has used to become much more efficient in its reasoning, for instance,
because instead of lighting up the entire large language model, it's deciding to light up certain areas that might be,
I mean, it's not a DeepSeek innovation, but they've just kind of used it to an extreme extent.
Has using that architecture helped you build this in a way that's, for instance,
like, reducing latency or sort of lightening the compute burden that you
otherwise might add?
If you want something incredibly fast, stable, even secure, like the paths on data, right,
where you're really taking care of customers, this is the fundamental approach, I think,
that is state of the art.
And accurate.
And for sure.
Don't forget accurate.
So important.
Yeah.
But on that note, I mean, with the new Alexa, is there going to be some sacrifice to having those Alexa commands,
those standard, turn the lights on, set the alarm, in order to enable all the LLMs to work the way that they're going to?
I think you just called out the sacrifice and it's time, like how long it's taken us to get to where we are.
That's why it's my favorite question.
Like, why is it taking you so long?
Like, if I told you where we were four months ago on, somebody said, lock that door.
And then we had to determine what that meant versus in the past, lock my front
door. And you had to know it was the front door and you had to say front door. It's pretty
phenomenal, but, you know, six months ago, it took longer than anyone would wait to lock a door.
And, you know, our customers need immediate response and we won't make that tradeoff. So to be that
accurate with the latency that's needed, with the speed sub two seconds at the end of the day,
you end up needing a little bit more time or finding the expert so the expert can be quicker
and the model can pick the right model quicker,
and the smaller model can be trained
to make sure it knows where the door is.
He gave an example earlier, which I thought
it's a nuance, but let me just share it with you.
Previously in Alexa, you couldn't say
play that song.
It would look for a song called That, right?
It was that simple.
Now the model has to reason and say that song.
I wonder what he's asking,
I wonder what she's asking,
I wonder what the person's asking.
That's what's happening in the system.
Then the expert shows up, looks at the history, the personalization, what conversation were we having, play that song.
Oh, he's talking about the conversation we just had about Bradley Cooper and Lady Gaga. Shallow. Play.
That all happens in, you know, sub two thousand, you know, how many milliseconds are we talking?
We count in single milliseconds now in system component.
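The "play that song" example, where a vague reference is resolved from the conversation that just happened, can be sketched in a few lines of Python. This is only an illustration under assumptions: the `resolve_song` function and the shape of the `history` list (dictionaries with an optional `"song"` key) are hypothetical and not taken from Alexa's actual design.

```python
from typing import Dict, List, Optional


def resolve_song(request: str, history: List[Dict[str, str]]) -> Optional[str]:
    """Return the song a request refers to, using recent conversation turns."""
    text = request.strip()
    words = text.lower().split()
    if "that" in words and "song" in words:
        # Vague reference: walk the conversation backwards and take the
        # most recently mentioned song, if there is one.
        for turn in reversed(history):
            song = turn.get("song")
            if song:
                return song
        return None  # Nothing to resolve against; a real system might ask.
    if text.lower().startswith("play "):
        # Explicit request: everything after "play " is treated as the title.
        return text[5:].strip()
    return None


history = [
    {"text": "what was that Bradley Cooper movie?"},
    {"text": "the song from A Star Is Born", "song": "Shallow"},
]

print(resolve_song("play that song", history))
print(resolve_song("play Shallow", []))
```

The older behavior described above corresponds to only the explicit branch, where the words after "play" are taken literally as a title.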
So now you're, all that is going on and the stack's working through it versus today, which is play
Shallow, and that's the only way you're going to play Shallow.
Yeah, that's it.
And so I think it's just understanding that nuance in where natural language comes in, where you
can talk to Alexa without being precise, just like you can talk to me and I'll use some micro tells
to get, you know, are you asking me a rude question, a great question, a nice question? Are you leading
me? And then from those micro tells I can then move to the words and then determine where you're
taking me, and you don't have to write it down, type it and read it exactly. All that is happening
now in the machine, which is pretty powerful.
There was a cool scene in your demo at the launch
event where I think, Panos, it was you where you said, don't play the music in the baby's room.
Yeah, so it's really, I didn't say that. So that's very explicit too, right? Don't play the music
in the baby's room. The model will come up, the expert will show up, the music expert,
this is where it's super powerful and go, got it, play it everywhere else. Or you can, you can
just say, don't wake the baby, play the music everywhere.
Then the model will go, don't play it in the baby's room.
I know what they're asking.
So this is where just that small model in the expert does its job.
And the fact that you can just naturally move it around, in that demo, I don't know if you
noticed, by the way, nerve-wracking.
Yeah, so for listeners, Panos did this entire demo live.
I mean, we're going to talk about Apple Intelligence in a second.
But Apple Intelligence, I was at the WWDC launch event, and it was all a vision.
And what we saw at this Alexa launch event was a working demo.
Now, look, I mean, we know to reserve, us commentators know to reserve complete judgment until it's in our hands.
Yeah, you have to.
You have to, for sure.
But it was real.
It was all real.
Real and working.
Yeah.
But what makes you nervous in an event like that, you're not worried about the product working.
I mean, six months ago, I would have worried about the product working.
And I would have shown you more vision demos,
like videos, but the product's working.
The challenge is the infrastructure,
the thousands of Wi-Fi signals
that are pinging around that room.
Like, it's just unusual.
These live environments are very unusual.
Turns out tech reporters like tech,
and they're using a lot of it.
We're all on the Wi-Fi.
Well, more, more.
I mean, the signals that are being pulled
from Bluetooth to Wi-Fi to, I mean, who knows what's in pockets.
And one of my favorite tech demo moments is Steve Jobs just losing his shit on stage because all the reporters are connected to Wi-Fi.
And he's like, you could either be connected to Wi-Fi or you can have a demo.
You pick.
Totally.
I mean, I'm not, you know, I think bloggers have a right to blog.
But if we want to see the demos, we're not going to be able to do it unless we turn off all these Wi-Fi base stations and laptops, set them on the floor.
Yes.
We didn't have to have that situation.
And then you got, you know, the servers have to be lit up and you're, you know, you're worried about latency and what's happening in the room.
So you got all that going on and now you're going live and this is your baby, right?
I mean, you love what you're about to show.
You love it.
And if it doesn't go off, like, I don't want to tell you what the backup plan was, you know.
What was the backup plan?
We're not going to talk about it.
For real?
Let's not talk about the backup plan.
Let me just say.
You can't tease the backup plan and then just not share the backup plan.
They were really good. Really good.
It was not a great, it was not a great backup.
No, it was. They were great. They weren't
going to work, but they were great plans, I would say. I'm looking over here at some of the team that
was helping yesterday. I, I, uh, but during that moment, um, you, you may have, you may have heard.
It's, it was very nuanced. At one point I said, move the music, bring the music here.
I want to hear the music over there. And the reason I use different sentences,
I know what the model's going to reason over and do,
but I wanted to make it clear.
Like, you don't have to think about what you want to happen.
You just have to talk.
Like, I want the music over there.
Okay.
And if the model doesn't know, or if Alexa doesn't know,
she'll ask you, do you mean in the living room?
Yeah, so are we going to have a speed tradeoff here from the traditional Alexa tasks?
Just quickly, Daniel, I'm curious.
Like, is the stuff I was doing beforehand, or
am doing now, like set an alarm,
Is it going to take a little longer because of this process or it'll be the same amount of time?
No.
I mean, this is where we have such a high bar before we're willing to put it out.
And deterministic systems are incredibly fast.
Right.
It is straightforward computer science in this day and age with an AWS cloud and the great connectivity that everyone has in their homes
to make a deterministic system fast on something exactly like you said.
Making a nondeterministic system fast that can respond in any way, gathers all the context,
figures out legions of different things, which experts to invoke, making that system fast
on something as simple as an instruction is, you know, hard.
So what technological breakthroughs or innovations did you rely on to get it from a place
where you were dissatisfied with latency to a point now where you're happy?
I think it's another version of using the right tool for the job and
building a system that's frankly just more complex overall to get the simple things done.
So it's a bit, you know, there's like an irony in that, but you need a system that creates very fast paths for simple things.
Even though you started with an incredibly complex system already, you're adding these kinds of complexity to get simple things done.
So that, I mean, I won't go into the specific technical details here, but that's the upshot.
You need to be able to figure out you're trying to do something simple so that you can do it fast.
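The point about creating very fast paths for simple things can be sketched as a cheap deterministic check that runs before any slower, more general handling. The following Python is a minimal illustration under assumptions, not the actual Alexa design: the regular expression, the `fast_path` and `handle` functions, and the `slow_path` callback (a stand-in for model-based interpretation) are all hypothetical.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical pattern for one classic, fully specified command.
TIMER = re.compile(r"^set (?:me )?a timer for (\d+) (second|minute|hour)s?$")
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def fast_path(utterance: str) -> Optional[int]:
    """Return the timer length in seconds if the request is a plain timer."""
    match = TIMER.match(utterance.lower().strip())
    if match is None:
        return None
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def handle(utterance: str, slow_path: Callable[[str], int]) -> Tuple[str, int]:
    """Try the deterministic fast path first; otherwise use the slow path."""
    seconds = fast_path(utterance)
    if seconds is not None:
        return ("fast", seconds)
    # Anything else goes to the slower, more general interpreter.
    return ("slow", slow_path(utterance))


def toy_slow_path(utterance: str) -> int:
    # Stand-in for model reasoning; returns eight minutes, the ramen egg
    # figure quoted in the conversation.
    return 8 * 60


print(handle("Set a timer for 8 minutes", toy_slow_path))
print(handle("I want a ramen egg", toy_slow_path))
```

The extra complexity is the routing step itself: the system has to recognize that a request is simple before it can skip the expensive work.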
And it gets tricky.
You know, people understand how to speak to Alexa today.
I think our new customers, we want to, you know, and current customers, we want to open their minds on what they can ask for and how to get something done.
Take the simple tasks that we have, timers, alarms.
There's a different way to think about them.
And then in the non-deterministic world, how to translate what's being said into what's being asked for, which is different.
An example: you asked how quick we'll be setting an alarm. It'll be lightning fast, and you'll likely set it the way you always have. I need an alarm, set an alarm for 8 a.m. I think that's the classic way to set an alarm. Or you can say, Alexa, I need to wake up tomorrow at 8. Okay, and now that's nondeterministic, and now it's going, I think you need an alarm, and then it'll offer you an alarm or just set it. Same with the timer. Set me a timer.
By the way, how long do you want the timer for? You say the time. You can move that from set me a timer to I'm cooking my steak medium rare, and then she'll say, I'm setting you a timer for six minutes. Okay. And so you understand, like, when you get into that natural language, non-deterministic: what's happening, what are you asking for? You're cooking your steak. Okay, I'll get you six minutes on each side, or tell me how thick it is, and then the answer is, you know, two inches thick, whatever. Or, I want a ramen egg.
Right, that's eight minutes. I got you. Tell me when you start. I'm starting. Eight-minute timer started for you.
And so the world just changed for even these most simple tasks. It just changes. In the spirit of,
by the way, I never knew how long it took to cook a ramen egg, so I'd always have to go to TikTok, open it,
spend 20 seconds watching somebody make ramen eggs, and then eventually it says put it in the water for eight minutes.
Like, got all you see on, and then I would say, and then I would say, yeah, it's very true.
By the way, don't search ramen eggs.
You get hammered with ramen eggs.
But I think, and then all of a sudden you're like, got it eight minutes.
Set a timer for eight minutes.
Now, just change it.
Just to ask for a ramen egg.
And Alexa will just determine what you're looking for and give you an eight minute timer.
Okay.
So just to wrap this section on the technical side, my note that I wrote to myself said they spent too much time building the Alexa microwave and the Alexa alarm clock and not focusing on the technology.
Maybe I underestimated the technological lift here a little bit.
I don't know. We can't determine what you were thinking for sure, but I think there's a lift here. You said it's a feat of engineering. That's where you started. We have one of the best teams on the planet working on this. A lot of it has 10 years of history in it. There's so many people that work on Alexa today that have been there since its inception. You've got a lot of passion around that in the engineering team and the product, you know, just the product team all up. We call them product makers when you put them all in a collection. And yeah, it's a feat. It's it.
It's okay, though.
It doesn't matter if somebody thinks it should be easier or it's not easier or whatever.
It doesn't matter.
Actually, if it feels like it's easy, that sounds pretty good to me.
Right.
I mean, I don't mind.
It means the customer's happy.
Like, this must have been easy.
Like, yeah, okay.
I don't care.
Do you like it?
Like, do you love it?
Right.
And I think that's where we go.
So I want to talk about the vision of this product because, and the strategy that you're going to put into play here.
Because, again, I was sitting in the audience and I talked about Apple Intelligence before.
I guess this segment of our conversation is, I have headlined, it's Apple Intelligence, but it works.
And, you know, it's a little facetious.
I tried not to read anything you posted coming in today because I was like, oh, no, I don't want to defend or have a preconceived notion.
So that's interesting.
Do you have to keep sharing?
We've been talking on the show a lot about how, you know, and yeah, just we talked a lot about the buildup to WWDC, the reveal,
and it seems like every big tech company has almost the same vision, and tell me if
I'm wrong here, but, like, the Apple Intelligence demo was like you talk to Siri and
ask when your flight is and you're switching flights and it's helping you pick your kids up, and
that demo looked a lot like the Google Assistant demo that I've seen, like, almost every year at
Google I/O. And then I saw your demo and I was also just like,
this is a similar idea, which is that it's a contextually aware,
smart AI assistant that helps you get things done and makes your life easier.
So I'm curious if you both see the competitive landscape in the same way I do,
if there's something different about Alexa than the others,
and how you plan to win given the landscape is developing the way it is.
You want to jump in?
So I got a long one here.
So why don't you just, no, you start, and then I'll go.
I mean, look, the vision for Alexa has been super consistent, actually.
For 10 years, I think Panos, it made it into your final deck, I believe, yesterday.
You know, we have always wanted to just make lives easier and better, simpler, and be the world's best personal assistant.
That's been the vision for Alexa from the beginning.
And so now we just have a technical leap that lets us get closer to that vision.
But nothing, you know, that's been the vision since, for all 10 years that Alexa's been out there.
We have a much more capable AI assistant that's conversational, that is personal and personalized now, that can get an incredible amount of things done for you.
But the vision is consistent.
Okay, I want to go to Panos in a second, but I need to follow up on that.
Because, you know, the reaction to this reveal has been, this is great, it's personalized, it has your data to help you figure things out.
But then you look at a company like Apple, which has so much personal data that people have trusted Apple with because it has this security messaging, or Google, which, you know, has your, you know, maybe your Gmail, your Google Calendar, Google Maps.
This is, these are the services that you use to get around the world and interact with people.
So if you're going to be this personalized assistant, like, you are coming up against these companies
that basically have already been deeply integrated into people's daily routines. So what is the play
there?
I mean, the phone. You're basically asking about the role of the phone.
Not just the phone,
because Google has plenty of services on the desktop. I mean, I'm on an Apple machine, I got
Gmail open, Maps to figure out how to get, your calendar.
And so it's almost the operating system for your life.
I mean, look, you told us you have Echos in every room in your home.
And that's great.
That's also true.
I'm starting to think that I have too much Echo.
And we might.
Look at your job.
I mean, come on.
If you didn't, this would be a problem.
I'm just saying customers, you know, we do so much for customers in the home today.
And of course, we're Amazon.
So that's not just, thank you, by the way, for having Echos in every room in your home.
That's awesome.
But also, we probably put some packages on your doorstep and probably stream you some content.
And we've got great deep relationships with our customers.
Prime is an incredibly valuable program, for example.
And, you know, hundreds of millions of customers literally take value in that and love it and use it all the time.
So we love our relationship with our customers, too, and think that we can deeply integrate any services customers want as well.
We work with Gmail.
We have the Outlook calendar.
We integrate Apple calendar.
I think it's a very powerful point.
You have to take that and understand.
Like we're both kind of a, we have this, if you will, you have music, shopping, movies.
These are real things that people love doing in the home.
I mean, these are personal at every level.
Photos.
But also, we're such an open platform with thousands of partners.
It's hard to say it's a platform.
So I'd be careful with the word.
At the end of the day, every single integration point across Alexa gives us so many of those insights as well.
But the key, Daniel hit it, when he asks you a question, it might have been rhetorical at some level.
I don't think there's anyone close to being able to understand your home like Amazon, like Alexa.
It's a super important element for us, Alex: the idea that smart home is connected to your music, to your entertainment, to your life. The fact that we're now bringing in memory to Alexa, and you can have that conversation, it'll hold the context for you. I don't think there's anything else like it, because then it's connected to all your services in a natural way too. I don't think it replaces the centerpiece of the phone. I think it just adds value to your life in a very different way. And I think there might be a
little bit of opportunity, and this is me understating it, but the ambient devices in your house
right now and the ones that you can buy from us and some of the beautiful products that we're
both making now and have released recently, they're in your home. And you don't have to think.
You don't have to open anything. You don't have to log in to anything. You just have to be there
and speak. And it's a powerful concept when natural language shows up. Yeah, I was with speaking
with Jamil Gandhi, the head of prime at your event yesterday, and he was talking about how the family
calendar is on his Alexa device, and it is a Google calendar. So the fact that there is that
interoperability, I think, where you don't have a phone, that actually might, maybe that's
an advantage. I'm just trying to. It is an advantage. Just think of it this way. Like, we're not
asking you to start something new that you don't already do. Right. We just want to make it
simpler for you. So Google Calendar is a great example. Okay, just attach all four of your family's
calendars. We'll make it a family calendar and put it front and center for you. And then when you decide
if you're going to dinner on Friday night, we'll rationalize it. And, you know, that concept that
there's a communal device in your house that everyone can see, you know, it's something that people
have been asking for for a long time. But now that you have so much intelligence in the product and
it can do the rationalization for you, you know, I feel
like we stand alone there. I do think this calendar example is one that helps
flip the question a little bit in my mind, because it really is like, how often do you say, well, it was
just on my calendar, I didn't know to meet you there? Why? It was on my work calendar. I say that to
my wife Tully, you know, all the time. She's like, we missed the restaurant. We missed the reservation.
So anyway, having one spot that can be communal and personal, pretty powerful.
I want to press a little bit on this because the phone seems to be the place where people, like, it's all about, like, where do people interact with these assistants?
Yep.
The phone seems like it's going to be a pretty important place.
It will be.
So if you don't have a phone, I mean, again, there's some advantage in that, like, you can bring any service in.
But, like, if people are, like, on an Android and they're summoning a Google assistant,
whatever the name is that week, or they're on an iPhone and they're summoning Apple Intelligence
or Siri, where does Alexa fit in on that? Are you going to have to look at deeper integrations
with these phone makers? Will they even allow you to do that? I think people use different
assistants. I don't think there's any question about it. I don't think there's one.
Although if you lean into Alexa, we have the Alexa app on the phone. And with one touch of the
button on your iPhone, you're having the same conversation. You're actually carrying the conversation
from your home to your phone, to your car, to your PC with Alexa.com.
We thought that through because we needed that thread for sure.
So, you know, as she becomes more personal to you and then, you know, more needed, you want to have her with you everywhere.
That app is doing a crazy cool job right now.
I know we haven't released the new Alexa app yet.
It's coming: if you get Alexa Plus, you get the Alexa Plus app as well as Alexa.com.
Right, that's going to be a web version of this.
There is.
And you'll see the more traditional long-form
work that you do with any AI browser at this point. It's the easiest way to say it. But you
also get all the personalization. You also get the context carryover. If you had a conversation
in your kitchen, it'll just remind you what conversations you've had lately. If you've booked a
reservation, whatever you've done, it'll collect it there. So it'll be on your PC and your phone
as well. So I think we just want to provide that for our customers so they have the opportunity
to see it. I want my single assistant with me everywhere. You might use your phone for
different things. You might use a different AI assistant on your phone. I think that's a fair
you know, fair proxy, I don't, I wouldn't disagree. It just depends on what's the best path to
get something done. I think Alexa will provide a lot of that best path. Okay, I want to take a quick
break and then talk a little bit about the agentic elements in your new Alexa release, where
agents might be going, and then maybe we dream a little bit about where this technology is going to
lead. We'll be back right after this. Hey, everyone. Let me tell you about the Hustle Daily Show, a podcast
filled with business, tech news, and original stories to keep you in the loop on what's
trending. More than 2 million professionals read The Hustle's daily email for its irreverent
and informative takes on business and tech news. Now, they have a daily podcast called The Hustle
Daily Show, where their team of writers break down the biggest business headlines in 15
minutes or less and explain why you should care about them. So, search for The Hustle Daily
Show in your favorite podcast app, like the one you're using right now. And we're back here on
Big Technology Podcast with two Amazon executives responsible for the new Alexa.
We have Panos Panay here, Amazon's senior vice president of Devices and Services,
and Daniel Rausch, Amazon's vice president of Alexa and Fire TV.
So it's interesting that this agentic buzzword is now starting to be translated into things
that we're seeing in product.
And it's kind of interesting because Alexa's had skills for a while: call me an Uber.
And now you can use Alexa to call you an Uber.
So is this actually like a really a new moment for agentic AI or is this rebranding of
some stuff that works a little better than it has?
Panos, what do you think?
I can't get it to work anywhere else.
I mean, I think this is a, at the end of the day, it's incredibly new.
But it's also solving so many different things at the same time.
First, you have to always go back to how much understanding is in an utterance, just in natural language, and being able to translate it.
We've talked about this already.
Getting down to calling a service, calling the right API, making the right partnership so that API is called, to make it as simple as possible: I don't think it's been accomplished.
I don't think you're seeing it out there anywhere connected to an assistant right now.
Maybe you've seen it. You've got to share with me where it is, but I don't think you have.
I have not.
What, agents?
Fundamentally, like using, you know, a core LLM with an agent,
non-deterministic, calling the right API, calling that service,
booking that service, bringing it back and tying it back into all your other services.
It's a demo we've all seen a thousand times, but haven't been able to use, I think, as consumers.
Yeah.
Okay.
I think, yeah, maybe that's the case.
I haven't seen those demos myself, but I do, I believe it.
I believe it.
I maybe did need to watch closer.
But I do think it's new.
I think it's new what we've created and what we're doing and building it up.
I think it is.
I also think we mean, we might mean different things by agent.
And so I'm just curious, Alex, what do you, what?
Yeah, just to make sure we're grounded in your definition.
For sure, there's a grounding difference between us.
Just in passing, I mentioned yesterday in my own part of our event, you know, that, boy, everyone just uses this term agent.
And I do think people use it in different ways.
What does it mean to you?
Yeah, it's such a great question because I do think that in some ways agent has been used to rebrand automation.
We've been seeing automation demos forever.
I mean, even, so just to give you one example, Panos, I wasn't trying to shade the Amazon demo.
I was just trying to give you one example.
Yeah.
We were all, I mean, a lot of folks watching the tech world were at Google I/O when they demoed a voice assistant that will call a restaurant for you and book you a table.
And like, they did the actual conversation, and the assistant has, like, a human utterance, goes, well, maybe we could have a table for, and then it would actually go and book you the restaurant.
I just don't remember using it, but correct.
So again, there's the demo, and then there's real life. But I think it was also just like, you gave a tech command and it would go out and do that for you.
But a lot of this stuff, like I said, we've seen demos, we haven't seen it actually work. My definition for agent is something that can go out and accomplish a task for you.
So, you know, you had a good demo that I enjoyed watching about trying to go see a Red Sox-Yankees game.
By the way, for folks listening, the reveal event was in New York.
Daniel's apparently a Red Sox fan.
He trolled the entire audience, including the guy sitting directly in front of me.
It was almost like he planned it.
I kept saying, are you sure you want to do this Red Sox bit?
He's like, for sure, for sure.
It goes through the entire off-season acquisitions, which Alexis, I mean, as a Mets fan, I will say, you were fine.
You were fine.
By the way, you saw the info expert in action right there.
That's what it was.
Yeah, because you're now, and it was not deterministic.
And then, of course, it's a different answer every time, Alex, every time Daniel did the demo.
At the end of the day, I mean, it was Alexa's decision to talk about Alex Bregman.
It wasn't Daniel's.
Like, you couldn't lead that.
You can't plan that.
And so a bit of a risky demo, because if Alexa decided not to talk about Bregman, I don't know where you would have taken the rest of the conversation.
I do know a lot about the Red Sox, so I figured, you know, maybe eventually we get to buy some tickets is what I was thinking.
But it was to set an example of that kind of agentic capability and sort of set the baseline of what we mean, which is, hey, I just, I actually was just having a chat about the Red Sox.
Could I get some tickets?
Actually, that's a tough game to get.
Oh, they're expensive.
Can you watch for tickets for me?
I mean, that was where we ended up with the demo.
Could have ended up in a lot of different places, but being able to set an agent off,
if you want to call it an agent in that case, we think about it a little bit differently,
but in that case, that agentic capability to say, first of all, I could buy you these tickets right now.
Second of all, you don't like the price.
I'll watch for you.
Infinite patience never runs out of gas.
If those tickets do drop below a certain price, I'm notified and can buy them.
That's a hugely useful thing for a customer.
Yeah, and you could buy it with a command.
Yeah.
Because you're integrated with Ticketmaster.
Exactly.
Yep.
So, yeah, to me, I would say that's agentic behavior.
Great.
I would say it qualifies.
We had some questions in, we have a Big Technology Discord.
I was like sharing notes with the crew as the event was going on.
And we had some notes from people about what they want sort of beyond those simple use cases.
I mean, I call it simple, but, you know, obviously there's a tech, there's a tech lift to get it done.
So one of our listeners said, is Alexa still going to be reactive to requests, or can it be proactive and suggest at the start of the day some smart ideas based on the context that Amazon has?
For instance, I would say, you know, do I need to order any birthday gifts?
And it would then go out and say, well, look, on your calendar, there
are, you know, five birthdays coming up, these are the dates and these are our suggestions.
So is it going to, because that's, I think, a step further.
I think you're stepping in, you said you want to talk a little bit about the future
and how proactive Alexa can be.
Like, there's a balance.
One, we think Alexa can be incredibly proactive, like, to the point of when you wake up in
the morning, you walk into the kitchen, it's like, Alex, you didn't sleep well, you know.
And then you can imagine integration with some partners that he's like, okay, let's have the conversation.
Also say, hey, your day looks pretty packed today.
You should probably find some time.
That proactivity is there.
It's in the system.
We're using it in a very different way.
We don't want to be intrusive with it.
We've got to learn from our customers first.
Like, how much proactivity do you want?
I think it's very, very important to, you know, you don't want to jump to that future.
You've got to be right.
So, yeah, it's a good example.
I'll wake up in the morning.
And if I need to buy a birthday gift, can you just remind me?
We can create reminders.
We can create a conversational piece.
But I don't think a lot of people want Alexa just to wake up and start talking to you.
No, I do think that, yeah.
Don't want to be intrusive.
You've got to be really careful.
We've got to be so smart about, you know, we have 10 years of lessons.
This is what's so awesome about it.
And, you know, how much privacy matters.
And when you want to invoke Alexa to be part of the conversation versus how proactive you want it to be.
And, you know, we have a balance on it.
But I think it's a good push.
She's already proactive in spirit.
If I went out there and said, hey, I've been looking for this, I watched this movie last week, what was that song that was playing in that movie? Okay, give it that little information. Check Prime Video. What was he watching? Okay, I got it. I think you were watching this movie. It was this song. Proactivity also includes, do you want me to play that song or do you just want the name of it? And a lot of times, Alexa will say, do you want me to play it for you? That's subtle proactivity. It's not intrusive. It's using context, you know, contextual information, some memory, some of your history. And in the past, you've asked me to play it every time.
So why don't I just ask you to play it?
I think those are different forms of proactivity, but our vision includes Alexa being proactive.
It has to be.
We believe the next step customers will ask for is, I want her more, not less.
Right.
And so instead of me thinking, oh, I should ask Alexa, is there a point where Alexa will know to ask me?
I think that's a real question.
I don't think that's today.
I think that is the future.
And I think, you know, back to where, you know, we're pretty well positioned for that.
If that's what customers want, I think we can do it for them.
But I think what this listener was asking is, can I just, with natural language, say, you know, take a look at my calendar and tell me something?
Okay, so that's different.
That's different.
Sorry, I went all the way to my vision, but here's what I'll pitch back.
That already happens.
Okay.
So when you wake up in the morning, whoever that listener is, here's the answer, yes.
With Alexa or with Alexa Plus?
Okay.
Sorry, not with Alexa.
Right.
So this is new. There's no way it's going to happen with Alexa. Okay. It's not. But with Alexa Plus, 100%. Wake up in the morning, get your daily brief, tell me what's going on. And, you know, Alexa knows what time you start work. We'll warn you of the traffic. You should probably leave by 8:20 if you've got to be there by 9 today. Like that level of proactivity, that's in the system, but you have to engage first. Okay. This idea of Alexa being proactive. Like it is, it's definitely, I see where your caution is coming from because there are
these proactive notifications that you get with Alexa,
I've had to turn them off.
Yeah, yeah, we learn from that.
Yeah, so okay, that's good that there's learning there.
I could go with some other Alexa product feedback,
but I feel let's use our time.
Let's stick with Alexa Plus for a minute.
But if you want to talk about Alexa,
we can tell you if Alexa Plus has fixed your frustration,
which we'll do too.
Well, the one thing I'll say is I use it to play alarms
and there have been moments where it will play the ad
before it will play a song in the work.
So, um, but, but that kind of goes to a question that we did also get in the discord where
people talked about, um, they talked about whose assistant do you trust?
And in the back of some people's head, there will be this perspective, um, Amazon is just
going to try to sell me something. Like, for instance, that example of you didn't sleep very
well. Like, all right. It's like a suggestion for sleeping pills coming up. I don't know exactly what
it is. But like, how do you get past this perception of like, I'm going to get, because you do with
an assistant, you trust it with a lot of data. So how do you get to the point where people are
comfortable sharing this data and feeling good about the fact that it won't be used to lead to
purchases? Well, I mean, first, I think even before you get to that part of the question, it's just
how do you manage a customer's data? How do they see transparently what you're doing, what they've
told the system? Do they have control over their data?
So all of, that's so paramount that you have to start there, actually.
It's like one question earlier than that, which is, do you trust Alexa?
And the answer has to be yes.
So we've been building on a foundation of transparency and control.
There's the Alexa privacy dashboard, which is one great place to see everything in terms of system settings and your data, et cetera.
I just want to make clear all of that carries forward to Alexa Plus.
I think that's sort of the important point to make at the top.
And then if the question is, you know, is the question, boy, should I be, you know, should I be offered a product in a given case where a system thinks I need it?
I find that great when it's great.
It is great when it's great.
Like, I found a pair of shoes.
I don't even think it was on Amazon recently through something I was reading online.
And I've got orthotics.
And, you know, it's great when it's great, basically.
I was referred something.
They're awesome.
Altras.
They have a wide toe box.
I'm not going to sell Altras on your show.
I'm just telling you that I found them.
Altras, if you're listening.
We need sponsors.
Is this the camera?
That's the Altras sponsor.
Yeah.
Let's give them a heads out of it.
It's an art.
We need sponsors.
Alex needs sponsors.
It's an arcane example, but the bottom line is like, it's great when it's great.
And why is it great?
It's contextual.
It's relevant.
It's offering me something that I actually need.
and so building systems where you can do that elegantly,
like customers actually love that.
We get feedback that that's great.
What's terrible is when you get inundated
with things that are irrelevant to you.
And so we're building a system that doesn't do that.
Does Alexa need to have a screen?
I mean, this is to you, you're the head of devices at Amazon.
A lot of the demos that you did at your launch event
were with Alexa with a screen.
Again, I have like first or second generation Echos in my house.
It might be time to upgrade.
You should upgrade.
Like, there's a couple of things you're missing.
One, you're missing speed that you could have, that you don't have.
And I think speed is time for me.
Okay.
It's comfort, you know, to confidence.
Like, there's so much.
Like, first, yeah, I would always encourage, not just because I want to sell the next device.
That's not why.
It's just having something modern.
If your device is nine years old, you're missing eight years of tech.
Okay.
So I'm judging you, given what you do, you know?
And so your feedback is, like,
half heard at this point. But I would say, I'd say that, you know, jokingly, but I go, look, you need a more, it's better. The product's just better as it, you know, generationally. Generation over generation, always got better. Now it's, does it need a screen? Incredible. Yeah, it does. Okay. It doesn't have to have a screen. It's a better experience with a screen. Okay. It really is. Now, let me, let me qualify it because you have a screen in your pocket that works with Alexa. You have a screen on your desktop that works with Alexa.
The screen in your home, you should have one.
It's very powerful.
It's nuanced.
It's not intrusive.
The new design is elegant.
It's soft, if that makes sense.
It's what you want in the home, something softer.
You can get the expression from Alexa from that screen,
and she brings visual expressions as much as anything.
But here's the trick.
It will come with you in your earbuds.
It'll come on your Alexa frames.
It'll be in your pocket.
It will be in your car.
You don't always need a screen, but in your home, I mean, the command and control, the information
management, what you get off of it, it is powerful.
Will it work without a screen?
Absolutely.
Absolutely.
And it'll be great.
So need is a relative term.
I want you to have a screen.
Okay.
Because the experience is that much better.
And there's a nuance in it.
Like when we start rolling out preview, the first customers to get preview will be our screen-based
customers because it's the best
experience. Okay. That's simple. And so you'll be like, I want the preview and I'll say get a screen. Get a screen. All right. Maybe two. And then we'll light up all your, we'll light up all your Echos, but, but you need a screen. Okay. Maybe one in the kitchen, one in the office. You only need one. Yeah. I mean, it's up to you. But yeah, I mean, it's up to you. Keep the screen out of the bedroom. At least, that's my perspective. Totally. Like, you know. The only screen I allow in the bedroom is the Kindle. It's a cool product. But I'm using mine here now. I'm just taking out. Listening to you, by the way, I got the
alarm in the morning note. I get that bug filed. Like, I got you. But the idea that
different devices work in different places is real, right? But I think you need a central hub right
now. I think Alexa Plus is so dynamic, and the more you can learn to do, the screen will teach
you, like, hey, get after it. Yeah, you saw Daniel's Thumbtack demo, which was a little bit
more agentic, if you will, for us than the Grubhub slash,
did we do Grubhub or OpenTable last night?
OpenTable, yeah.
With Uber.
Right.
But the Thumbtack demo was, you know, conversation, let's, I need a repair person.
Well, that agent goes out and starts booking it for you on the website.
And then you need the screen to give you a status, like working on it back in a bit.
Don't worry about it.
Okay.
I think that's what you want that ambience for in the background.
So I can't be more clear, I don't think.
I think it'd be great.
Okay.
I'm sold.
I'm going to get one.
All right.
We're running up on time here.
I want to give you both a minute to answer this question and then we'll head out. But
it's got to be a minute or your team here will have my head. We talked about how voice AI might
be the future of AI or the catalyst for these large language models on the show a while
back. OpenAI, for instance, debuted or introduced this advanced voice mode with GPT-4o.
And you can see the inflection point of ChatGPT, that the second they announced that, bam. It goes
from 100 million to 300 million users.
Is voice AI the future of artificial intelligence?
Start and I'll close us out.
I mean, we've believed for a long time
that voice is the most natural interface.
We're using it right now.
We're using it with your listeners.
We're using it with each other.
It's incredibly expressive.
You can load an unbelievable amount of context
and power in it.
You can be definite.
You can be vague.
You can be nuanced.
And it's just we're born with the knowledge
of how to use it.
And it's completely intuitive.
So I think we do strongly believe that it's one of the best ways to get things done.
It is not the only way to get things done.
But I do think it's pushing us.
It's challenging us to get more and more human, more natural.
And that's why it's always been one of the kind of centerpieces of our vision for Alexa.
So yes, my answer is yes.
And I think it's really pushing the envelope now.
Okay, over to you, Panos.
I think we're at that time where this is the inflection point.
And I mentioned it yesterday, you know, that I believe the vision for Alexa is
incredibly ambitious. It centers around voice for sure. I don't think it ends at voice. I think
the interaction model needs to be the one that's most natural to you. No doubt. If you need to
touch the screen to complete a task, if you need to get to your computer and write the long form,
I think it's a flow. And the thing you don't want to do is you don't want to block the customer
from the interaction that they need to go get something done. It's why we're on the phone. It's
why we're on the PC. It's why we're in your glasses. It's why we're in your ears. And ultimately,
though, the anchoring point of all of it is the voice, because it is natural. It's innate to all
of us. The trick is getting to natural conversation. The trick is trusting that you can just talk,
and realizing that as we talk to each other, you can talk that way with
Alexa. And you're going to find that, and I think that is the transformation that's coming.
I think it finishes, you know, um, the next chapter, ends the first chapter and starts the next
chapter and leads us to getting, so finishing is the wrong word there, but getting us to that
next, that next leap over the next 10 years, this is that starting point. The technology is
enabling it right now and that inflection is happening. Um, and it's compelling. So it was a longer
way to say, yeah, it starts with voice, but I don't think it ends with voice. It never will.
Like we, it is also innate to us. You always, we as humans, we're always going to find the best
and easiest path to get something done.
And we think voice will lead to most of that.
But not all of it.
Like, we don't want to overstate it.
Like, we will find the best, easiest,
which means basically the fastest path to completion,
which is why you need to upgrade your devices
and get a screen.
Are you with me?
I told you already I'm buying.
All right, well, get on it, man.
We sold at least one device here in New York.
Good news.
Thank you.
While we're here.
Our goal this week was not to sell devices,
but we'll do that soon.
Very efficient and scaling.
We're killing it now.
We have a new sponsor.
We sold a device.
This is good.
We're killing it.
Well, look, Panos and Daniel, I want to just say while we're recording that I don't take
it for granted to be speaking on record with Amazon.
It's always great for me to be able to hear what you're doing and be able to ask these
questions.
And I'm sure for listeners, it'll be great as well.
So thank you both for being here.
Thanks, man.
It's been a joy.
Thank you so much.
Yeah, it's been great.
Awesome.
Well, thank you everyone for listening.
And we'll see you next time on Big Technology Podcast.