Hard Fork - GPT-5 Arrives + We Try the New Alexa+

Episode Date: August 8, 2025

This week, OpenAI released its much-anticipated flagship model, GPT-5. We break down what we know about the upgrade, drawing from our initial testing and a special news briefing with Sam Altman. Then, we explain why we were underwhelmed by Amazon’s new Alexa+, which is powered by generative A.I., and take our feedback to Daniel Rausch, vice president of Alexa and Echo, who helps us understand why powering Alexa with L.L.M. capabilities is a major computer science challenge.

Guests:
Daniel Rausch, Amazon vice president of Echo and Alexa

Additional reading:
“OpenAI Aims to Stay Ahead of Rivals With New GPT-5 Technology”
“Amazon Unveils Alexa+, Powered by Generative A.I.”

We want to hear from you. Email us at hardfork@nytimes.com. Find “Hard Fork” on YouTube and TikTok.

Also, you can still get a special-edition “Hard Fork” hat! For a limited time, you’ll receive one when you purchase an annual New York Times Audio subscription for the first time (U.S. only). Go to nytimes.com/hardforkhat.

Unlock full access to New York Times podcasts and explore everything from politics to pop culture. Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify.

Transcript
Starting point is 00:00:00 I had a first this week. What's that? I had my first experience with smelling salts. Wait, did you faint? Yes. So I had to get a blood draw at the doctor, and I am a big baby when it comes to getting blood taken. And so, like, half the time when I get blood taken, I pass out.
Starting point is 00:00:21 And this time, I not only passed out, but I vomited. And had to be brought back with smelling salts, which, Casey, if you have never experienced smelling salts, they're not messing around. They are not. And I, okay, so I cannot believe that I'm just learning this information about you because I am also a fainter. You're a fainter. I am a fainter.
Starting point is 00:00:44 We are legion. In 12th grade, we went to go see a cadaver for my AP biology class. And intellectually, I was, like, so fascinated, you know, by all the systems of the body. And so me and all the other kids are standing around the cadaver, and the person is sort of explaining, well, you know, this is the liver, and this is the spleen. And then I just got a whiff of something. I don't know if it was embalming fluid or formaldehyde or something,
Starting point is 00:01:07 but it was like something triggered in my brain and was like, this is against nature. Like you should not be this close to an opened up dead body. And I truly, I spun around. I took a header off a whiteboard that was like, you know, against the wall. And I wake up and I'm staring at the ceiling.
Starting point is 00:01:21 And the first thing I hear is my AP bioteacher, Miss Oliver, saying, do we have an emergency contact for this kid? obviously like I don't want to tell people that they should like faint but it is one of the most amazing crazy experiences you can have in the human like do you know what I mean the moment when when you're like your consciousness just sort of like leaves you yes crazy I and when I was brought back with the smelling salts it felt very like Victorian you know like I was like no the vapors the vapors! Call Mr. Darcy I'm Kevin Roos, a tech columnist at the New York Times. I'm Casey Noon from Platformer. And this is Hard Fork.
Starting point is 00:02:02 This week, give me five. GPT5. We'll tell you all about OpenAI's latest frontier model. Then, Kevin and I get access to the new Alexa Plus. We found a few minuses, and we're bringing in Amazon's VP of Alexa to talk about it. Alexa, prepare my interview questions. Well, Casey, it has been another busy week in the world of AI. Boy, has it.
Starting point is 00:02:36 And because we are going to talk about OpenAI, I should add my disclosure that the New York Times company is suing them and Microsoft for copyright violations. And my boyfriend works at Anthropic. So we've gotten a bunch of new AI releases and announcements this week. We're not going to go through all of them, but some of the highlights, we got something called a world model from Google DeepMind. Genie 3 has this kind of interactive game engine where you can just sort of like describe a game that you want to play
Starting point is 00:03:05 and it can sort of build it in real time. Pretty cool. We can't use that yet, so that was just a demo or research preview. But that was early in the week. And then we got a new Claude version. Opus 4.1 is out. So I've been playing around with that. Not too different, but sort of a newer update from them.
Starting point is 00:03:24 We also got open source models from OpenAI, putting the Open back in their name. They released two open source models this week. And Casey, have you played around with either of those? I have not yet downloaded them. Have you? I have not. One of them is apparently small enough that you can run it on a MacBook. Another one, you kind of need a dedicated GPU for it.
Starting point is 00:03:44 But these are basically OpenAI's first open source models since GPT2 many years ago. People have been hounding them. You guys are sort of betraying the founding spirit of open AI by not making these things open and accessible through open source. And they said, well, here you go. Here are some models. They're not their sort of top-of-the-line models, but people are sort of finding various uses for them. And this is sort of designed to compete with the open-source models coming out of China with companies like Deepseek. Yeah.
Starting point is 00:04:15 And the early word on these is that they're pretty good and that they're competitive with the 03 mini, 04 mini, which are more proprietary models. But yeah, the early reviews I was reading of the open source models were that they were powerful and good. Yeah. So those are some of the announcements that we got earlier in the week. But the big one is that this week, OpenAI released GPT5, their long-awaited flagship model. People have been asking Sam Altman about this, including us, for many months now. This was long-awaited. There was lots of hype and rumors flying around about this.
Starting point is 00:04:47 And we just got off a press briefing, a sort of Zoom call with Sam Altman. and some of the other leaders of OpenAI. And Casey, what did we learn? Well, it probably won't surprise people to learn that what they told us during this briefing was that GPT-5 is their best model ever. You know, Sam Altman in his remarks, said that this is a major upgrade.
Starting point is 00:05:12 He called it a significant step along the path to AGI. But he also said that we're not at AGI yet. Among other things, for example, he said, look, this model does not continue learn, and in his view, AGI will continuously learn. So I actually thought it was cool that he said that, because now we have sort of one thing to hang on to where, well, maybe one a model can do that, we'll feel like we really are getting close to AGI. But, you know, the one other
Starting point is 00:05:36 thing that he said that struck me that I thought was kind of funny was that he said that after they had sort of put GPT5 together, he went back to using GPT4, and he said, quote, it was quite miserable. He said he never wants to go back to using GBT4 ever again. That's how good he says GPT-5 is Kevin. Yeah, so he sort of compared it to the previous models. He said GPT-3 felt like talking to a high school student, GPT-4 felt like talking to a college student, and GPT-5, he said, is the first time it feels like talking to an expert, someone who has a PhD in a subject. So I think we should caveat this all by saying that as of our taping this week, GPT-5 had not yet been rolled out, and we hadn't been able to sort of put it through its paces, but it will be rolled out
Starting point is 00:06:21 this week, including to free users of chat GPT, who have not previously had access to their top-of-the-line models. Yeah, and I think that that's important because OpenAI's best models at the moment have been reserved for paying users. So the chatbot that I use the most is 03, which is a reasoning model that OpenAI makes. That's not accessible to people who are on the free plan. So I do think it's really notable now that even free users, which I think is going to include a lot of high school and college students out there are now going to have access to what at least they are saying is PhD level
Starting point is 00:06:56 intelligence and reasoning. Yeah, so one of the most annoying features of chat GPT for years now has been this model selector where you go in and it sort of gives you this little drop down menu and it defaults to GPT40 now, but you can kind of pick your own model if you want something more
Starting point is 00:07:12 powerful than that and you're a paying user. But for GPT5, OpenAI is doing away with the model picker or at least making it sort of less necessary because they have built this what they called a router that will essentially analyze your request and how much sort of computation it needs to be able to answer that request and whether it's a simple query or something more involved and it will kind of direct it to the correct model. And I think for a lot of people, this is going to be their first experience with a reasoning
Starting point is 00:07:43 model because OpenAI does not make that the default right now in chat GPT. And so I think that will be a big update for people, regardless of whether GPT-5 is actually better than previous models, just the ability to kind of use these reasoning models for free. It seems like a pretty big deal. Yeah, getting rid of the model picker, I think, could cut both ways. And we should say that all of the big labs have a model picker. So Gemini has one, Anthropic, has one in Claude. And sometimes I will, you know, maybe ask an easy question of one of these models that is maybe set to reasoning mode. And then I'll think, oh, gosh, you know, I probably did. need that much computational power. On the other hand, I do sort of feel like it sets up an incentive
Starting point is 00:08:23 for Open AI, which wants to sort of save as much money as it can, to just always try to route you to sort of the absolute least compute that you need. And so I will just be curious to see if I feel like that is affecting the quality of my experience now that I, you know, maybe can't go in and actually say, hey, let me use the good stuff. Yeah. So on this briefing, Open AI said all the sort of expected things about how GPT-5 is, you know, better at everything than previous models. But they also spent a lot of time talking about what they called the vibes of the model, which they believe are quite good. They also gave a series of demos, and one that I thought was interesting, introduced this concept of what I believe Sam Altman called software on demand. So GPT5 can instantaneously create a piece of software for you.
Starting point is 00:09:11 In the demo that we saw, one of the employees there built a tool to let his girlfriend, friend learn French. And it did this in some sort of fun ways. In one case, it sort of created a series of flashcards for her. In another, it created like a little snake game with like a mouse and a piece of cheese. So every time like the mouse caught a piece of cheese, it would like show her a new word to learn. And he was able to do all of that just via a text based prompt. And it actually looked like pretty good. I mean, I think, you know, five years ago, if you turned that in an intro to computer programming class, you probably would have gotten an A. Yeah, it's pretty impressive, but those are also things that other models can do today.
Starting point is 00:09:53 So I'm going to need to, like, really drive this thing myself to figure out what it can do, and I'm going to put it through my usual bevy of tests, known as Roos Bench, and see how it does. Yeah. I confess, during this briefing, I zoned out a little bit. I've been to a bunch of these. Everyone says their model is the latest and greatest, and it's so good at coding, and it's got all these agentic capabilities, and it all starts to sound a little bit like marketing hype to me. For me, the interesting question to ask about new models these days is not how much better is it
Starting point is 00:10:25 or how does it score on these benchmarks? It's like, what is possible for me now that wasn't before? And I still don't have a really good answer to that from GPT-5, although I'm going to investigate. Yeah, and, you know, that might be a good point to bring up two of the questions that got asked of the GPT-5 team. during the briefing that we're on that I think would be of interest to our listeners. One was somebody asked, hey, like, are you starting to run into the limits here, right? Are the scaling laws holding? And Sam Altman said, quote, they absolutely still hold, and we keep finding new dimensions to scale on.
Starting point is 00:11:02 He said that they're still finding new paradigms that will let them scale in new ways. So he very much tried to give the impression that, like, no, we are not struggling at all to figure out how to build better models. I suspect, though, that that might get some pushback as people start to use this thing and just sort of observe that, like, yes, it is like clearly better in a handful of ways, but to your point, Kevin, can you really do anything that you couldn't do before? That doesn't, like, seem like it's the case. It's just sort of that it can do what it used to do a little bit better. I'm curious what you made of that. Yeah, I think that's reasonable. I mean, I wonder if they are starting to finesse their definitions of the scaling
Starting point is 00:11:39 laws to sort of account for the new reasoning models because people, you know, for months now have been saying, well, the models in the pre-training phase may have gotten as good as they're going to get or about as good as they're going to get. But the way to get them to be more intelligent is through this post-training phase, is through these reinforcement learning cycles, these sort of reasoning environments that they're trained on and put into. And so I suspect that when they say that the scaling laws have not broken, they're also sort of referring to this kind of reinforcement learning, reasoning approach as well. And I believe them. I've talked to people who say, you know, they think there's still a long way to go on that. But this was a really big
Starting point is 00:12:19 model. We don't know exactly how big. We don't know exactly how many GPUs it was trained on or how much data it was fed. But it's safe to assume that they did everything they could to sort of max out the scale of the model, at least in pre-training. And, you know, from what we saw in the demos, it doesn't look like it's like that much smarter. I mean, maybe it's a little better at some things, but it did not, you know, come out of the box super intelligent or anything like that. Yeah. The other big question that I'm always interested in when these big new models come out is what was the safety testing experience like? And is this model going to be sycophantic? And what sorts of very intense relationships are people going to form with it? And Nick Turley addressed
Starting point is 00:12:59 that one. He noted that earlier this week, Open AI put out a blog pose, which actually wrote about in Platformer this week, that is all about their approach. here. They say they're working with physicians, really trying to bring in a lot of outside expertise to help them understand how people are interacting with these models and make them safer. And he said that they are absolutely not optimizing for engagement here. They just want to sort of make a useful tool that sends you on your way and that essentially they're going to have more to communicate about this soon. So we didn't get a ton of detail there, but they have said that at least in some ways they think that they have improved these models to make them less
Starting point is 00:13:34 sycophantic. In addition to that, they said they did $5,000. hours of red teaming. They shared, I believe, these models with some external experts for advice on that. So they did say that they rate this model as high on this scale of could it be used to create novel bio risks. So they're building in a bunch of protections around that. You know, but that doesn't seem great. But anyway, that was the sort of safety report that we got in advance of the launch. So they also said that in addition to all these new capabilities, GPT-5 is much more reliable than previous models. They claim it hallucinates less.
Starting point is 00:14:14 And it does this interesting thing called safe completions, where basically if a model doesn't want to, you know, accommodate some request or carry out some task because it's sort of against the guidelines, instead of just refusing it, it will sort of make up a safer version of the request and complete that instead. So it would be interesting to see how people use that.
Starting point is 00:14:36 But yes, this is the claim they make. It's more reliable, less deceptive, and gives these kind of safe completions. Well, and that actually gets into something interesting, though, Kevin, which is, like, what is it that makes OpenAI say this is GPT5? The sort of, like, big number releases have this amazing marketing power now, I think, because the leap from GPT2 to GPT3 was so big, and the leap from three to four was also pretty big, and so that creates a lot of expectations for five. But in the background, Open AI is just trying a bunch of things, building a bunch of new models, and then sort of stapling various
Starting point is 00:15:17 things together, and then eventually they get to something and they say, we're going to call this one five, but like it's not quite as linear as it looks from the outside, right? Yes, and all of the labs have had experiences where they thought they were training like a new model, and then it didn't quite work out for as well as they wanted it to. And so they sort of assign it some lower number, you know, that's happened at least a number of big labs that I know about. It's happened at OpenAI. You know, they had a previous big model that they were building, which ended up becoming 4.5, I believe, was supposed to be GPT5 at one point, and it just didn't turn out as well as they'd wanted it to. So yes, they are playing games with the numbering of the models and sort of the
Starting point is 00:15:58 marketing around that. But I think calling this GPT5 just signals that they want to, they want this to be viewed as a similar step in capability that people saw from GPT3 to GPT4. Yeah, to me, that is one of the most interesting things about this release is like whatever GPT5 turns out to be, this is the thing that they thought was kind of the next big step forward. And I think we should like evaluate it on those grounds. I think it's also like the big picture here is that Open AI is trying really, really hard to stay at the head of the pack. This is a company that has been sort of, of racing very hard toward AGI or something that they can claim is AGI, and they are still going.
Starting point is 00:16:41 I find actually their execution to be quite impressive. Like this is now a very large company. They've got a lot of different competing teams and priorities, and they had all this board drama, and I think it was reasonable to expect, and I certainly expected that in the wake of all that, they would sort of slow down and maybe allow some competitors to catch up. But they showed this week that they are not slowing down, that they are, in fact, accelerating and they want to get here before anyone else. True, although they have also experienced a lot of poaching in recent weeks and months. And I think one thing I'll have my eye on over the next several months is, are they able to continue iterating very quickly or have some of the losses that they've experienced over the past few weeks really hurt them?
Starting point is 00:17:28 Yeah. You know, incidentally, Kevin, I'm told that in response to the GPT-5 launch, inside meta-headquarters, the super-intelligence researchers have moved their desks even closer to Mark Zuckerberg. So that's how seriously they're starting to take this over there. They are now sitting on top of Mark Zuckerberg. There are now two researchers who are sitting at Mark Zuckerberg's desk with him, and we'll have to see how that plays out. Yeah. Yeah. Those are some of our initial impressions.
Starting point is 00:17:53 But we are going to actually come back tomorrow after we have had a little time to play. with the model and give some first impressions there too. Let's travel to the future now, Kevin. All right, Casey. It is now Thursday. GPT-5 has been officially released for a few hours. I still do not have access to it for some reason, but I gather that you do.
Starting point is 00:18:15 So give me your day one vibe check. What are you seeing? What is GPT-5 like? And what do you make of the reaction to it? Well, this is a very significant moment in the history of Hartford, Kevin, because for the first time, I'm having a conversation with you while vibe coding something. What are you vibe coding?
Starting point is 00:18:34 Well, ChatGPT is currently hard at work building a to-do list app for me. I said I wanted it to have the aesthetic of the Fantastic Four First Steps movie that just came out. I didn't love the movie, but I did love the production design. So I was like, make me a to-do list app. It looks like that. Let's see how it goes. I can't believe we're building giant gigawatt data centers for your story. stupid to-do apps. This is so wasteful. God. Listen, you need to send over Roos Bench, your proprietary
Starting point is 00:19:04 suite of Eval, so I could really put this thing through its pages. But look, let me give you some high-level notes, Kevin, on what I'm seeing and on what others are seeing. The headline here is that this does seem like a really meaningful improvement to chat GPT. And I think in particular, if you are a free user of chat GPT, you're going to have a great day, right? Because for the first time now, in addition to the kind of standard chat GPT model, you're going to have some reasoning capability.
Starting point is 00:19:35 So essentially, if you're cheating your way through high school, you're just going to have a lot easier time of it now because this thing can do some really extended work on long problems. Yes, you can now cheat your way through an entire semester with just one press of a button. Exactly. This is good. Yeah, I mean, the way I saw people talking about it online was that they thought that OpenAI had not raised the ceiling of the AI frontier by a lot with GPT5, but they had raised the floor. Essentially now, all the free users who previously got defaulted into the less powerful models are now going to be using the more powerful models, which could be a big perceptual shift, if not, I've sort of won about the frontier capabilities. For sure. And I do think it does have some things that are not quite capabilities, but still will meaningfully affect how people use these AI systems. For example, this thing really is just a lot faster than its predecessor. So just over the past couple of hours, I took some editing work that I sometimes asked chat GPT to do. I know about how long it takes using the O3 model. I put it through GPT5. And sure enough, yeah, it blazed through it. It did just as good of a job as it had done before. for. And so if you're the sort of person who's using chat GPT a lot, I think that's really going to
Starting point is 00:20:52 stand out to you. Yeah. What about the pricing? I saw some people saying that GPT5 was much cheaper than they expected it to be. That would not for cheaper for the sort of chat GPT subscriber, those subscription prices are staying the same. But for developers who are building on top of it, my understanding is it's a lot cheaper than other models from other AI labs. That's right. It came in at $1.25 per one million input tokens, which is the same. same as Google's Gemini 2.5 Pro. Google has, of course, also been pricing really aggressively to try to box out the competition. What makes that figure interesting, I think, Kevin, is that that number is a lot smaller than
Starting point is 00:21:31 Anthropics Claude for Opus API, which comes in at $15 per million input token. So I think some of these really well-capitalized AI labs are taking this moment to say, hey, we're going to put a lot of pricing pressure on some of our competitors. Yeah, it very much reminds me of the moment, like, 10 years ago when, like, Uber's were $4 because the venture capitalists were just subsidizing the artificially cheap prices. We're in sort of that moment for AI tokens now. Yeah. What else can we say about GPT5 in the couple of hours now that it has been out? Yeah, well, you know, I sometimes like to joke that the worst insult you can make to anyone who has just released a new AI model is my timelines are now longer.
Starting point is 00:22:14 and it does seem like that is something that people are saying about the new GPT-5. And what I slash they would mean by that is I now think it's going to take a little bit longer until we reach AGI, some sort of very powerful AI system. In fact, some people are posting online screenshots of some prediction markets
Starting point is 00:22:35 that until today, when asked who do you think we'll have the most powerful AI model at the end of August, we're showing open AI in the last, lead and almost instantaneously after the live stream on Thursday, Open AI collapsed, and Google is now ascended and is assumed to have the best model by the end of this month. So, you know, I don't want to overstate what that means necessarily, but it does seem like there was a huge contingent of people who thought that GPT5 was going to be this revolutionary new model, and it
Starting point is 00:23:07 seems instead like a more evolutionary one. Yeah, that makes a lot of sense to me. One other thing that stuck out to me, and I wonder if it stuck out to you, too, was Open AI released some benchmarks and some data about GPT-5. And one of the things they showed was that hallucinations, the rate of GPT-5 just making stuff up while answering questions, has gone way down. It's now, you know, for some types of questions, it's sort of around 1% hallucination rate. And I think that was interesting to me because this is clearly something that was a problem with earlier versions of this. In fact, there was some speculation and some indication that actually newer, these newer reasoning models were hallucinating at higher rates than the sort of
Starting point is 00:23:49 previous generation of models. And there's a lot of concern about that. And it seems like they have figured out a way to get the hallucinations under control with GPT-5. Although, with everything, I don't totally trust these benchmarks. I'm going to have to see this for myself. Yeah, everyone's mileage is going to vary on this one. I will say I have already caught it hallucinating a couple of times, like, somewhat disappointingly. So, as always, don't trust these things for anything mission critical. You're always going to want to double-check your facts. Yeah, only use it to build stupid to-do apps with the Fantastic Four aesthetic on them.
Starting point is 00:24:26 The Fantastic Four have a very cool aesthetic, and I think you need to open up your mind a little bit. Okay, that is our day one vibe check of GPT-5, and we will continue to play around with this and tell you anything cool or interesting or strange. or upsetting that we find. Sounds good. All right, that's enough about GPT5. When we come back, we'll talk about another AI system we got our hands on this week, Alexa Plus.
Starting point is 00:25:07 Now, Casey, are you in, Alexa user? I have been an Alexa user for a long time. I still have one of the original Amazon Echoes in my house. And to Amazon's credit, it still works. The Pringles can. Yeah, I have the big old sort of Pringles can echo. Yeah, me too. So I am a heavy user of this product. I have probably five of them in my house in various rooms. And so I'm very excited for our conversation today, which is going to be about the new AI-ified Alexa Plus. And before we get into, who our experience is using this thing and our interview with the guy who runs it, we should make a couple disclosures.
Starting point is 00:25:44 One of them is that the New York Times company has recently agreed to a licensing deal with Amazon that will allow Amazon access to Times content for its AI platforms, including Alexa. So we just thought you should know that. We have nothing to do with that, obviously, but that is going on in the background and another part of the company. The second thing we should say is that if you have an Alexa device, it is going to be going off constantly during this segment unless you go over right now and hit the little button that mutes it. So sorry in advance to Alexa owners, but we'll give you a little time right now to
Starting point is 00:26:16 pause this, go over, hit the mute button on your Alexa and come back. Or alternatively, just find the circuit breaker at whatever house you're in right now. You're just shut them all off. Run on battery power for the rest of this episode. Alexa, order 14 bags of dog food. I wonder if that actually works. Wait, and I should also probably disclose that my boyfriend works for Anthropic because I'm pretty sure that Anthropic is providing APIs that are being used in Alexa Plus. Wow, we've got so many disclosures today. Yeah. Yeah. Okay. All right. Let's get started. Okay. Alexa. Alexa is one of the most puzzling technology products that I have ever encountered. And like you, I have been an Alexa user since the very early days. People don't realize
Starting point is 00:26:57 this is this product was released in 2014. Alexa is 11 years old. Yeah. And when it came out, I was very excited. I thought, I'm going to put this smart speaker in my house, and I'm going to, you know, ask it to do things for me, and it's going to be like having a little assistant right there on my kitchen counter. And Alexa has added dozens of features, maybe hundreds of features since 2014, and I use zero of them, because the three things that I use Alexa for are setting timers, choosing music to play in my house, and telling me the weather before I leave for the day. Absolutely. Are those, like, similar to what you use this for? are the exact three things that I use Alexa for. Have I tried to use it for other things? Yes,
Starting point is 00:27:39 but the experience, frankly, has never been that great. So I always come back to those three. Yes. Those are the big three in my house. Same use cases, same limitations. But when generative AI started to get good a couple years ago, I think people naturally started to ask, well, when is Alexa going to start using this new generative AI technology? It's sort of built on this older, more deterministic kind of system, but it seemed like a natural thing to expect that Alexa would start to incorporate some of this technology to be able to answer maybe more open-ended questions, to give longer, more detailed responses to do more than just set timers and tell you the weather. Yeah, I mean, once OpenAI released voice mode for chat GPT, it just immediately
Starting point is 00:28:20 seemed so much more interesting and powerful than Alexa and Siri, which is Apple's very similar system that it makes for its devices. And so, yeah, I think both of us were like, okay, Well, when are we going to get that OpenAI-style voice mode in these smart devices that we have in our homes? Yeah, and so it's taken a while. We should say that. Like, it has not been a smooth or simple process. And part of what I'm so excited to talk with Daniel Roush, the VP of Alexa, about later in this show, is just why it's been so hard to sort of shove an LLM-based generative AI technology into this pre-existing assistant product.
Starting point is 00:28:53 But we should just talk briefly about what Alexa Plus is and then our experiences with it, because both you and I have gotten to try this over the past few days. Yeah, so, Kevin, tell us a little bit about Alexa Plus. So Alexa Plus is the name for the most recent overhaul of the Alexa virtual assistant. It's powered by generative AI. We don't know exactly which model or models, but it seems to be sort of a mix of Amazon's proprietary AI models, and then maybe some Claude, from Anthropic, which Amazon has a deal with. And so it's been using Claude inside of its AI products for a number of
Starting point is 00:29:28 months now. And Amazon claims that the new Alexa Plus is able to do much more. It's able to be much more conversational, more personalized. It can do things like book reservations at a restaurant or order you an Uber. It can answer questions that sort of aren't just pure lookups where you're looking for, you know, what time is the baseball game tonight? It can actually do more complex things for you. It can control these smart devices and appliances in your house, and it can purchase things for you online. So this new Alexa Plus is not out to everyone yet. They've been rolling it out slowly. They are now in what they call the early access period. But we were able to get this on some new devices that we ordered. It also doesn't work on every kind of Echo device. You have to have
Starting point is 00:30:13 sort of one of the newer ones to be able to run it. And Kevin, when you say that this has been rolling out slowly, like it has been rolling out extremely slowly. Like, it was only on June 23rd that Amazon said that one million people had Alexa Plus across presumably hundreds of millions of Echo devices out there. Yeah, so you and I both got the new Echoes that can run the Alexa Plus early access program and turned it on and set it up. And a few things stick out to me right away.
Starting point is 00:30:42 One is the voice on this new Alexa is just way better than the old Alexa. Yes, I would agree with that. It is way more fluid. It sort of sounds more like something you hear out of ChatGPT voice mode. They have managed to sort of overhaul the actual voice part of the voice assistant, so it sounds much more like a human.
Starting point is 00:31:01 And there are a bunch of different voices. I think I saw eight of them inside the app. Half of them are masculine. Half of them are feminine. So, yeah, you can change that to your liking. Yeah. The other big difference I noticed right away is that the new Alexa Plus
Starting point is 00:31:14 does not require you to say the wake word, like Alexa between every question and answer pair, right? With the old Alexa, wanted to ask a follow-up question, you had to say Alexa again. With the new one, you can just kind of leave it and it will sort of intuit or pick up that you have a follow-up question and it will sort of listen for a while longer. So you can actually have these more extended multi-turn conversations. Yeah. And that lets it do different kinds of things. Like one of the first things that I did with Alexa Plus was it said, hey, would you like to try to solve a riddle? And I
Starting point is 00:31:47 thought, what are you the sphinx? But I said, well, sure. What the heck? And, you know, It gave me a series of clues, and within, you know, three clues, Kevin, I was actually able to solve the riddle. Wow, good for you. Yeah, it's totally smart. I'm so proud of you. Thank you. Thank you so much. So, yeah, what else were you doing with this thing? So another thing it can do is just give you longer answers.
Starting point is 00:32:08 Like the original Alexa was sort of limited to a sentence or two. Maybe you could ask it to look something up on Wikipedia and it would sort of spit out a few sentences, but it was really limited beyond that. But I can now ask it to make up a story and read it to my kids. So we had some fun doing that the other night as a family. You can ask it to suggest a recipe for dinner based on what's in your fridge, and it will sort of help you with that. I used that last night. So these are some of the new features that I was excited to try. I also tried some of their integrations, like they have an integration with OpenTable and with Uber and a bunch of other companies.
Starting point is 00:32:42 Tell me about this, because I set this up, but I did not actually use it. So how did that work? So basically, you scan a little QR code on your phone and link your Uber account or your OpenTable account. or your open table account to your Alexa account takes, you know, a minute or so. And then you can just kind of say, like, order me an Uber from this place to this place, or I want a table at a restaurant in downtown San Francisco near the ferry building for two people at 630 tomorrow, and it will sort of pull up a couple options, and you choose what you want, and then it can go book the table for you.
Starting point is 00:33:12 So that I thought was cool. And that actually worked when you tried it. So I did not actually follow through with the booking, but I did order an Uber for myself, and it did work. Okay. Cool. I mean, that actually seems like truly useful. Just be like, say to your thing on your desk, hey, I need an Uber to the airport and it pulls
Starting point is 00:33:28 up one. That's great. Yeah. And it can do other cool sort of multi-step things, too. Like I was able to say, I needed a new thing for my kitchen, like a box grater. And I was able to kind of go to Alexa and say, hey, look up on Wirecutter what the best-rated box grater is and add it to my Amazon cart. Now, can I guess why you need a new box grater?
Why is that? You used it to grate ginger and it dulled the edges. No. Okay. What was the reason? I left it in an Airbnb. Okay. I should have seen that coming.
Starting point is 00:33:55 Anyways, go ahead. So anyway, those were some of the good things about this product, but we have to talk about some of the limitations as well. Casey, what was your experience with Alexa Plus? Okay, so I have to say I did not have a good experience with this thing. First of all, I bought an Echo Show 5. There's a big banner on the page that says it works with Alexa Plus. And so the thing shows up at my house.
Starting point is 00:34:19 and basically what I've come to understand is that an Echo Show is a device that just constantly invites you to spend money with Amazon and I found it honestly infuriating because I plug this thing in and when you set it up, it's like what kind of background do you want? I was like, show me some art, you know,
Starting point is 00:34:37 that's one of the options. And I would say for about four seconds per minute it would show me, you know, some Renaissance, you know, masterpiece or something. And then it would be like, hey, do you want aspirin? You want paper tablets? You want to buy paper towels? You can actually buy paper towels right now.
Starting point is 00:34:51 Just say, hey, Alexa, buy paper towels. And it was just sort of this, like, forever. So I eventually just unplugged the thing because I was like, why did I just spend $90 to have a permanent rotating advertisement for household products, like, on my desk? That is so weird. So I was just, like, it just put such a bad taste in my mouth about the whole thing. And then a day later, I get the Echo Show 15, okay?
Starting point is 00:35:16 And for some reason, Amazon sent me two of them. I truly don't know why. I did not need two of them. And so I unbox the thing and the thing is meant to be mounted on a wall. Now, there are a lot of things I'm willing to do for a podcast, but mount an Echo Show on my wall? You're not willing to do a construction project. That's not one of them. No, I was not going to do that. And so I just had, and also the thing like can't stand up on its own. So I just had like a 15-inch screen sitting on my desk for a day while I'm talking to it. I was like, this whole thing is, like, very silly. So that's, like, the hardware side of it, okay?
Starting point is 00:35:54 You may have a better experience because, you know, I don't know. You like mounting things to your wall, and so you did that. And so, you know, you're having a good time. But that was just kind of all of the, you know, the precursor steps I needed to take to even be able to, like, engage with this thing. And so then I, you know, I finally have it set up. And I start to try to put it through its paces. So I, you know, go through the little riddle game. And I, you know, it's like, hey, like, I could help you, like, do a personalized meal plan.
Starting point is 00:36:22 It's like, all right, great. But, yeah, you set me up with a personalized meal plan. And, you know, it's like, well, you know, we could do this or that. And it showed me, like, a row of, like, recipes that it could cook for me. And so I swipe through with my finger and I see a lemon pasta. And I say, okay, show me the lemon pasta. And it says, sorry, I didn't get that. And I said, Alexa, the lemon pasta, right?
Starting point is 00:36:46 could you make me this lemon pasta from this website that you're showing me right now? Dead silence. And I'm really, look, oh, my God. And it's like this, like, and right here, we have just landed in the exact spot that has been bedeviling Apple for the last year that is bedeviling Alexa right now.
Starting point is 00:37:06 These systems are just very hard to make reliable. Now, I will say, the device was sort of having trouble connecting to my internet. Everything else in my house was connected to the internet and working fine, but this was just sort of every once in a while being like, you're not connected to the internet. So was that an issue with me? Was that an issue with the hardware? I'm not totally sure.
Starting point is 00:37:25 Maybe that was why it wasn't able to perfectly answer my question. I do want to say that, in case this was not actually an AI issue. But, oh man, within five minutes I was like, get this thing out of my house. And again, I wanted to like it. I was excited about it. And after, like,
Starting point is 00:37:41 two days of ads for paper towels and one day of I'm not going to show you the lemon pasta, I thought, what am I doing with my life? Yeah, I should say, like, I have also had a bunch of, like, very bizarre and frustrating experiences with this thing. Okay, so we've said what we like about this thing. Yeah. Which, remind me, what is that again? Many of the new capabilities are quite cool. Yeah. Unfortunately, many of the old capabilities I relied on as the reason I used Alexa at all have become broken as a result of this update. Okay, so tell me about this. So one of the things you also notice very quickly when you're using this thing is that the latency is just like a problem.
Starting point is 00:38:17 It's a little slow to respond to questions. It's not as zippy as the old sort of pre-LLM Alexa. I understand that. These things have to go to the cloud. They're processing more complex instructions. It's all going to take a little time. I assume that will get better. The basic things that it gets wrong now include alarms, which is actually a thing that I use Alexa for.
Starting point is 00:38:41 every day. Wait, so tell me how it got it wrong. So the new Alexa Plus update seems to have broken Alexa's ability to reliably set and cancel alarms, which is a core thing that I use this product for. And so, for example, this morning, I woke up on my own a little bit earlier than my alarm, like 10 minutes before my alarm was supposed to go off. And so I said to Alexa, Alexa cancel the alarm. Silence.
Starting point is 00:39:09 Nothing. This is a command that I have issued probably a thousand times. And Alexa Plus is a little smarter now and she's giving you the cold shoulder. Yes, and she's saying, actually, I'm going to wake you up anyway in 10 minutes. So that was not good.
Starting point is 00:39:25 I also experienced some, like, hallucinations when I would ask questions about, like, things happening in the world, things happening in the news. I asked it about a tennis tournament that's going on right now. I said, who's the top seed in this tennis tournament? It gave me the name of a player
Starting point is 00:39:39 who's, like, not even playing in this tournament. And it also, like, has trouble orchestrating the different tasks. So one of the things that would happen is I would, like, tell it to, I gave it a research project for a dinner playlist. I was looking for some new music to put on a dinner playlist. And instead of doing that research project, it just started searching on Spotify. Like, it routed the query to Spotify within the Alexa thing and started playing the music, when what I asked was, like, do some research for me.
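What Kevin is describing is, in effect, an intent-routing failure: somewhere an orchestrator has to decide which downstream skill should handle an utterance, and a phrase like "research for a dinner playlist" contains cues for more than one skill. A toy sketch of that decision follows; the skill names and keyword rules are purely illustrative, not how Alexa is actually built:

```python
# Toy sketch of the orchestration problem: one utterance must be routed to
# exactly one downstream "skill." All names here are hypothetical.

def route(utterance: str) -> str:
    """Naive keyword router. Real assistants use an LLM or trained
    classifier, which is exactly where ambiguous requests get misrouted."""
    u = utterance.lower()
    if "research" in u or "look up" in u:
        return "research"
    if "timer" in u:
        return "set_timer"
    if "music" in u or "play" in u:
        return "play_music"
    return "fallback"

# "Do some research for a dinner playlist" matches both the research and the
# music rules; here rule order decides, and a model-based router can just as
# easily latch onto "playlist" and start playing music instead.
```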
Starting point is 00:40:15 So it seems to have a little trouble figuring out, like, what exactly the user wants and, like, orchestrating the commands. That case seems like a little borderline to me. I can imagine some people, like, asking for that, you know, maybe being happy if it, like, played some music. But I had this almost opposite issue where, again, as I'm sort of going through, okay, what can this thing actually do?
Starting point is 00:40:33 And it's, you know, and it's like, you know, ask me what I can do. So I asked it. And one of the things it said was, I can help you explore Gen Z music trends, which there was just like something funny about the way it said it to me. So I was like, yeah, sure. Why don't you help me explore Gen Z trends? And, you know, it thinks for a second. And then it goes, well, I found some podcasts about it on Amazon Music. And I was like, I sort of assumed you were either going to tell me something
Starting point is 00:40:57 about Gen Z music or you were going to play Gen Z music. But now you're trying to sell me Amazon music, which I feel like is very consistent with how Alexa Plus handles everything, which is could we sell you a service right now? Could we sell you a product? So, you know, Kevin, I want to say two things. One is I have not used this product all that long. And so I don't want people to think about anything I'm saying as anything other than first impressions. Like, I have not truly had a chance to do the amount of reviewing that I would like to do.
Starting point is 00:41:25 Two, I'm very confident that lots of other people are probably having much better experiences with this thing. Because I think if most people were having experiences as bad as mine, I just would have heard about this before now. But all of that said, Alexa Plus did not make a great first impression on me. And the Echo family of devices that are just little windows that let you send money to Amazon.com, they're not for me. Yeah, I had, I think, a slightly more positive experience than you. I did actually enjoy some of my interactions with Alexa Plus, but it just seems like it is not quite there yet. And I think Amazon knows this, which is why it's in this early access program. If you open it up, it says, you know, Alexa may make mistakes.
Starting point is 00:42:05 So they're sort of like doing all of the careful rollout that you would expect from a product that is not fully baked. But like some of the features just don't seem to work. There's another feature that I tried where you can like email a document to this email address and it will sort of ingest it into your Alexa and then you can have it summarize it. So I was very excited. I was like I can like, you know, I can learn about new papers in AI while I'm like doing the dishes. And so I email the paper to the Alexa email address. and I say, summarize the paper I just sent you, and it says, I did not receive a document. So I think they need to spend a little more time in the kitchen cooking this one.
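For what it's worth, here is a toy sketch of how an email-a-document-then-summarize pipeline like the one Kevin tried might be wired up, along with one plausible reason it could answer "I did not receive a document": the summarize request racing ahead of ingestion. Every name here is hypothetical; nothing reflects Amazon's actual implementation.

```python
# Hypothetical email-to-assistant document pipeline. The failure shown is
# only a guess at one plausible cause: the document store is empty when the
# user asks, because ingestion is asynchronous (or silently failed).

DOCUMENT_STORE: dict[str, list[str]] = {}  # user id -> ingested documents

def ingest_email(user_id: str, attachment_text: str) -> None:
    """Background worker: index an emailed attachment for later questions."""
    DOCUMENT_STORE.setdefault(user_id, []).append(attachment_text)

def summarize_latest(user_id: str) -> str:
    docs = DOCUMENT_STORE.get(user_id)
    if not docs:
        # Ingestion hasn't completed, so the assistant sees no document,
        # matching the reply Kevin got.
        return "I did not receive a document."
    return docs[-1][:80]  # stand-in for a real LLM-generated summary

ingest_email("kevin", "Scaling laws paper: loss falls predictably with compute.")
print(summarize_latest("kevin"))
```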
Starting point is 00:42:45 But I think my overall impression is, like, the Alexa Plus that you have now in this early access program, it is a little like having a kind of GPT-3.5-class model inside of a smart speaker, which I think is a valuable thing and one that I would like them to continue to build on. But it is not the state of the art in either the language model or the kind of basic tasks. And actually, it seems to be regressing on some of the basic tasks. So I would say this is like two steps forward, one step back. I think the most powerful thing that the new Alexa Plus has done for me is it has made me forgive Apple for not shipping anything with the new Siri. I get it now, Apple. I talked a lot of mess about you on this podcast about not shipping this
Starting point is 00:43:34 thing, but now having used one of your close rivals' attempts to do the same thing that you're doing, like, I get it now. I think the finest minds in the world who are working on this stuff actually don't know how to do this yet. That's my big takeaway. Yeah. I think what's happening with Alexa and Siri right now is sort of a, it's like a symbol of what's happening in the American economy writ large, which is like we are trying to jam these like
Starting point is 00:44:01 new AI technologies into these legacy systems and processes. And it's just kind of a messy fit. Like, these things are weird. They are not deterministic. They are not reliable in the ways that, like, a sort of older, more rule-based thing could be. And they have these amazing capabilities. But when you try to, like, make these hybrid Frankenstein things with, like, the old
Starting point is 00:44:25 system with the new brain, it just doesn't really work. And I think that's, like, happening not just at these virtual assistants, but, like, in a lot of places throughout the economy. Absolutely. I also just think that when I'm using a chat bot on my laptop and it gives me something that's like 80 or 85% right, that's much more useful to me than like an Alexa response that's 85% right. Because on a chat, like in a chat bot setting,
Starting point is 00:44:51 I can just sort of take what I need. I can edit or modify it. I can maybe ask the same question of another chat bot and see if I got a slightly different or better result. like I feel much more in control of my own destiny. I can take the stuff that works and leave behind the stuff that doesn't. When you're doing this thing with a smart speaker,
Starting point is 00:45:09 if it doesn't work, you say, yeah, why do I spend 90 bucks on this piece of junk? You know? And it's like, and I think what I learned about myself was I have so much less patience for this sort of thing when it is a piece of hardware in my home that has made some really big promises about how it's going to help me with all my routines and everything,
Starting point is 00:45:28 well, if it's kind of hard to set up and it doesn't work like the vast majority of the time, it all just kind of feels like a waste. Right. So I'm very glad we got to exchange first impressions of Alexa Plus. We should also have a conversation with someone at Amazon who has been involved in this overhaul of their flagship voice assistant. So when we come back, we're going to be joined in the studio by Daniel Rausch.
Starting point is 00:45:49 He's the vice president of Alexa and Echo at Amazon, and we're going to get into all of this with him. And you can ask about those ads. Oh, I'm going to. But first, the ads. Daniel Rausch, welcome to Hard Fork. Thanks for having me. So Casey and I have both spent the past few days playing around with the new Alexa Plus, and I'd like to just start by asking about the technology that powers this thing.
Starting point is 00:46:40 Yeah. How much of it is a new LLM-based system versus the old, more deterministic model that powered the old Alexa? Yeah. Well, from an AI and model perspective, everything is entirely new. There are some legacy deterministic systems downstream, but really it's a complete re-architecture of everything that you would say Alexa is, from the way you have a conversation and engage with the experience at a very basic level, you know, all the way through Alexa acknowledging you or just maintaining a chat. So there's a lot of new under the hood. Yeah. Talk about the challenge of moving from this deterministic system to something that is very powerful, but also much less reliable. Yeah, I would say, well, hopefully you're not seeing much
Starting point is 00:47:29 less reliable. I would say, you know, we've got some edges to sand and we're in early access. I'm sure we'll get to talk about the nature of the rollout. I just mean, like, in general, LLMs are not as reliable as a deterministic system. I get it. So, you know, we want to capture all the benefits of that non-deterministic, we'd call it stochastic, system. It has the elegance of really engaging in human conversation, but we want the predictable outcomes. Now, large language models don't support interfaces out of the box to classic systems. So getting those capabilities to interface, we would talk about it as APIs across these interfaces for other systems. It's quite hard.
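The bridge Rausch is describing is what the industry generally calls tool calling or function calling: the model is prompted to emit structured, machine-parseable output, which is validated before any deterministic API runs. A minimal sketch follows; the tool names and schemas are hypothetical, not Amazon's actual interfaces:

```python
import json

# Hypothetical sketch of the LLM-to-API bridge: the model's free-form text
# is funneled into validated, structured calls before any deterministic
# system executes. Tool names and schemas here are made up for illustration.
TOOLS = {
    "set_timer": {"seconds": int},
    "get_weather": {"city": str},
}

def dispatch(model_output: str):
    """Parse and validate the model's JSON tool call, then hand it off."""
    call = json.loads(model_output)  # stochastic side: text the LLM emitted
    name, args = call["tool"], call["args"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    for arg, typ in TOOLS[name].items():
        if not isinstance(args.get(arg), typ):
            raise ValueError(f"bad or missing argument: {arg}")
    return name, args  # deterministic side takes over from here

# e.g. the model translated "set a spaghetti timer" into:
name, args = dispatch('{"tool": "set_timer", "args": {"seconds": 600}}')
```

The validation step is the point: anything the model emits that does not match a known, typed schema is rejected instead of being passed to a downstream system.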
Starting point is 00:48:09 They speak natural language. APIs don't speak natural language. They speak clunky computer science language, but it's very predictable and it gets a lot of things done. So I would say if you had to list the technical challenges, you would say, well, the many millions of things, we stopped counting at some point, but the many millions of things that original Alexa could do, marrying that with the power of LLMs is definitely the first and most prominent on the list. So take us back to when LLMs are first coming out. You guys are starting to play around with them. It's sparking ideas for you of, yeah, if we could like marry this to Alexa, we could have something really cool. What are some of the uses that you're thinking about? Like, what are the
Starting point is 00:48:49 kind of dreams that you have for this model that you're hoping you can bring into reality? I mean, I think it's, we think of the capabilities in two buckets, I would say. Take everything that Alexa, the original Alexa, can do and just make it way better. You know, just picking up from what customers are already doing with Alexa. Then you start brainstorming, and where I think you were really headed was what are all the new things that we can do. And the depth of conversation that you can have with the new Alexa experience just opens whole vistas of new kinds of things we can get done. We can help you plan a trip and then follow through on it. We can watch for concert tickets for you. We can, you know, not just help you brainstorm about cuisine, but either pick a recipe and get some groceries and invite the neighbors or, you know, let your partner know it's date night and we're going out and book a table. So I think the kinds of journeys and the kinds of tasks we can get done for customers are just so much more expansive. So Casey and I have spent the past couple of days trying out Alexa Plus, and we have some feedback, which we can share with you now or later. We've talked about it on the show
Starting point is 00:49:54 just before this. I think it's fair to say we both had some things that impressed us about the new Alexa Plus and some things that were challenging, including some of the basic stuff that Alexa seemed to be very good at before, or at least that I know how to get reliable performance out of Alexa for, that no longer seem to work as well. But what I actually want to know is, like, why has it been so hard to do this? Because back in 2023, when Amazon announced that it was going to revamp Alexa, sort of give it this brain upgrade with these new AI capabilities, they said this was going to be ready in 2024, and then that got pushed back a couple different times. So walk us through kind of the journey that you all have been on over there, trying to sort of shoehorn this new technology into this existing product, and maybe some of the challenges that you encountered along the way.
Starting point is 00:50:37 Well, I'll tell you, you know, we should definitely get some of the feedback. We can cover as much as you like here on the show. So if you rewind the tape, actually, you were asking about this, too, as we're starting to experiment, what can we imagine doing? If you go back to 2023 and the models that were available then, the state of the art, very little instruction following, reasoning, low ability to execute on these interfaces with other systems, we announced something called Let's Chat, which was sort of a mode of Alexa. So think about flipping a switch on Alexa and turning on a chat interface so that you can do some basic question and answer and have a discussion about a topic, mostly about knowledge native to the model's training data versus bringing something in at runtime the way that, you know, modern chatbots answer questions
Starting point is 00:51:33 by going out on the internet. I think what we mostly learned from that announcement and the customers that we rolled out to is that we just had to increase our vision and do something more audacious, basically. Customers really wanted, and we all really wanted, to pick up from where Alexa is and was, and extend all of those capabilities. That is many millions of things that Alexa can do. And when you count the tens of thousands of services and devices that are integrated with Alexa and the space of the interfaces and the systems that you need to integrate with, it's
Starting point is 00:52:07 incredibly large. So that's the first sort of technical challenge I mentioned before. It's sort of the first and probably most important bucket. Second is really grounding it in authoritative sources. I think, as all of us know, you can sit there and fiddle with a chatbot long enough to press it into being smarmy or responding in ways that we don't believe are the way Alexa might act, for example, or press it to give you wrong information that's from some unauthoritative source, or a mistake in its training data when it shifts back to its native
Starting point is 00:52:38 training. So getting Alexa to speak confidently in her personality with authority and answer questions right, it's another key challenge. Personalizing an experience of this depth so that that Alexa is always learning from her interactions with you and extending your interactions so they get more delightful over time. I think this is something you probably wouldn't have seen in a weekend's worth of fiddling with the experience. You'll see it gets more personalized. That's another big technical challenge
Starting point is 00:53:04 because the surface area is so much bigger. So those are a few of the reasons why it sort of took so long. And I think if you rewind the tape to 2023, it's really about learning how big a project Alexa Plus would be and then starting to put one foot in front of the other, really inventing the space of creating those integrations because it just hasn't been done. What's an example of some early failure mode
Starting point is 00:53:28 that you all had to overcome? I mean, I've heard some stories from folks who have worked on Alexa or worked with suppliers who provide models to Alexa. They would tell me stories about, you know, you'd ask Alexa to set a timer for you, and it would write you an essay about the history of timers. It was just sort of misunderstanding the request
Starting point is 00:53:47 in the way that a large language model might. So tell us some of those stories. Well, that's a good one. I mean, verbosity was definitely an early issue. And it continues to be an issue on our podcast, by the way. We still haven't solved it. I've got some training ideas. Okay, good.
Starting point is 00:54:05 You know, verbosity, these models want to give you an extensive answer. Customers don't want an extensive answer read out, and they certainly don't want a disquisition on, you know, the nature of timers, right? What they want is an interface that sets a spaghetti timer. And how do you get them to do that? Is it just as simple as putting it in the system prompt? Like, if a customer asks for a timer,
Starting point is 00:54:24 like don't give them an essay on the history of timers or be concise? How do you actually solve that problem? I would love it if it were that easy. You need a set of models. There are over 70 models in Alexa Plus. It's a vast space. There are different models specialized in different tasks.
Starting point is 00:54:42 There are different corpora of training data we use on different models to get them to complete instruction sets for us and really follow the rules of the road in interfacing with something. You always need to loop back to central systems that are maintaining context and the conversation and picking up from references and pronouns you've used to refer back in time, right, and sort of cascade those forward. But the amount of work that went into just the interface between a large language model
Starting point is 00:55:12 and the downstream systems that complete tasks is, I mean, it's the biggest body of work that we've put in, and without a whiteboard here, it would be too much to even try to explain to you and your listeners, I think, the technical depth that went into it. We've got a great team working on it, and it's hard. Of those 70 models in Alexa Plus, how many are Amazon's own in-house models versus models like Claude that you all get from external companies? There's a mix. So, you know, the best way to know what models are in Alexa Plus is just to go to the Bedrock webpage and look at an update on the latest in there. We use the best,
Starting point is 00:55:47 tools that we have available to us for the job. We've got great partners over in AWS, helping make sure we've got the right best tools for the job. Most of our traffic does flow through Amazon Nova models. We have the most control over how those get trained and tuned and post-trained. I think it's over 80% of traffic on sort of the main big inferences within the system, flow through Nova models. But there are many different reasons to use many different models. I think you guys know better than most that models are special, you know, they specialize in different things. So we use the best tool for the job. Can you give us a sense of like how big the team is that's working on Alexa? How big of a priority is this within Amazon? It's thousands of
Starting point is 00:56:28 people. Okay. Yeah. Now that's building hardware. It's building Alexa plus. It's integrating with all those systems. It's adding new integrations and new things that Alexa can do. It's a pretty, it's a pretty vast scope, so it takes a big team. Yeah. There was a former machine learning scientist at Alexa AI, Mihail Eric, who did a long post on X last year, sort of his version of a postmortem or retrospective on what was happening with Alexa. And he wrote that Amazon had, quote, all the resources, talent, and momentum to become the unequivocal market leader in conversational AI. But then he said that Amazon and Alexa had fumbled the ball because Alexa was...
Starting point is 00:57:14 quote, riddled with technical and bureaucratic problems. Sort of made it seem like the problem was not just that that technology was an uneasy fit, but that there were also some organizational and bureaucracy problems that had to be solved. Can you talk a little bit about that? I won't comment on that post in particular. Honestly, I don't remember it, but there is definitely I would say sort of a startup culture transformation happening within the Alexa team. I think, you know, the life cycle of any product that's been around for 10 years, right? It has ups and downs. But I think our rate of innovation had slowed down.
Starting point is 00:57:50 And I think coming through for customers on integrating these new powerful tools is something that's really quickened and inspired the team. I don't identify with the bureaucratic comment. Maybe it's a comment about me, so maybe I won't identify with it. I don't know. But I do think the team is inspired. It's inspired by the vision, executing at an unbelievable pace and really, really creating a lot of invention, because there are a lot of really hard problems. I'm curious where the new Alexa sits in relation to Amazon's overall AI ambitions. This is a company that has offered a lot of AI models
Starting point is 00:58:29 through AWS. Big, you know, market share in cloud-based AI. Also recently started an AGI lab at Amazon that is going to be pushing towards something like an artificial general intelligence. Is Alexa part of that overall effort to create and serve more capable AI systems, or is this sort of a consumer-targeted spin-off of those efforts? I would say we do believe, and I share this belief with the leadership team at Amazon, that this generation of generative AI is going to transform every customer experience we have. And that means, I mean, we have a lot of different types of customers. You mentioned AWS. We have enterprise business customers.
Starting point is 00:59:16 We have consumer customers. We offer a very big landscape of services. I know that at some point within the last year we counted it: there are over a thousand different AI efforts going on with consumer applications alone. So if you sort of look at the scale and scope of what Amazon does and, you know, assume our belief that every experience will be transformed with generative AI, it's as big as Amazon is at that point. I would also say, just internally, it's part of how we work now.
Starting point is 00:59:49 To be as productive as you can be in this day and age, and to get as much done for customers as we aspire to, you have to build AI into how you're working. You both do this. I know that, and I'm sure many of your listeners do too, but it's certainly part of what's going on at Amazon as well.
Starting point is 01:00:04 Yeah. Okay, well, Daniel, we have some product feedback for you. Let's do it. As they say, feedback is a gift. Always. So we'd like to give you some gifts. And it's Christmas. Casey, why don't you start?
Starting point is 01:00:15 All right. Well, so, let's see. I feel like most of my feedback is less about Alexa Plus as an AI than it is about Alexa Plus in the actual, like, hardware that I got. I first started with the Echo Show 5, which does say on the website that it is Alexa Plus enabled, but then some of your folks were like, no, to get the full experience, you should get the 15. So I sort of had the two experiences. Okay. On the five, my first observation was that
Starting point is 01:00:47 after I told it I would like to see art, I feel like every time I looked over at it, it was asking me if I wanted to buy paper towels or Advil or something. That was a little bit less the case once I got the 15. I don't know why that might have been. But I felt like the Alexa Plus AI thinks of me primarily as a person who might send more money to Amazon if you just sort of gave me a few more ideas for how I might do that. And what I would love is if it evolved to treat me like a person who isn't like
Starting point is 01:01:19 constantly looking to buy paper towels. You know what I mean? So that, I think, is actually my biggest piece of feedback: I wanted fewer ads, fewer reminders that Amazon Music exists, fewer reminders that Amazon Prime Video exists. Just, like, get to know me as a person a little bit. That's my big feedback. Subject line. Yeah. Enough with the paper towels.
Starting point is 01:01:40 Enough with the paper towels. If I say I want to see art, I really mean it. So I get it, because you want to show everything that your hardware can do. You worked very hard on it. It can do many things, and you want to, like, showcase all of those things. But I do think it comes across as a kind of insecurity in the device, like, if we're not constantly showing you everything that we've built into this thing,
Starting point is 01:02:02 you'll never discover it and you'll put this thing in a drawer. I understand the pressures that you're under, and I understand why it has evolved this way. But I have to say, when I unplugged it, I felt more relaxed, because it wasn't giving me a list of things to do. And I didn't feel that way about my original Alexa,
Starting point is 01:02:19 which is, like, great at the things that it does. So I know that's a lot, but those were my emotions. The first one, to me, the Echo Show 5 feedback, sounds like a bug. I don't know what state it got into, but if you asked for artwork and that's not what it was showing you, that one sounds like a bug. The latter part, it might just be that you have a different reaction than most of our customers do to the onboarding experience, is maybe what it sounds like, or you're just looking for more diverse things. I will be curious to follow up with you in a week and find out if your use has helped shape the nature of what we're showing you. Yeah. That is certainly our intention: that when you're onboarding to the new experience,
Starting point is 01:02:56 the types of things you're asking for are the types of things we're showing you. And that could be anything. Like, one of my most delightful ones: we have a new element called For You, which is a place where we post little notifications about things we think you might be interested in. And I had been helping my daughter study the periodic table part of her chemistry final. And I was never great at remembering, in particular, the elements that, like, you need a mnemonic for, like lead, you know, it doesn't match Pb very well. So you were good at chemistry, obviously. Yeah, nailed it. Very low latency on this. You don't need the mnemonics. But I had done that the night before, and when I came in in the morning, my For You said, you know, uh, should we make a chemistry quiz for
Starting point is 01:03:40 Ellie, or something like that. It was like, write a chemistry quiz for Ellie. And with the generative content capabilities of Alexa Plus, I just said, yeah, let's try that. Like, can we make a sheet of all of the elements that aren't intuitive? Now, did it also ask if you wanted to buy lead? It didn't ask me that, I think. Which, you know, it's a product safety thing, so I'm glad we ticked that box. We will have to just look and see, like, the extent of the Amazon services being shown to you. But, you know, I will tell you that the body of feedback that we get from customers doesn't accord with that specific version of it. Definitely,
Starting point is 01:04:17 customers want to learn what they can do. That is one of the biggest things that we hear from customers. I want to come back to what you said about, you know, unplugging the device and plugging it back in. We made the Alexa Plus experience incredibly easy to get out of and get back into, which is not true for sort of, like, an OS update, right? It's very hard to go backwards. And we worked very hard to try to make it possible, because we knew that it would be so much change. A very high-90s percentage of customers stick with the new experience. That makes sense to me. I mean, it's clearly much more capable. It can do more stuff. And I know it's going to evolve and presumably improve over time. So, yeah, no part of me was like, I want to go back to the old
Starting point is 01:04:57 experience. I was just like, wow, this is just, like, very intense. Honestly, I think the bigger shift that I experienced was going from just a pure speaker to something with a screen. Like, that actually feels bigger than the change. I understand that. To piggyback on Casey's question, I think this is one of the big questions about the Alexa business model: whether you see this as something that is going to make money on its own, or whether this is primarily sort of a way of increasing the amount of money that people spend on Amazon.
Starting point is 01:05:25 You know, I spend an ungodly amount of money on Amazon. Thank you for your business. Thank you for your business. A large fraction of my income is spent on various things on Amazon. And so I'm well aware of the many products that exist on Amazon.com, the website. I do not need, like, ads to be cascading on my screen
Starting point is 01:05:45 telling me to buy more stuff on Amazon. But it does seem like this is primarily going to be an ad-supported product. Andy Jassy recently said on the earnings call for their most recent quarter that you all were trying to bring more advertising experiences to Alexa Plus. So talk to us about that. And like, are we just going to inevitably be more annoyed at the number of ads that are showing up on these devices? I definitely don't think you'll inevitably be more annoyed.
Starting point is 01:06:11 I would say advertising is definitely part of the business plan, but it's not the biggest part. It's actually probably the smallest part. The most important decision we made on the business side with Alexa Plus was bringing it into Prime. And putting it into Prime sort of brings together all of a customer's Prime benefits, because you might watch a video, or listen to a song from Amazon Music, or use your Amazon Photos benefit, which is awesome with an Echo Show, and review your family photos. I use that all the time to look back at the kids in particular. But, you know, you have this long list of Prime benefits. Alexa is a great place where they come together, and putting in the value of having this
Starting point is 01:06:54 world's best personal assistant into Prime just turns the Prime flywheel. And we know every time we've added a benefit to Prime, customers use their Prime benefits more; it's stickier to them, it provides them more value, and it turns into a great business. That's the goal. Okay, so Casey's feedback was about advertising. Okay. Mine is about some of these new features that don't work, and some of the old features that actually don't work either. So some of the more complicated stuff that I tried with Alexa Plus, such as setting up routines that involve multiple steps, such as emailing documents or research
Starting point is 01:07:29 papers to the Alexa email address and trying to have it summarize these things. These things just didn't work for me. The routines didn't run. The papers didn't show up to be summarized. I assume this is just sort of growing pains and beta-testing bugs and things like that. What I found more sort of frustrating, and wanted to ask you about, because I'm actually not sure why this happens, was that some of the basic features that Alexa had previously been good at and reliable at for me were less reliable with Alexa Plus. So this morning, for example, I tried to cancel an alarm that was about 10 minutes from going off, and Alexa just didn't listen, didn't hear me; the alarm went off anyway. So help me understand why that is.
Starting point is 01:08:11 Is that like a hallucination of the model? Is that a problem that is sort of related to the orchestration of the various tasks and sending it to the right place? What is going on there? Honestly, we'd have to dive deep into each of those to figure it out. Early access is here as a program to cover off on these kinds of issues and to make sure customers know that they can opt into Alexa Plus. They can opt out if they want. Again, the vast majority of customers stick with it. The key challenge, probably, in everything you said is that interface between the large language models and these more predictable rule-based systems that communicate through APIs. Something like canceling an alarm: making sure we find out the exact intent of what you were looking for, translating that into a set of commands, and then issuing those commands to an API. Sometimes it does fail.
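The handoff Rausch describes, a model's loose reading of an utterance having to become an exact command before a rule-based service will act, can be sketched in miniature. This is a hypothetical toy, not Amazon's actual architecture or APIs; the `interpret` function stands in for the language-model step, and `AlarmAPI` stands in for the strict rule-based system behind it.

```python
from dataclasses import dataclass

@dataclass
class Command:
    """A structured command: the only thing the rule-based system accepts."""
    action: str  # e.g. "cancel_alarm"
    target: str  # e.g. an alarm identifier

class AlarmAPI:
    """Rule-based service: it rejects anything that isn't an exact, known command."""
    def __init__(self) -> None:
        self.alarms = {"alarm_7am": "07:00"}

    def execute(self, cmd: Command) -> bool:
        # The failure mode described in the interview: not hallucination,
        # but an incorrect or malformed use of the API.
        if cmd.action != "cancel_alarm" or cmd.target not in self.alarms:
            return False
        del self.alarms[cmd.target]
        return True

def interpret(utterance: str) -> Command:
    """Stand-in for the LLM step: map free-form intent to a structured command."""
    if "cancel" in utterance and "alarm" in utterance:
        return Command(action="cancel_alarm", target="alarm_7am")
    return Command(action="unknown", target="")

api = AlarmAPI()
ok = api.execute(interpret("cancel my 7am alarm"))  # succeeds only if intent, command, and API state all line up
```

If any link in the chain is off, the wrong intent, a command the API doesn't recognize, or a target that no longer exists, the request silently fails, which is consistent with an alarm going off despite a cancel request.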
Starting point is 01:09:02 It's rarely, at this point, because of hallucination; we've got so much going on to monitor for model hallucinations. It is sometimes because of just incorrect use of an API, or just misunderstanding exactly where to send those commands. So that's more likely the case in each of these cases. Got it. I will give you one more piece of feedback, which is actually not for me. This is from my three-year-old son,
Starting point is 01:09:24 who is our house's most active Alexa user. He talks to Alexa all the time, probably more than he talks to us. Should I be concerned about that? Maybe, but we'll save that for a later episode. But he was doing story time with this, because he, you know, constantly wants more stories about various vehicles, various dinosaurs. And so we were doing a story time about Super Tow Truck, who rescues cars from the water. And he asked for another one, and it gave him a totally different set of characters.
Starting point is 01:09:56 So if there's some way for kids to have like a kind of, you know... Their own private cinematic universe? Persistent cinematic universes for super tow trucks. I know at least one three-year-old would appreciate it. I got it. Excellent product description, by the way. I like that for sure. I agree that children, as they explore, it doesn't even have to be an imaginary friend, but they do love themes and they love to continue them. So it's great. That's great feedback. We'll take that to the team. Yeah. For all of our feedback, I actually am very glad I've got to try this. I'm going to keep testing it. We are very active Alexa users in
Starting point is 01:10:31 my household, so we'll keep sending you our feedback. That's awesome. Yeah, we like trying new things around here. Yeah. Daniel, thanks so much for coming. Thanks, Daniel. Appreciate your time, guys. Thanks a lot. Oh, wait. Did you just set off your Alexa? Oh, my Siri, stay out of this. Gosh, she's got a lot of nerve coming into this podcast recording. Wow.
Starting point is 01:11:07 Well, Casey, we've got some good news and bad news. The bad news is that our hat promo is coming to an end. So as of this week, the limited time offer to get a free hard fork hat along with a new annual New York Times audio subscription is running out. This is your last chance to get this very cool limited edition hat as a thank you for subscribing to New York Times audio. And these hard fork hats are only available to subscribers in the United States. The good news is that we are going to have more hats available for sale.
Starting point is 01:11:39 A different hat with a slightly different design is going to be available in the New York Times store, we're told pretty soon. So look out for that if you already subscribe to New York Times Audio, or you just want the hat without the subscription. We are told that the next wave of Hard Fork hats will be available internationally as well. Hard Fork is produced by Whitney Jones and Rachel Cohn. We're edited by Jen Poyant. Fact-checking by Caitlin Love. Today's show was engineered by Chris Wood. Original music by Marion Lozano, Diane Wong, Rowan Niemisto, and Dan Powell. Video production by Sawyer Roque, Pat
Starting point is 01:12:17 Gunther, Jake Nicol, and Chris Schott. You can watch this whole episode on YouTube at YouTube.com slash Hard Fork. Special thanks to Paula Schumann, Pui-Wing Tam, Dalia Haddad, and Jeffrey Miranda. You can email us at hardfork@nytimes.com with a story about when you fainted. You know,
