Big Technology Podcast - OpenAI’s New Model, Jensen’s Bold Claim, Alexa+ Is Here

Episode Date: February 28, 2025

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover 1) OpenAI's release of GPT 4.5 2) Is GPT 4.5 a major advance or what? 3) What better EQ gets you in an AI model 4) What reasoning advances can be built on top of GPT 4.5 5) Is AI product or model more important? 6) Gary Marcus says OpenAI is in trouble 7) Anthropic releases Claude Sonnet 3.7 8) NVIDIA earnings 9) Jensen says reasoning costs 100x typical LLMs 10) Meta wants to build a standalone AI app 11) Will Alexa+ work? 12) RIP Skype --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. For weekly updates on the show, sign up for the pod newsletter on LinkedIn: https://www.linkedin.com/newsletters/6901970121829801984/ Want a discount for Big Technology on Substack? Here’s 40% off for the first year: https://tinyurl.com/bigtechnology Questions? Feedback? Write to: bigtechnologypodcast@gmail.com

Transcript
Starting point is 00:00:00 Let's break down what the release of GPT-4.5 means for OpenAI and the future of generative AI. Plus, Anthropic also has a new model, NVIDIA's CEO Jensen Huang makes a bold claim, and Amazon introduces a better version of Alexa. That's coming up right after this. Welcome to Big Technology Podcast Friday edition, where we break down the news in our traditional cool-headed and nuanced format. We have so much to talk you through this week. It feels like this week, among many crazy weeks, has been one of the craziest. We have a new model from OpenAI, a new model from Anthropic, a new Alexa, Nvidia earnings, and Skype is dead.
Starting point is 00:00:35 So it was a very, very promising week for a lot of companies, but not for Skype, which will forever live in our memory. So we will say goodbye to Skype at the end of the show. But in the meantime, joining us, as always, on Friday is Ranjan Roy of Margins. Ranjan, great to see you. Happy new model week, Alex. How have all these new models changed your life as of today, February 28th? Not at all, but we will talk about whether that will matter in the long term, because, of course, I put your, is it the model or is it the product, question to OpenAI head of research, Mark Chen. We talked about it. And now you're going to get a chance to respond. But first, let's just break down the news, because yesterday we had the release, of course, of GPT-4.5. OpenAI for the first time ever put a spokesperson on this podcast. We broke the news here with Mark, and now we're going to analyze what it means because we've sort of left the fog of war
Starting point is 00:01:32 and we have some perspective on whether this is disappointing for Open AI, whether this is promising for Open AI, and whether, you know, this means that generative AI can continue to progress or not now that we've seen some more reactions outside of Mark Chen saying, yes, scaling is still alive. So this is from the verge. OpenAI announces GPT 4.5. GPD 4.5 is the largest and newest large language model from OpenAI. It's going to be available as a research preview for chat TVT pro users to start. And here's like a weird thing though that happened. There was some documentation. We're going to get right to it right away. There was some documentation that Open AI released about this model and then removed. And it's very mysterious. They said GPT4.5
Starting point is 00:02:18 is not a frontier model, but it is OpenAI's largest LLM, improving on GPT4's computational efficiency. by more than 10x. It does not introduce seven net new frontier capabilities compared to previous reasoning releases and its performance is below that of 01, 03 mini, and deep research on most preparedness evaluations. OpenAI has since removed this mention from an updated version of the document.
Starting point is 00:02:45 So they did remove it. I don't think they disputed it, though. And I found what was interesting was, yes, this was a step change improvement over GPT4. it was not over the reasoning models. So you would think maybe you could build reasoning on top of this. We're going to talk about that in a moment and it will be even better. But for the meantime, opening eye has a new model that does not exceed the reasoning models in certain benchmarks and seem to admit that in a document. So Ron, John, you've been following along this whole
Starting point is 00:03:16 way. What do you think the implications of this are? This week had me thinking, I feel with iPhone releases in recent times, a lot of us have been saying, do we really need a big event to release every new iPhone? Certainly the 16E was not exactly the iPhone launches of yesteryear. I'm starting to feel like that with all of these large language model releases, Claude 3.7, GPT 4.5. Even as you're listing out all of the kind of release notes around this and then there's some ingredients that are, you know, not listed or these things are removed from the actual release documentation. It's not that exciting. It's not exciting enough to have to try to launch a live stream and get everyone, you know, hyped up around it. GPD 4.5, and we're going
Starting point is 00:04:09 to get into there's some elements of emotional intelligence or emotional quotients that are around it. Perhaps creative writing is a little bit better. Perhaps there's a bit of computational efficiency introduced to it. Even Sonnet 3.7, and I was trying this. Clod code is a pretty big release and a pretty big step change, but it's not revolutionary. So I think a lot of these companies have gotten caught in this hamster wheel of needing to do these big model launches. And there was a time where the step change was so big that it was actually exciting for all of us. But now, I think 4.5 is probably the least interesting model release from OpenAI to date. Because even 01 and adding reasoning models to the overall suite was a pretty big deal.
Starting point is 00:04:56 4.5, I still cannot tell you what the big deal is. Maybe you can tell me. We are going to get some commentary from Andre Carpathie about this that he put on Twitter yesterday, which does answer that point. But even from Open AI itself, there was some very interesting. communication, shall we say, around this model? So Sam Altman came out with this tweet and he said, GPT4.5 is ready. The good news, it's the first model that feels like talking to a thoughtful person to me. I've had several moments where I sat back in my chair and had been astonished at getting actually
Starting point is 00:05:29 good advice from AI. The bad news is it's giant, it's expensive. We really wanted to launch it to pro and plus users at the same time, but we've been growing a lot and are out of GPUs and we'll add tens of thousands of GPUs next week and roll it out to the plus tier then and there's hundreds of thousands coming soon so I'm pretty sure you'll be able to use it once we can rack up. This isn't how we want to operate
Starting point is 00:05:52 but it's hard to perfectly predict growth surges that lead to GPU shortages. Remember, chat GPT has gone from 100 million to 300 million users in a very short amount of time. Heads up, this isn't a reasoning model and won't crush benchmarks. It's a different kind of intelligence
Starting point is 00:06:08 and there's magic to it. I haven't felt before, really excited to have people try it. Look, it's very interesting because, again, we're going to go into my interview with Mark Chen very quickly. But Mark, I was like, you know, hey, listen, does this show that, you know, we're getting diminishing returns from scaling? And he said, absolutely not. But then you have these endorsements from Altman and it's fairly muted.
Starting point is 00:06:30 So it makes sense of that for me, Roncha. The world's greatest product marketer cannot market his own product. I mean, it's not as great at some things, but trust me, there's this magic which I felt, but I'm not going to actually tell you what that magic is. I think it kind of, it cap, actually his statement really captures the overall feeling I have of 4.5. It just tries to put a positive slant on it. But I think that's exactly it.
Starting point is 00:07:01 They have to keep pushing new models, new narratives, pushing towards G. P.T.5 whenever, if and when that will come. But to me, they need, and actually, this is going to get back to our product versus model debate. They need to show more product. Again, operator deep research. Those were exciting moments. 4.5 as any kind of announcement is not incredibly interesting to me. You had asked Mark Chen during your interview, you know, what are the new use cases or what are the use cases where this will be better at? And I was actually sitting there waiting with bated breath ready to hear, okay, this is how this is going to help me or other people. And there was a somewhat generalized answer around how with creative writing tasks,
Starting point is 00:07:52 whatever that might mean, this is better. And that was kind of it that I got out of the interview. So I think, and which lines up with the whole idea around emotional quotient, emotional intelligence, more creative writing, more thoughtful answers. And I've seen a lot of examples of out there of 4.5 answering and being a bit funny and people saying, this is the first time AI has made me laugh. But if you're just trying to get a little bit more grocky with your model, I don't know. That doesn't seem like that's going to fill in that soft bank valuation for me. Yeah, look, I didn't find a grocky at also because, you know, as we've talked about on the show, rocky but you know more on the the trying to make it funny or interesting as opposed to just
Starting point is 00:08:39 giving you information so having experimented with it you know as i was going to say both of us have paid for that $200 a month upgrade because we wanted to try deep research and i guess mine is still live i think yours just dinged so but i'll say that i spent a good amount of time chatting with gpt 4.5 yesterday and what they're saying is real like it is definitely much more pleasant to talk about it. And I spoke with Mark about this a little bit yesterday. The responses are shorter. They're more human-like. Like, it doesn't feel the need to, like, print out, you know, a master's thesis for each answer. Like, you can actually have a back and forth with it. And it was actually one of the more enjoyable conversations I've had
Starting point is 00:09:20 with with a bot to date. That's, okay. I'll give you, that is a good point. If the big functional change is, we've all gotten very used to this idea. that you, you know, query a chat bot, and you get this really overly thoughtful answer that tries to both hedge itself from any kind of safety consideration and, you know, lists out 10 bullet points and, as you said, a master's thesis. So maybe there is something very important there, where it actually starts to be able to answer you correctly in a concise way, in a more conversational human way. Maybe there's something there.
Starting point is 00:10:02 But to me, again, why not just put that out there in the model? Why have a big event around it? Why make a big press push around it? Why not just put that in the product? Well, here's why I would say it's important to do that, is because, and this is what Mark was saying, that you have linear progression of the model's capabilities. Based off of what you predict, if you put this much compute in it, you get this much output. And I think Open AI is saying that this 4.5 is the next step. on that progression and it's met with the amount of compute that they've put in the benchmarks that
Starting point is 00:10:37 they've expected to hit. And that's why I said to him, did you find the scaling wall? And he said GPT 4.5 is really proof that we can continue the scaling paradigm. So basically, I think that is sort of like that is the march. But I also think it's important to kind of talk about like what it's going to feel like to all of us. And then this gets to Carpathie's comments. And it's basically here he describes really well the progress from the original models because you you that it is it's going to feel less as you get better so he says uh gpt1 barely generates coherent text gpt2 was confused was a confused toy 2.5 was skipped straight into gpt 3 which is even more interesting and gpt 3.5 crossed the threshold where it was enough to actually ship a product uh and sparked open a i's chat gpt moment
Starting point is 00:11:30 He says, I went into testing our GPT 4.5, which he's at access to. And he says, everything is a little bit better and it's awesome. But not exactly in ways that are trivial to point to. Still, it is incredibly, incredibly interesting and exciting as another qualitative measure of a certain slope of capability that comes for free just by pre-training a bigger model. He says, we actually expect to see improvement and tasks that are not reasoning heavy. And I would say there are. tasks that are more EQ as opposed to IQ related and bottlenecked by world knowledge, creativity, analogy making, general understanding, and humor. So these are the tasks that he was most interested in during his vibe checks, basically saying that like you use this model. It's a little bit better. And that matters a lot because we've already come so far from the barely coherent part to where it is today. I think I'm going to nominate you as the new product spokesperson for open AI because I think you just convinced me right here. I think you just turned my entire view of 4.5 in this moment. So basically, I've been talking a lot about AI has a branding problem. The idea that people
Starting point is 00:12:41 say that's written, quote unquote, written by AI. Everyone has this really narrow view of what AI text generation is. And that's because of this very dry, weird, almost inhuman way that it responds to you. And every model, whether it's Gemini or Claude or ChatchipT, everyone has this view of this is what an AI response looks like. So actually, if the real advancement here is it can move beyond that and make things more human and conversational, that actually could be very interesting overall in terms of getting people to use these products. So I think if that's the real change here. I'm surprised that they didn't hone in on that, that this is going to be what takes ChachyPT to the next 700 million people outside of all early adopters and makes people
Starting point is 00:13:36 comfortable and happy with it and makes AI much more natural within all types of mediums and channels and outputs. If they positioned like that and if that's what's really happening here, that is kind of exciting for me. I think that is how they're positioning it. They've They are talking about the fact that this has great AEQ, and that is where they want to seem to focus people with this release. And you look at some of these benchmarks, and so I'll just read a few of them, simple QA accuracy. GPT 4.5 has 62.5% compared to, let's see, 47%, the closest model, which is opening I.01. the hallucination rate is 37.1%. Again, lower is better. GPT4O has a 61.8% hallucination rate, which seems high. So those are like the standard benchmarks. But then you get into the everyday queries.
Starting point is 00:14:35 And they say that for everyday queries, people prefer 4.5, 57% of the time over GPT40. For professional queries, they prefer 63.2% of the time. over 40 and for creative intelligence 56.8% over 40. So that's not nothing. No, I think if Kantrowitz and Roy were behind this marketing campaign and launch, we could have just come up with a simple make AI less AI. What about that one? Something just pushing the idea that that's what this is really about, not getting caught up in the scaling loss side of it, the compute efficiency side of it, and really saying this is the first model that makes AI less AI. It makes more people feel comfortable using this on an everyday basis.
Starting point is 00:15:28 I think that I would have been, it would have been a little more exciting for me. Definitely. And so there's a very interesting debate that's gone on about like where did it get this more EQ-oriented positioning? Was it pre-training? Like, is it because of its abilities or was it post-training where like they just added this personality after the model was built. We don't fully know. And actually, if I was going to have one question that I'd want to ask Mark Chen, if I could get him on the phone for like another five minutes, it would be that question. And I feel bad, having left that out yesterday. But I have
Starting point is 00:16:03 seen some very interesting debates about it over the past couple of days where there's this one, Princeton academic, Arvind Narayan. He says, apparently the main thing we're getting with GPT 4.5 is an exchange for 30 times percent price in exchange for a 30x price increase is fuzzy stuff like IQ. The ironic thing is this is an aspect of behavior, not a capability. My bet is that any difference in EQ are due to post-training, not the parameter count. Okay, so that's an interesting thesis. Ethan Malik from Wharton slides into his mentions. And Ethan Malik, of course, he's a professor. He's been on the show. He's been pretty good at sort of following the pulse of AI. He's pretty positive. So he tends to take the sunny side.
Starting point is 00:16:47 of things, but he says, disagree on this one. Stuff like theory of mind or EQ are deeply rooted in abilities, not behavior in humans. And I would bet the same for AI. But again, we don't know yet. So basically, if this did come out of like just training the model, making it more able, and then it all of a sudden produces like a more human style of communication, I think that's pretty interesting. Well, yeah, I do think, and I was thinking about this as well after reading these, on one hand, it could be essentially kind of a party trick. It could be more instruction level after the actual core training where it's just, you know, speak in this voice, give concise answers, try to lean your behavior towards a certain way. I think that would actually be very
Starting point is 00:17:36 sad and be, you know, like, because that would be easy. What Ethan's saying, I think, is the more interesting part. And I have to say, if it's open AI doing this, I have to imagine, imagine for this kind of product and model, they're not going to be going the party trick route and genuinely changing the way the model thinks and produces knowledge would be a very big deal, as Ethan saying. But again, we don't know what that means or what it looks like. Is it in the supervised fine-tuning layer? Is it in the base training layer? We don't know. I'm actually surprised. Yeah, we got to find out. You got to ask Mark Chen again. Because to me, Again, that is the really interesting stuff they should loudly be talking about rather than
Starting point is 00:18:22 Sam Altman just saying it's kind of magic and not giving us anymore. Exactly. And so there's been this other thing that's happened, though, which is that people have taken a look at the evaluation scores and have noted that this is not as good as reasoning models in a lot of different fields. So I think we should talk about that because it has been used as a discussion point about whether open AI has lost the magic. So let me just go through some of these, you know, whatever they're going to mean to, I'm just going to read them out. So there's GPQA, which is science, 4.5 gets a 71.4% compared to 79.7% for OpenAI 03 mini. So it's down by 8, 7,8 percentage points there. There's AIME 24, which is math, GPT 4.5, 36.7.7.7.
Starting point is 00:19:13 percent compared to 03 mini 87.3 percent less than half as performance. It's amazing. It just beats on this multilingual test and it is a little bit, no, it is a little bit better on one coding benchmark and then a little bit worse on another coding benchmark. But basically people have taken this and I think this was also something I saw afterwards. I was like, oh dear, you know, like there is reason, these reasoning models are outperforming this. in a lot of benchmarks. And I think we should say that the reasoning models use the intelligence of these standard models
Starting point is 00:19:50 and they learn how to attack things step by step, which is like, yes, the reasoning models are doing the things that they're supposed to do. And it just shows you how impressive the reasoning is. But then there's also just like, why is it lagging? People have been like, all right, that's really disappointed. Here is, let's hear from trustee Bob McGrue, former chief research officer at OpenAI that always seems to hop
Starting point is 00:20:10 in the discussion at an opportune time. He says, don't be disappointed that GPT 4.5 isn't smarter than 01. Scaling up pre-training models. Pre-training improves responses across the board. Scaling up reasoning improves responses a lot if they benefit from thinking time and not much otherwise. Wait to see how the improvements stacked together. I think this is really important, right? It's that this 4.5 is going to be the basis of the next reasoning model that Open AI is going to put out.
Starting point is 00:20:40 And I think Mark hinted on this is that GPT. will bring both of those capabilities together where you're going to have the smarter basic foundational model, which is going to be GPT5 or something built off of a GPT 4.5, and then you're going to add the reasoning in, and then it should even further outperform the stuff we're seeing with like 01 and 03. So what do you think about that? Yeah, trusty Bob McGrew, making sense of it, I think. That makes sense that building this more apt, able, emotionally intelligent foundation model
Starting point is 00:21:11 and then building, you know, incorporating that with the reasoning model and ideally that getting us to GPT-5 seems like something ambitious enough to actually push forward on. I guess I still have such a difficult time again, though, when we're looking at GPQA benchmark AIME-24, even when we're looking at what you had showed earlier, there's like on everyday queries that GPT 4.5 beat 4.0. by 60%. What does that actually mean? What, like, where, what does that look like? What kind of real life problems? Because I'm so fascinated by what is an everyday query in one of these tests that if you have an AI researcher creating a benchmark, what is their everyday query versus your or my everyday query? Like, I think that that's the part that still worries me about open AI that so much focuses on that research house part of it and the much much like the very, very research-oriented approach to all of this, going back to product versus model.
Starting point is 00:22:18 But it feels like we're still locked in that rat race here. Okay, well, that just takes us to our model versus product question again, because I did bring this up to Mark, and I said, all right, you're the head of research at Open AI. You're a model guy. So just like, I am trying to figure out how to argue this to Ron John. Maybe you can help me figure it out. And he did. He gave an explanation, basically saying that.
Starting point is 00:22:41 As the models get smarter, these products, like, for instance, deep research, get smarter. We talked last week about how if they're hallucinating, they become useless. So the less hallucinations you would imagine, the better, maybe unless you're Benedict Devin's, who wants zero hallucinations. So I'm kind of curious to put that to you and get your thoughts on what it means. I listened to it, and it still felt like a relatively generalized statement for something that shouldn't be a generalized statement. Even going back to what are the real use case, is it creative writing? Is that really what you're pitching me with 4.5, that it's going to be better? Is it everyday users will have a better experience with a chatbot and feel more comfortable?
Starting point is 00:23:26 Is it AI therapists are going to get a lot better because now it can actually talk to you in a more emotionally connective way? To me, that's the part that the hallucination rate side of it, I think, obviously matters, but if the idea is, it's like, to me, the 99% versus 98%, versus 97%, for most AI use cases in the world, I think will probably be okay. To me, again, it's more, it still doesn't answer that question. Like, deep research can get better and better, but does that mean financial analyst will actually trust everything that they is put into a deep research report in a week, in a month, in a year? What does that actually look like? Yeah, I think we still don't know. I mean, we can definitely say for sure that like improving
Starting point is 00:24:20 the model from GPT1 to where we are today has mattered. But I think that the question is, yes, what are these incremental improvements going to really lead to? And like, yeah, I mean, Mark was like it's all about getting to the frontier of knowledge in AI, the smarter these things are, the more they can do, just like a smarter human can do more. I think it's great. I love that they are pushing the cutting edge on this and that every AI lab is trying to get, push the cutting edge on it. But I, you know, I cannot staunchly sit in my position for much longer unless I see some tangible outcomes from this. But anyway, I'll still be on team model for the time being. All right. So it makes for better Fridays, knowing we still have product versus model.
Starting point is 00:25:12 Yeah, I'm not, I don't really see myself going away from that position anytime soon. I want to see the better models. Thanks for shipping the better models. I'm waiting for GPT5 to show up and magically solve every use case perfectly. And I will eat my hat, whatever, whatever one does on that day. Well, the last thing I'll say about Mark is I did say, like, so aren't you setting expectations too high? And he said, I don't think so. Okay, all right, Mark. GP5, baby. I don't know what trusty Bob McGrew would say about that, but let's see.
Starting point is 00:25:47 All right. So now that I've become the sort of de facto product spokesperson for Open AI, let's go to Gary Marcus. Because I feel like we should talk about, you should at least give some time to those who've said, actually, that GPT 4.5 launch shows that OpenAI is toast and sort of discuss their points. And one of those people are Gary Marcus, longtime critic, former, well, he's been on the show as well. I'm sure we'll have him back soon. He messaged me on LinkedIn after he saw my Mark Chen interview and said, allow me to give a rebuttal. I said, all right, send me something.
Starting point is 00:26:23 We'll read it on the show. I haven't got anything back, but I will read a LinkedIn post from him and we can discuss it. So he says, Open AI is in serious trouble. They still have the brand name, a lot of data, and tons of mostly unpaid users. But GPT4.5 is usually expensive. Even so, it offers no decisive advantage over competitors in zero moat. Scaling hasn't gotten them to AGI. The GPT5 project was a failure.
Starting point is 00:26:46 There is already starting to be in, is that all they have reaction, including from some people who've said they've have to adjust out their prediction of when we hit AGI. He said, Deepseek led to a price war that cuts potential profits. There is still no killer app. Open AI is still losing money on every prompt. A bunch of investment turns to debt if they can't make the transition to the nonprofit fast enough. And Elon has perhaps up the cost. Many, many top people have left.
Starting point is 00:27:13 Some have started serious competitors with similar IP because opening eye's burn rate is so high. They have limited runway. Microsoft no longer has their, fully has their back. Altman's credibility has diminished sorrow went nowhere. Whatever lead they had two years ago has been squandering. and if Masa changes his mind, they will have a serious cash problem. And Elon is right that they don't have all the money for Stargate. Man, what do you think in responding to that list?
Starting point is 00:27:40 What takes Ed Zittron about 5,000 words to write? I think Gary Marcus did in about in one tweet. Actually, that reminded me of SORA that it exists, which I played. Have you used it recently? I have not. The text to video or image to video model, that one definitely went nowhere, could have been a good product demonstration. I think overall they are making this bet. It's what we keep talking about. But it has to be, GPT5 has to be, oh my God, this solves everything. Like this is where there's no hallucinations. It's reasoning. It's a huge foundation model. It's relatively low cost somehow. I think it really, the way they're positioning their entire business is that it's going to be the kind of silver bullet to everything. Otherwise, I really don't see, again, the zero moat part of it, you're seeing more and more, which is to me maybe that is why they push so hard on these constant model releases, because they have to stay relevant.
Starting point is 00:28:49 Because the moment they're just an API in the background, then you're the most commoditized thing imaginable. and then that will kill you anyway. So that's a pretty compelling case right there. Yeah, and I think that one point I think that I should make here, and I did speak to one more point about the market interview. I spoke to him about starting and stopping. And he said that's a normal part of training any model. But if you're starting and stopping on a model that's this expensive to make,
Starting point is 00:29:16 then your costs go way up. So I think Gary is right that the errors or whatever, the changes, the tweaks that you have to make become very expensive tweaks. when you're starting to work on projects this size. Yeah, the cost of the model training. And we're going to get into how Anthropic, supposedly the new Claude was much less expensive. Deep Seek, we know, whatever, whether it was 6 million or 60 million or whatever it was, was significantly less expensive.
Starting point is 00:29:46 I think overall, you have one side of the industry showing us that it actually can be cheaper and cheaper and cheaper, but then those with the best interest, number, Open AI's competitive advantage could be talent to an extent, even though a lot of talents left, they have a pretty deep bench that's pretty impressive. Or it could be resources, cash, and access to compute. So they almost have to make that their game, because if that's not their game, they're not going to win. If that's not the game, they're not going to win. Yep. So we talk about the competition. You teased Anthropic. We have so much more to talk about, including the new Anthropic model, what Jensen Wong has talked about, how expensive reasoning is,
Starting point is 00:30:28 and Nvidia earnings, and of course, the new Alexa, we're going to do that right after this. Hey, everyone, let me tell you about The Hustle Daily Show, a podcast filled with business, tech news, and original stories to keep you in the loop on what's trending. More than 2 million professionals read The Hustle's daily email for its irreverent and informative takes on business and tech news. Now, they have a daily podcast called The Hustle Daily Show, where their team of writers break down the biggest business headlines in 15 minutes or less and explain why you should care about them. So search for The Hustled Daily Show and your favorite podcast app like the one you're using
Starting point is 00:31:02 right now. We're back here on Big Technology Podcast Friday edition talking about all the latest AI and tech news, including the fact that Anthropic has a new model. Jensen Wong has a stance on how much compute reasoning uses and the new Alexa. And by the way, Skype is dead. So let's see if we can get to that all in the second half. The first is that, GPT 4.5 wasn't the only model here. We have Anthropics Claude 3.7 Sonnet. It's here. This is from TechCrunch.
Starting point is 00:31:31 Anthropic is releasing a new AI frontier model called Claude 3.7 Sonnet, which the company designed to think about questions for as long as users want it to. So like we've been talking about, it's a hybrid AI reasoning model: a single model that can give both real-time answers and more considered, thought-out answers to questions, and you just choose, do you want the quick response, or do you want the thinking response? And the model represents Anthropic's broader effort to simplify the user experience around its AI products.
Starting point is 00:32:01 We're longtime Claude heads, I would say, on this show. I've gotten a chance to use it. You've gotten a chance to use it. I believe the thinking toggle that we talked about is pretty good. It's almost as good as DeepSeek's. What is your response to the fact that we have another model from Anthropic, and the fact that we went not from three to four, but from 3.5 to an incrementally better 3.7?
Starting point is 00:32:27 What about 3.6? I was waiting for 3.6. That was going to be the big one, but we just skipped straight ahead to 3.7, baby. As a Claude head, I've been using 3.7 regularly. Again, from the model side, the thinking toggle mode, which I'll still categorize a bit as product (maybe that one lives between product and model), is good. Claude Code is definitely a very new offering from them, and I think it's going to be very interesting, because coding to me is still the most monetizable, most directly productive use case for generative AI as of today.
Starting point is 00:33:12 So I think the way they approached this is kind of how I want these model launches to be approached. There's a blog post, there's some tweets, there might be an explainer video here and there, and that's it. And we keep getting improvements as we wait for 3.9. Maybe not 4.0, because that's probably AGI. Probably 4.1; 4.0 is AGI. So yeah. So I will say, we're going to get into how they trained the model, because it's interesting, but I will say I did an experiment this morning. I think I've mentioned this on the show: I've been using Claude every day as a diet coach
Starting point is 00:33:51 where I basically write down the meals I've had, weigh in, and it will give me a letter grade on how I did, based off of the prompt that I gave it about the way that I want to be eating. And it will, like, count up the calories and grade the foods. It's very good and it has lots of memory. And so I copied the history, which goes back probably a month and a half at this point in the latest chat, and dropped it into OpenAI's GPT 4.5,
Starting point is 00:34:15 Claude 3.7 reasoning, and DeepSeek. And unfortunately, I'm here to report that DeepSeek did the best job of all of them. You didn't try Grok and have it yell at you and make fun of you for a... I'm good on that. Thank you, though. See, to me, I want that as its own standalone benchmark. The "Alex, what did I eat" benchmark, the leading benchmark for all frontier models going forward. I mean, that's the real-life stuff that's actually interesting to me. I do that very regularly. I'll have three tabs open, try the same question across three, and see what I get a
Starting point is 00:34:53 good answer on. Those are the use cases and the kind of ways that I think everyone, all of our listeners, should approach these models: try different models on the same question, just see what happens, and see what you like better. I think that's the real way to try to decide what's really happening in terms of progress here, versus the more theoretical stuff. Can I just say, one of the big takeaways for me this week is that reasoning is just freaking unbelievable. Like, it's a true breakthrough, and when you use those models, you just get better stuff. And that to me has been sort of discounted in some of the conversation out there, but not on our show; I think we've always talked highly of reasoning. But in the broader
Starting point is 00:35:39 "the wall is hit, OpenAI is toast" conversation, reasoning is both useful and better. I'll agree it's better, but it's better for many things, not all things. Again, simple back-and-forth queries, analyze this text, something where all the information is right there in front of you and doesn't require a great deal of complexity: you don't need reasoning for that, and reasoning is more expensive. Or it's more complicated, or it's more time consuming. So I agree, I'm still wowed, but there's also the UI element of it. DeepSeek, again, listing out the questions of the chain of thought as they're coming up was, to use the term again, a kind of party trick. But it's a UI feature that makes it so much more real.
Starting point is 00:36:28 And now everyone's doing it. It's amazing how quickly. Sometimes it's almost annoying now on ChatGPT, where it starts walking me through what it's doing, and now it's thinking when I actually don't want it to, when I'm like, I'm good. I'm good. Just give me an answer. I'll switch to another tab and come back. But it is like your very talkative friend who's like, let me tell you exactly how I got to this. And you're like, no, it's good. We're just going to go with your answer. I'm going to start on this tab and then I'm going to go here.
Starting point is 00:36:57 It's almost, I want, like, the log afterwards, if something's wrong, to go back. But we've talked about this before. The problem that remains is, if something is broken in that reasoning process, you can't simply fix it. It's not like I can go back and say, okay, on step three of eight, I would rather you have done this than this. That does not exist yet. So at that point, the reasoning is nice, the show of reasoning, but you can't actually utilize it in any meaningful way. That's fair. So Ranjan, I want you to talk a little bit about this cost efficiency that Anthropic seems to have found in training 3.7, because I think that's pretty significant when we think about
Starting point is 00:37:45 how these businesses will operate and whether they need to spend as much money as they are training their latest models. So on the cost side, 3.7 Sonnet apparently cost just a few tens of millions of dollars to train. We already talked about DeepSeek. I think, again, it goes back to showing what the real costs involved are. There's gathering up some large amount of data. If it's a reasoning model, there's the supervised fine-tuning side of it, and there's a reinforcement learning side of it, which could involve bringing in lots of humans. And again, that literally is: what is the correct way to get to this answer? Is the answer correct? Ranking these outcomes, and actually going through that hundreds, thousands, tens of thousands of
Starting point is 00:38:32 times and training the model that way. Obviously, that's time intensive and it's expensive, but I think it's important to recognize that even Anthropic, who has kind of been in the whole big models, expensive models game so far, the fact that they are moving towards this almost means that OpenAI is probably the only player left that's still kind of trying to sell "you need big expensive models to win." So then what do you think about this comment from Jensen, where he talks about how now we're going to go to reasoning and inference, and that's going to be more expensive? So Nvidia earnings came out this week. They had revenue jump 78% from a year earlier to $39.33 billion in the quarter, and they're projecting $43 billion in the next quarter. They delivered $11 billion of their Blackwell chips. So life is good for Nvidia, but everyone's getting the sense as to, like, how is your business going to look if we get more efficient, if we go
Starting point is 00:39:33 toward these reasoning models? And this is a very interesting statement from Jensen Huang, where he says AI has to do 100 times more computation now than when ChatGPT was released, basically talking about how the reasoning approaches are more expensive. Next-generation AI will need 100 times more compute than older models as a result of new reasoning approaches that think about how to best answer questions step by step. The amount of computation necessary to do that reasoning process is 100 times more than what we used to do. So it is interesting to me, because, I mean, you look at what DeepSeek did, and they found a way to not only do reasoning, but do it more efficiently. And Jensen is saying this thing that seems to
Starting point is 00:40:19 disagree with that a little bit. Well, I'm curious what you think, Ranjan. I mean, never to speak ill of Jensen Huang. I think he's saying what he needs to say. I mean, if the thesis that things are going to get much cheaper and require less compute holds, we could have the Jevons paradox, which I haven't heard in a little while, but we all heard about that one week. Again, the idea that the more ubiquitous AI gets, because it's cheaper, the more aggregate compute it would actually require. But it still feels like Nvidia has to tell that story. And again, the company blew out numbers again, and even though it's getting caught up in the larger stock market rout as of today, this is still an insane company in terms of its ability
Starting point is 00:41:10 to produce and deliver. It still hurts their longer-term story, at least with the expectations that have been set by the market. Yeah, I mean, it's just one of those things where I'm like, I see his logic and I see where he's going, but I don't really see how. I mean, yes, they've talked about how inference is 40% of their revenue, but I just don't really see how it's going to cost a hundred times more to do reasoning. Maybe I'm missing something. No, I think it's very difficult to try to calculate out, because, I guess, the more complex the use cases get, maybe we'll start unlocking use cases that we haven't even imagined,
Starting point is 00:41:55 or AI is going to be applied to areas where we haven't even started to look, and those will be the ones that really soak up all that compute. But I agree with you: the idea that it's going to require a hundred times more compute, especially as the trend is everything's getting cheaper, doesn't make sense to me either. Let me ask you this one thing that I saw from earnings. I'm curious if you think that it's right. I mean, the fact that they shipped $11 billion in Blackwell chips, when the expectation was like three and a half billion. So clearly there's a huge amount of demand for the Blackwell chips, which are the latest generation of Nvidia chips. All the hyperscalers are saying they'll take as much as they can get,
Starting point is 00:42:36 including Andy Jassy at the Alexa event this week. Does that show that there's already enough tangible process? Sorry, does that show that there's already enough tangible progress within AI that merits this further investment in chips? Or do you think we're just still in the finding out phase? You never want to be in the find out phase, I feel, because I think we all know what happens after. But I think from the hyperscaler side, it's still like no one has backed down. Actually, Microsoft seemed to hedge a bit, and I believe there's some reporting that they're canceling some data center leases.
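As an aside for readers: the Jevons paradox argument Ranjan raised a few minutes earlier is easy to see with a toy calculation. The numbers below are entirely hypothetical, chosen only to show the shape of the effect, not to reflect any real workload.

```python
# Illustrative (hypothetical numbers): the Jevons paradox argument in miniature.
# If efficiency gains cut the compute cost per query, but cheaper AI unlocks
# disproportionately more usage, aggregate compute demand can still rise.

def aggregate_compute(compute_per_query, queries):
    """Total compute consumed = per-query cost times query volume."""
    return compute_per_query * queries

# Baseline: 100 compute units per query, 1,000 queries.
before = aggregate_compute(compute_per_query=100.0, queries=1_000)

# After: queries get 10x cheaper, but cheap AI drives 30x the usage.
after = aggregate_compute(compute_per_query=10.0, queries=30_000)

print(after / before)  # 3.0 -> total compute demand tripled despite 10x efficiency
```

Whether real demand elasticity is high enough to produce this outcome is exactly the open question the hyperscalers are betting on.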
Starting point is 00:43:14 But overall, the hyperscalers are playing the same game: it's an arms race for compute, and we're going to continue down this road. And we're going to get into the Alexa event. Maybe it does start to seem like, the more complex Alexa gets, if every single person who has an Alexa is actually actively engaged all day with Alexa Plus, then you start to see that, okay, it's going to require a lot of compute. So if it really lands in the way that it's being promised to, maybe that does make sense. But I think as of today, all the hyperscalers are taking
Starting point is 00:43:55 the exact same bet. Right, so we're still in the "scale up infrastructure and maybe this will work" phase, not the "this is working enough that we're going to keep investing" one. Exactly. So I think, I mean, I am still bullish on Nvidia, but if you take this kind of dubious proclamation about reasoning being much more expensive, combined with the fact that, yes, they're still ordering, but there's a big if at the end of the tunnel, I do wonder a little bit if there's a potential nasty surprise for Nvidia coming in a couple years. No one ever won that prediction in the last few years, at least. I'm not disagreeing with you, but it's one of those things that I'm almost fearful of saying out loud. Yep, I'm sure
Starting point is 00:44:42 I'll eat the words there. And maybe Mark Zuckerberg will be the one that continues to keep Nvidia running, taking all that ad money and pushing it right into this chip and server company. Right, because now Facebook is going to potentially spin off a Meta AI app in an effort to compete with OpenAI's ChatGPT. This is according to CNBC: Meta AI will soon become one of the social media company's standalone apps, joining Facebook, Instagram, and WhatsApp. The company intends to debut a Meta AI standalone app during the second quarter.
Starting point is 00:45:14 Of course, they're going to have all the app install power that you have on Facebook to get people to use it via their ad slots and new slots they're going to put in. Mark Zuckerberg is really intent on basically taking over OpenAI's lead with ChatGPT. He sees it. I've talked about it in the past. He sees this as a big consumer app. He sees that it's growing fast. And he doesn't want somebody else to do it.
Starting point is 00:45:35 Same way that he sort of cut off Snapchat, and did it with some success against TikTok with Reels. Very funny response from Sam Altman when he sees this. He says, okay, fine, maybe we'll do a social app. He says it would be so funny if Facebook tries to come at us and we just uno reverse them. I mean, I think you get that response from Sam Altman when he's not completely sure of his footing. And I don't really feel that he's sure of his footing on this one, because you don't want to go up against Facebook when it comes to a consumer app. It doesn't usually end well. So what do you think, Ranjan? I'll admit, it's been so long
Starting point is 00:46:13 since I played Uno, I had to look up what Uno Reverse was first. And I was so surprised. I was like, Uno Reverse? Who speaks like that? But anyway, Sam Altman speaks like that. The same guy who's coming up with everyday queries for our AI benchmarks. But I thought this was a very interesting one, because I have been using Meta AI more, like for image generation, just because it's very easily accessible. And it's still in a weird place, because it lives in the search bar for Instagram and, like, WhatsApp
Starting point is 00:46:49 or Facebook. And living in the search bar, I've also accidentally used it when I'm searching for something on Instagram, and somehow it pushes me towards Meta AI and gives me a weird chatbot response. So I think spinning it out is a very interesting idea. And then, being Meta, they would be incredible at quietly guiding people towards that app from all of their other apps. But still, is it needed? Are there other ways to integrate it, more as a tab in existing Facebook Blue and Instagram themselves? I would think it would just be another tab on the regular app versus having people download something. But I see this one as another Threads. They'll get some big numbers, but I don't think it's going to be anything too impactful.
Starting point is 00:47:41 Yeah, I don't think it's going to work. I would like to see an OpenAI social network, and not for nothing, but OpenFace is out there for the taking. God. The AI-first social network where all of your posts are commented on extensively, where you have a million friends who are all fawning and love you very much. I think they could go down that road. OpenFace. OpenFace. OpenFace. I'd sign up. I'd be a day one user.
Starting point is 00:48:12 Social networking needs, like, an entire remake, and maybe Sam Altman is the one to bring that to us. I mean, if anybody can do it, maybe it is OpenAI. They're the best product in AI, so you don't even need a better model for that. I'll give you that, Ranjan. That's the product that we've all been waiting for. So we've got about 10 minutes left, and I've saved this for last. I don't want to spend too much time on it, because I am going to have a podcast next week covering it. But the new Alexa app is out. Or, not out, rather; it has been introduced. The new Alexa revamp. It's called Alexa Plus. It is conversational. It is able to
Starting point is 00:48:56 accomplish things in the real world. It seems to have an awareness of what happens across your Amazon services. So you can ask it to play a song and then say, all right, can you take me to the point in this movie that song is on, and it will do that based off of Amazon Music and Prime Video. It will search your Ring cameras for you. It will potentially order you an Uber. You can use it to control the sound in your apartment if you have Echoes, with conversational commands like, can we have the song play in that room, or, I want to hear it over there. It was a very, very impressive demo, I felt, and it was live, unlike Apple Intelligence, where Apple Intelligence was a promise. And it seems like a lot of this
Starting point is 00:49:42 Alexa stuff is going to work. So I do want to give this preface: we're going to have Panos Panay, along with Daniel Rausch, the head of Alexa. So it's going to be a fun conversation. That's coming up on Wednesday. You're going to hear a lot more about that. But Ranjan, I'm very curious, like, what your reaction was. In our chat, I think in the Discord, you dropped this German tweet where he talked about how, like, this was ChatGPT
Starting point is 00:50:05 voice on steroids, and Apple has to be embarrassed at this point for how bad Apple Intelligence is. But what was your takeaway looking at the Amazon news? And if you want to, you can say what it might mean for Siri. Oh, man. I spent so much time rewiring my entire house for HomePods. And I watched that event. And I want to just go back.
Starting point is 00:50:36 I want to switch it all to Alexa. And I'm sure we'll definitely talk about it more after your next episode. But it looked good. It looked exactly like what it should be. It looked like putting ChatGPT voice mode on a device, or Gemini voice on a device, or just what basic voice interaction should do right now. And I'm avoiding hitting my table right now so there's no feedback on the mic, because you can tell I'm excited. No, do it this time. Sorry, listeners. Oh my God. That's all it should be doing right now. And it did what it's supposed to do in voice. We know generative AI voice is that good right now. I think the only
Starting point is 00:51:17 thing that I think could be a little problematic for Alexa Plus is that Amazon does not have a great reputation in terms of privacy, or just overall, it can still be a little creepy. The reason I got rid of my Alexas was it would always ask these follow-up questions, which you could not turn off. You'd be like, what's the weather? Oh, here's the weather. And can I interest
Starting point is 00:52:04 you in these three other things? And you couldn't turn it off. So I think, the way they portrayed it, it becomes your, like, really trusted companion that you're sitting there sharing yourself with. That's a big ask in terms of trust. So in terms of the technology, I'm pretty confident they're there. In terms of, like, getting people actually comfortable
Starting point is 00:52:45 with interacting with your voice device in that way, we'll see. But, oh man, this is going to cost me a lot of money. So yeah, we did talk about it yesterday. I'll just give a quick preview of what this is going to look like. I mean, we talked about it yesterday, and they are aware that the talking back to you and being proactive can be pretty interruptive and annoying, and I think they're paying attention to that as they roll this out. So that'll be at the end of the conversation for those that listen. But yeah, I thought it was really interesting. I think Amazon has a shot here. I wrote about this in Big Technology: basically, all big tech companies want to build a universal, contextually aware assistant that helps you get things done.
Starting point is 00:53:11 And Amazon has a pretty good shot to be the one that pulls it off, especially because they have a working demo of this, and it seems like it's going to go live next month. And I don't know. I mean, they don't have a mobile operating system. On one hand, that's a curse, because that default matters a lot. We know Google pays Apple $20 billion a year to be the default
Starting point is 00:53:57 search engine on the iPhone. However, that does let them use other productivity services and not privilege their own defaults. And I think these companies privileging their default productivity services has been sort of the downfall of the modern AI assistant. Like, if I'm using an iPhone and I can't use Google Calendar or Gmail in there because Apple is so dedicated to, whatever, Apple Mail, that ruins Siri for me. But Amazon doesn't have that problem. And so I was speaking with, actually, the head of Prime, who was mentioning that, yeah, I use my Google Calendar on my Echo devices and it works just as well. So that could be a blessing for them. Yeah, I 100% agree. Though speaking of Prime, the way they rolled out the pricing, I think,
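For readers keeping score at home, the pricing move discussed below is simple arithmetic, and it's worth seeing plainly. A minimal sketch using the two prices stated in the episode:

```python
# Pricing as stated in the episode (February 2025).
ALEXA_PLUS_MONTHLY = 19.99   # Alexa+ as a standalone subscription
PRIME_MONTHLY = 14.99        # Prime membership, which includes Alexa+ at no extra cost

def cheapest_way_to_get_alexa_plus():
    """Compare buying Alexa+ standalone vs. just subscribing to Prime."""
    if PRIME_MONTHLY < ALEXA_PLUS_MONTHLY:
        savings = round(ALEXA_PLUS_MONTHLY - PRIME_MONTHLY, 2)
        return ("prime", savings)
    return ("standalone", 0.0)

plan, savings = cheapest_way_to_get_alexa_plus()
print(plan, savings)  # prime 5.0 -> Prime is $5/month cheaper and bundles everything else
```

The point the hosts land on: the standalone price mostly functions as an anchor, since the bundle undercuts it while also sweetening Prime itself.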
Starting point is 00:54:45 was the most, like, savage and amazing Amazon move ever. Again, I forget what the monthly subscription is. Is it, like, $19.99? Or is it $14.99? Let me see. It's something that most people would not pay as of today, but as an Amazon Prime member, you get it for free. So it kind of assigns this additional benefit to being a Prime member, and if you're a Prime member, you shop on Amazon X percent more. So they're going to assign this incredible value to it on day one. And you're going to feel like, oh, well, if I was questioning, should I renew my Prime membership? Well, it's a gimme. I mean, I'm getting Alexa Plus for free. Ranjan, listen to this. Alexa Plus costs $19.99 a month. Prime costs $14.99 a month.
Starting point is 00:55:38 Oh, wait. I thought it was free for Prime. No, if you have Prime, it's free. So basically, you could pay an extra $5 to just get Alexa Plus, or $5 less to get all of Prime and Alexa Plus. Oh, okay. That is savage. That is savage. I mean, if Lina Khan was still around, I don't know what she would say, but my goodness. That is not okay. But she's gone now. Okay, before we go, we need to talk about Skype. Microsoft is killing Skype. This is hot off the presses and makes me very sad. As from TechCrunch: after kickstarting the market for making calls over the internet 23 years ago, Skype is closing down. Microsoft, which acquired the messaging and calling app 14 years ago, said it will be retiring it from active duty on May 5th to double down
Starting point is 00:56:12 on Teams. Skype users have 10 weeks to decide what they want to do with their accounts. It's not clear how many people will be impacted. The most recent numbers Microsoft shared were in 2023, when it said it had 36 million users, a long way from Skype's peak of 300 million users. We do look at tech with a critical, sometimes hopeful eye. I will say that Skype is one of the products that I've loved the most on the internet. I just have good feelings about it helping me make international calls and calls to friends, and you could play different games on it back in the day. And that little squeak that it makes
Starting point is 00:57:05 when you get a message will forever remain in my heart. Rest in peace, Skype. We bury you the week after the Humane Pin goes the way of the Neanderthals, and I'm much sadder about losing you than the wearable AI device. I had my first job interview, first remote job interview, on Skype. I agree. International calls. It opened up the world. Sold for $8.5 billion to Microsoft in 2011, and sorry you had to get caught up in a corporate battle with Microsoft Teams, which clearly won. The last few times I ended up on Skype, I had all these messages that were clearly phishing and scam things barraging my Skype account, and I did not open it after that. Goodbye, Skype. Goodbye, Skype. And goodbye to all of you, but just hopefully for a couple days, because I'll be back on Wednesday with those two Amazon executives, and Ranjan and I will be back on Friday. Ranjan, thanks so much for coming on the show. See you next week. All right, everybody. Thank you for listening, and we'll see you next time on Big Technology Podcast.
