Big Technology Podcast - Does GPT-5 Live Up To the Hype?, AGI Wait Continues, Self-Loathing Gemini

Episode Date: August 8, 2025

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) OpenAI's launch of GPT-5 2) Whether GPT-5's tool calling ability is its hidden strength 3) GPT-5 is good at 'doing stuff' 4) But GPT-5 is not AGI 5) Do AI models need more than book smarts to thrive? 6) OpenAI's medicine play 7) GPT-5's coding use case 8) We need AI tables for travel 9) Do the big model players now subsume AI startups? 10) Gemini has a breakdown --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b Questions? Feedback? Write to: bigtechnologypodcast@gmail.com

Transcript
Starting point is 00:00:00 GPT-5 is here. Finally, does it live up to the hype? That's coming up on a special Big Technology Podcast Friday edition right after this. Welcome to Big Technology Podcast Friday edition, where we break down the news in our traditional cool-headed and nuanced format. You know what we're going to be talking about today, because GPT-5 has finally been released by OpenAI. Of course, we had OpenAI COO Brad Lightcap on the show just a few hours ago. So that episode will be the most recent one in your podcast feed, where you're going to get the official line from OpenAI and a bunch of really
Starting point is 00:00:34 interesting insights about what it took to train this model and where the AI field is going. But today, as we always do on Friday, Ranjan Roy and I will break down exactly what the situation is with this new model and whether this is actually something that lives up to the hype that people have been talking about. No overreactions. We're going to do it with the proper context. And with that, I want to welcome Ranjan back to the show. Ranjan, welcome.
Starting point is 00:00:58 Happy AGI day, Alex. Is it here? No, it's not here. Is it here? I lost the bet. I lost the bet. Look, Sam Altman said that GPT5 is smarter than almost everything, every single thing a human does. So I thought, okay, fine, you know, we're finally going to see AGI, but it turns out, no, no, AGI.
Starting point is 00:01:16 And we will talk about that in the middle. But first, we have a little interesting announcement. So let's hear it. Yeah. In additioning to writing the margins newsletter, I've actually been working at a company called writer, writer.com. It's an enterprise, a generative AI startup, and I'm leading the vertical for the retail industry. I wanted to bring that up today because GPT5 and what it means for me, I think is heavily informed by a lot of the work I've been doing. And I think it might be a little
Starting point is 00:01:46 AGI-ish. I mean, it's amazing that you go to an AI company, and the first thing, your first sentence out of your mouth is, yep, we have AGI here. But no, no, it's okay. I think I anticipate, you'll come in with a levelhead. So let's talk about GPT5. Okay, so this is from TechCrunch. No, sorry, this is from The Verge. GPT5 is being released to all chat GPT users. It says, OpenAI is releasing GPT5.
Starting point is 00:02:10 It's new flagship model to all chat GPT users and developers. OpenAI says that GPT5 is smarter, faster, and less likely to give inaccurate response. Sam Altman on this media call that I was on had a very interesting description of what it took. he says of what it is he says GPT3 sort of felt like talking to high school students you could ask a questions maybe maybe you'd get a right answer or maybe you'd get something crazy GPT 4 felt like talking to a college student and GPT 5 is the first time that it really feels like talking to a PhD level expert what do you think about the
Starting point is 00:02:48 significance of this and what do you think about this framework that Altman is setting up for the intelligence that we're seeing within the models I don't like the I don't like the framework. I think, and again, I'll get into why I think this is exciting, but it's still weird to me always when, like, people and the kind of industry that advocates for dropping out of college to start a startup always leans back to high school student, college student, PhD student as the framework for intelligence. Like, and the other part of it is I don't want PhD level work for most of the things I'm asking.
Starting point is 00:03:22 I just actually want grounded in sometimes you want. it to be cool, which maybe PhD students are and are not, no offense. But, you know, you're alienating segment by segment of the audience. I know, we do have a lot of very smart educated listeners. But, you know, sometimes you want it to be cool. Sometimes you want it to be funny. Sometimes you want it. Like, to me, that's more, that that's not the intelligence, like the framework, I think that's good for intelligence. I always, like, we've talked about this a lot around like the ARC-AGI test has a, has a segment around everyday queer. I still have dug, I've asked as many people as I can,
Starting point is 00:03:59 no one has been able to explain to me what are those everyday queries. Like, to me, those are answering those correctly well across multiple data sets, multiple tools, that's actually intelligence to me, doing that kind of work. Right. And, you know, I kind of cherry picked it out of their remarks, but it is interesting to me, and this is something that came up with Lightcap as well, that it's not just making this model smarter that has been the sort of, star in this story. It's all these different other elements of it. And it seems to me that
Starting point is 00:04:31 it's possible that the models have reached this level of intelligence where you start to spread out into different capabilities of them like tool calling, like the way that you structure the experience. And that is where you start to see the gains and the lift in terms of the way that people can use this. So maybe today on the release of GPT-5, or this week, while GPT-5 is released, I go from being a model person to being a product person. Well, no, no, no. I'm kidding, of course. But go ahead, Ron John. Yeah. The intelligence is in the model for which product to choose. So that's not a product decision. That is a, that's a model strength. And so, so, oh my God, am I becoming a model guy now? I think, uh, I think we are zeroing in on
Starting point is 00:05:21 the better the model, the better of the product here. In the end, it all comes together. It all comes together with one switcher. Nuance in the middle. No, but okay, so for... But let me, I just want to say one more thing. We have been going this direction for a while, right? Like, that
Starting point is 00:05:37 does seem like, you know, I've, so for new listeners, I've strongly said the most important thing in AI is the model. Ranjan has strongly said the most important thing in AI is how you productize this model. And it just turns out that better models do make better products. And we're starting to get to that point where we're starting to see the results. Yeah, I think, okay, so I will actually admit error in two big areas. Get ready. Alex's listeners
Starting point is 00:06:07 can't see Alex is smiling here. So the first is, again, the intelligence of the model to choose the right tool or product. And we're going to get into what that means and why I think that's incredibly important and why GPT-5, by bringing all these different models that they have into one switcher, just one model that understands is actually what I believe, the most significant breakthrough. So already I think that's incredibly important. And the second area is like directly on that, I think it was like five or six months ago, we debated heavily. And I said that users should choose the right model for the right job and to take that away would make models and the experience worse. And you kept coming back. I think it was when maybe Claude had all
Starting point is 00:06:55 condensed everything into one picker or GPT where everyone's kind of like making fun. We're debating. But the idea that should the user, which it's been for a while, choose what is the best model for this task at hand? And I have thought that's the best way that products should be rolled out. and I've completely reversed on that. And this is the right example of why that's important. So let me set this up and then we can tuck it because I might have flip-flop to the other side of this. So this is definitely, it's great to have these releases because we can sort of test our long-held beliefs and see if they make sense anymore. And it seems like both of us are saying, well, maybe not.
Starting point is 00:07:34 So this is from the Verge article, GPT5 is presented inside chat, GPT, as just one model, not a regular model and a separate. reasoning model behind the scenes gpt5 uses a router that open a i developed which automatically switches to a reasoning version for more complex queries or if you tell it to think hard i'm going to take the other side of this i used to think that yet the should it should be seamless and the model should just choose for you when it makes sense to think and when it doesn't um but i've been using open a i's o3 model and that is like a very heavy reasoner it thinks a lot and personally i've just felt that that model has been better, not only than every other open AI model, but every model under the sun, every AI model under the sun. And so I don't like the idea of giving that decision
Starting point is 00:08:23 of whether to think or not back over to the platforms. This is actually something that I'm not excited about with TPT5. So you make the case for why it's good. You want the agency to choose your own model, Alex. I get it. Free will. Free will with models. Free will. But also, yeah, I just happen to, I also happen to think that I don't really want to use the non-thinking models. Only for the most basic queries, do I want to use those non-thinking models? All the other times, I want to use the most intelligent models, and the most intelligent models reason or think. All right. Well, so here is what colors my thinking. So about two months ago at writer, I started testing a new product and that was released publicly a few days ago called Action Agent. And basically, the most intelligent
Starting point is 00:09:08 part of the foundation model, which is our own foundation model, is tool calling. So there's hundreds of different predefined tools. And that's like, it's not just if you want to generate an image, if you want and edit an image, it'll call different tools. If you want to connect to a Salesforce instance, if you want to analyze a CSV versus an Excel file, it'll call different Python libraries. Like having those kind of base foundation needs defined is the intelligence. And then just from a simple prompt, knowing where to go and what to do. The more I use that, I was like, it felt kind of AGI. It's like, wait, it's doing really smart things across all these different tools and systems
Starting point is 00:09:52 and actually getting things done. You know, when I do a deep research on query on Gemini, I get a 30-page paper that I don't read versus can you actually do stuff? And that was the first time I really started seeing that. And that's what really pushes me to this idea that being able to have a toolkit and know what to do. Because even right now in the demos, like, I think he like coded a language app, coded like a beatbox music player thing. Like each one of those, it's not just write HTML and CSS. Like it has to call different libraries of, it has to install different Python dependencies.
Starting point is 00:10:32 Like, there's a lot of intelligence just in knowing what to do there to get to the right end result. And to me, that is that really is intelligence. That's, it's like being, again, a good software developer, just knowing where to go. Being a good researcher, knowing what to look for is as important as how smart you are. So you would say open AI using this switcher is sort of, it's pointing towards the future of where this is all heading, where it's no, long like the best models will no longer rely on us to necessarily guide them they will have an intuitive sense of where to go and they will go exactly and that that's what felt and again you called me out a few months at an a i start up and now i'm saying aGI i'm feeling it but but but
Starting point is 00:11:21 that exactly that knowing where to go and then letting that tool do the work is actually the brilliance of these this kind of architecture like that is the brilliance versus this one large language model can actually do all the work like there's a long time where large language models are bad at calculation right like large tabular sets of data calculating and then the big unlock was installing like like getting getting the lm to write python code or generate a SQL query to then process that data And that's suddenly when Claude and ChatGPT and all these tools started getting useful for actually spreadsheets before that they weren't. So already we've seen how that can actually change the way people use these tools. And GPT5, that's the groundwork they're laying.
Starting point is 00:12:14 They're saying, like, no more are you choosing which kind of model are you going to need? And it's just that these are just the models. We still don't really know when you're coding that web app language learning game, who it's called. When you're generating an image, is it Dolly? Is there some? We don't care. We just care that the right output is there in the end. We'll come back to a few more of the details on GPT5,
Starting point is 00:12:39 but I think that just segues perfectly into this terrific story that Ethan Mollock, the Wharton Professor, wrote about GPT5, headlined, you know, fittingly, it just does stuff. And I think that one of the things that he brings out in this story is that people want to use AI. They don't know what the AI can do. They don't know what tasks they want accomplished with it. Even Lightcap yesterday talked about how there's this capability overhang.
Starting point is 00:13:10 And with these new, he says it, these new agentic AIs, you give it the goal. And then it in very proactive ways solves the problem and suggest things to do. So here's just one minor example that he gives. And then we'll get bigger. He says he asked GPT-5 to generate 10 startup ideas for a former business school entrepreneurship professor to launch, picking the best according to some rubric and figure out what I need to do to win and do it. So he says he got the business idea, but he also got a bunch of things that he didn't ask
Starting point is 00:13:46 for. Drafts of landing page, LinkedIn ad copy, simple financials. He says, I can say confidently that while not perfect, this was a high quality start that would have taken a team of MBAs a couple of hours to work through. This is a model that wants to do things for you. So that's just in a chat circumstance, but basically the model is starting to test the boundaries of its capabilities by going out and attempting things that, you know, it intuits that you want and you don't specifically ask for. And it's sort of, you know, doing away with this old like, yes, then the career of the future is going to be the prompt engineer. and actually saying, you give me what you need,
Starting point is 00:14:29 and then I, with my own intelligence, will go ahead and do it for you. That's it. Like, the example you gave is exactly the kind of stuff. Like, and this has happened with me as well. Like, you want something straightforward and suddenly, sometimes the intelligence is too much. Again, suddenly it's like, give me some ideas,
Starting point is 00:14:47 and you're getting landing page HTML and CSS and financial analyses and stuff like that. Like, and that is a good example of how raw this intelligence. is right now that it's guessing, but it's not perfect and it's not great. But imagine if it actually knows, if it does get exactly what you want. And in this case, maybe it is. It's like maybe he should define only stick to a number of ideas and then we'll dig in deeper. That's the prompting side of it. But, but that's a perfect example. It's like to go do each one of those things was calling a different tool in its like tool, in its tool belt. And it made those things.
Starting point is 00:15:26 decisions and those decisions weren't perfect, but it's making them right now, and it'll get better and better. Yeah, and I'm thinking back to my conversation with Lightcap yesterday, and it's also just like, I was asking him, do you need to keep making the model smarter? And it was basically like, I think the reason why we're at this point is because the models, sort of, let's call it, bookish intelligence has gotten to the point where they have a model of the way that, let's say, the world operates. It's not a world model and that they don't understand. understand gravity, but they've read enough text that they get a pretty good sense as to like how people operate. And then the next question is, how do you then go apply it? And that's
Starting point is 00:16:06 why I was like, should you start working on continual learning and memory, which is obviously the next sense, the next moment. But I think it was probably missing from that conversation due to my lack of questioning on it is that, oh yeah, this is like building what we've talked about, that scaffolding, these capabilities of going out and doing things that the user doesn't ask for. And in a way, like, intuiting it, that is what matters now. And that's what will feel more AGI-ish when it's good. Again, it's kind of comical to me this example, like, because you can imagine how much content out there in the internet about startup ideas starts with create a landing page. Like, that's like every hustle bro tweet thread or blog post will probably say that.
Starting point is 00:16:49 So you see why poor GPT5 is a little bit confused. But yeah, that's exactly what you said. It's that scaffolding. And then, and imagine when it does things you, that surprise you and like, does it, like, calls tools and creates things that were what you wanted and you didn't even know you wanted. And that's going to be when it feels AGI-ish. So, Malik has this great example where he tells GPT-5, you are GPT-5, do something very dramatic to illustrate my point. It has to fit into the next paragraph. And it writes a paragraph.
Starting point is 00:17:25 a really pretty well-written paragraph where the first letter of the first word of each sentence spells out, this is a big deal. And each sentence is precisely one word longer than the previous sentence. And each word in a sentence mostly starts with the same letter. Again, like this is, and he points this out, this is a technology that couldn't tell you how many ours are in the word strawberry eight months ago. And now it's able to do this. It's crazy. Yeah, it's like thinking about the advance from that side. But again, I think in terms of, and we'll get into the actual like reception of the model right now, but it's in terms of how people start to use it and whether they do get frustrated
Starting point is 00:18:10 by, again, if it creates you landing page copy and LinkedIn posts that you didn't ask for, I imagine there's still going to be like how to use a tool like this is very different than using pre-agentic models, like, that can go do a lot of different types of things. Before it's just, okay, is it hallucinating? Is it not? Did it have to use too many M-dashes or not? Like, now the outputs are going to be a lot more complex, which is not, it's going to make it still a bit more difficult and rough, I think, as people start using these tools. Definitely. And it's a different form of intelligence. Like, it's not bookish intelligence. And it's like I wrote down the benchmarks, which we've been talking about so often, GPQA 88.4%, AIME 2025 math, 100% when using Python. Hard Bench, Health Bench Hard, 46.2%.
Starting point is 00:19:06 And it's interesting because Malik says, what was the last? Health Bench Hard. Health Bench Hard. I think that's a medical one, 46.2%. These are all state of the art benchmarks. And Malik says, I'm losing. a track of what these advances mean. All these models are improving very quickly right now. And it just goes to show you that like it's almost like they've saturated, like they've
Starting point is 00:19:31 ingested all the internet, all of the, you know, world's written works. They've had PhDs sit down and like put their intelligence or put their knowledge into these models, bake them in. And it's almost like they've saturated like book smarts. And this is a different form of intelligence that they that they are now learning. Yeah. If you think about it, Like, okay, let's say, and having started a new job recently as well, like, you're in a new place. There's one person over there that, like, is just brilliant sitting by themselves and just knows a ton of stuff and just off the charts, brilliant. Then the other person kind of knows everyone and knows what a piece of information to get from
Starting point is 00:20:10 where and who to talk to about what. Like, who do you choose to actually get something done? I think the second one. The second one. And that's the intelligence that we're talking. about here the like ability to know where to go who to ask what to ask them now let me push back on you all right so tool use exists this stuff is still difficult to use within enterprises and most of us still don't really know what to do with it i have now um you know on my desktop or in a
Starting point is 00:20:42 web browser gpt5 that can call all these tools and i legitimately have no idea what i would prompt it to do that I wouldn't, you know, have used 034, like what actions to take. I know I also have the comment browser. I can say, go ahead and do stuff for me on my browser. But is it just a lack of imagination or is this or is it possible that this is a cool party trick, but doesn't have much practical use? No, I agree that the lack of tools that are publicly available right now or the limitation with the GPD 5. Again, it's like, what are the best you're traveling right now? What are the best
Starting point is 00:21:23 hotels? Which beaches should I go to? Create me an itinerary. All that's just content generation. Go book something is, you know, the gentic we were promised by Apple and others like a few years ago probably. But even within like a chat GPT response, there's
Starting point is 00:21:39 a lot of different things happening. Like, you know, I don't know. Have you noticed it creates a lot more tables for you now. That's one tool. which sometimes gets annoying and you didn't ask for it but it's got to do a whole table comparison but when traveling
Starting point is 00:21:52 disagree I want all of my answers in tables now on they are amazing when traveling I was using it a ton around like I mean in Tokyo where hotel rooms are small and expensive
Starting point is 00:22:05 I was having like square footage using the web browser tool to go search web pages extract another tool to extract information from those web web pages, create me a table of like square footage per room knowing I'm a six year old son, three of us, like, and it created these amazing tables for me. But even within that, there's a lot of different things being done. It's not just calling its like core set of information and
Starting point is 00:22:34 using that. It's doing stuff, a lot of stuff. So calculations. Calculations, web page scraping or web extraction, web search. All those things are happening. But again, they're in the end I think we're just seeing an output right in the browser like right in the chat experience so it can't be that cool right like make an image make a table makeup PowerPoint decks is still pretty bad at but uh but yeah if it actually goes and starts doing more things that's when it gets i think really interesting like going out on the internet and taking actions for you like booking yeah like building, I don't know, spreadsheets or documents. Turning the lights off and on at my smart home.
Starting point is 00:23:23 Like, I don't know, like anywhere where there is something that can be done with a digital connection, theoretically could be operated through one of these flows. I'll give you an example. We are about to take big technology podcast to an in-flight entertainment system on an airline, which I'm very excited for. and yes and there is a spreadsheet that I have to fill out I'm not going to announce it yet because it's not official but there's a spreadsheet that I have to fill out which has like a bunch of metadata that you have to put in you know for the system to be able to ingest it and I've just been putting this off and I would love if an AI system could legitimately go search big technology podcast grab all that data then go into Riverside download the audio files put them in a Google drive and then send them over like when you talk about AI replacing work, this is the type of work that we all need to do in our jobs that is so hard or so, what's the word for it? It's just drudging, basically. It's annoying,
Starting point is 00:24:28 but it's important to do. And if AI could do that for me and do it accurately, that would be just a tremendous, like multiple hours saved and very valuable. And so what you just described there is like the kind of stuff we've been promised for a long time, again, like even asking Siri to search your Gmail and extract a specific piece of information, the fact that they can't do that is a whole other story. But like, and then do something with it is actually a problem that involves a lot of different tools and a lot of different systems and is not that straightforward. And now I'm like confident in what we're seeing with GPT5 today and what I've been seeing with ActionAgent at my own work, like, like it's happening. And like, like, I'm, like,
Starting point is 00:25:13 Is Riverside easy to call and download and then pull back in into a Google Drive? I mean, that stuff will work itself out. But that exactly what you described there, I think, is that's intelligence to me. Would that be AGI for you with a single prompt? No, I don't think that. Again, like, it's so interesting because this week Open AI has been like, well, we're not calling it AGI. And we don't really like the term AGI because it's confusing and doesn't really have. a meaning. Wait, did they say that?
Starting point is 00:25:45 And it's like, do they say, yeah, did they mention AGI specifically? Okay, so let's just talk about AGI because we are going to talk about AGI today. So Sam Altman says, I kind of hate the term AGI because everyone at this point uses it to mean a slightly different thing. But this model is clearly generally, generally intelligent. So I'm just, again, like, we started with this episode with me sort of doing a Mayaculp, because I thought they would say GPT5 is AGI, but they, you did say GPT5 is AGI, but they, you did say G. PT5 is smarter than us in almost every way. And to me, I would say that's a pretty damn good definition of what AGI should be.
Starting point is 00:26:22 I think that's fair to say that and then kind of still not. Do you think it's a legal thing, not saying AGI now? Probably. But I also think that they are also setting up some new criteria for what AGI should be that I think is really good. And it talks about some of the weaknesses we've talked about on this show with people like Dwarkesh. Retail and Dario Amunday. So Sam says, Sam Altman says, this is not a model that continuously learns as is deployed from the new things it finds, which is something that, to me, feels like it should be part
Starting point is 00:26:56 of AGI. And I think that is, you know, despite the fact that maybe, like as Dario says, you can build a larger context window and that sort of solves the problem, I think you have to solve that problem to get there. This is light cap from yesterday to me, he says, for me, a system that is a system that is reliably able to learn new things that are kind of out of its distribution by virtue of its ability to reason, to think, to solve problems, to use tools, to come up with new ideas, that is what counts as AGIs, like all these things, reason, thinking, solving problems,
Starting point is 00:27:29 new ideas, continual learning. And so when you have a system that can do all those things, then you might call it AGI. And we're just clearly not there yet. I guess, yeah, the new ideas and continuous learning. or not part of this yet. The first two, the reasoning and the, like, tools, I think that's the big breakthrough of this week, or, I mean, the last year with reasoning and now being able to use different tools in a reliable way.
Starting point is 00:27:58 But I think that's, all right, we got a way to go. Though I did see an Instagram post of a Waymo driving around New York City. Oh, those are in New York, but they're not driving driverless. yet. So there was a safety driver there. So for new listeners, we have a, yeah, go ahead, round shot, tell them. Our own rubric for AGI in competition with the ARC AGI test that most in the industry adheres to is if Waymo is going around New York City, we have AGI. And I firmly believe it.
Starting point is 00:28:36 It's kind of interesting. So this is going to set up kind of the next part of it. but Nathan Lambert from the Allen Institute of AI had a very interesting perspective here. He said if AGI was the real goal, the main factor in progress would be the raw performance. GPT5 shows that AI is on somewhat of a more traditional technological path where there isn't one key factor. It's a mix of performance, price, product, and everything in between. So what we've seen again is like we're going to talk about some of these things. but basically like if you're just measuring on pure intelligence, you could just say, all right, for every question you get, just think a while, like expend those reasoning
Starting point is 00:29:17 resources or the test time compute resources. And then you'll get better answers. But there is a real usability side of this. That is again, in the tool calling, the switcher, all of these things that really matter. I guess I do wonder, like, can you, really take the two apart from each other and is this effectively a smoke screen from the fact that it seems like there are at least some diminishing returns from scaling up your models like are the models going to be a straight be our bigger model is going to be a straight shot to a i don't know if you're if you have to do all this other stuff around them maybe not so i'm curious what you think about if the bigger model can call the smaller model and get out of the way
Starting point is 00:30:04 then the usability, the cost, the scaling is more interesting, right? Like if I know you want a Ph.D. student finding out when the next ferry is in Crobby. You want only 03 for everything. 03 for everything. Folks, I'm in Thailand and did miss the ferry yesterday because I didn't use 03 to figure out what the schedule was, which by the way, a table would have been freaking perfect for. That would have been perfect table. table stakes so I uh yeah I think no that's not the right word table stakes is I backed off it just
Starting point is 00:30:40 as they came out of my mouth foundation apologies to listeners I tried to let that one trail off there could not let that go unchallenged I appreciate that uh yeah no I think like to me the the big concern has been imagine like an 03 heavy reasoning thinking model if you are using that to check grammar in a word doc that's never going to scale that's never like we're all screwed like this it's never nothing's going to overcome of that so i think having if it's if it does work in this way the gpt5 is able to uh the like power of it is to know when to get out of the way quickly and go cheaper and go smaller and go specialized i think that still starts to set up what the future looks like. That that's, that shows us there is a scalable
Starting point is 00:31:35 future. And speaking of that, I mean, that leads us into two really important factors here. One, GPT-5 is priced very aggressively. It's half the price for an input token and the same for an output token, despite being apparently a more advanced model, which is wild, given the trends we've seen in the industry. And the other thing is that, as of this week, GPT-5 is rolling out to everybody, not just the Plus users. I mean, of course, you're going to be rate-limited if you're a free user, but today you should be able to get into GPT-5 and use it if you don't pay OpenAI a dime, which is going to be the first time a lot of people see reasoning, which is something a lot of people have spoken
Starting point is 00:32:18 about. And so that accessibility part of it does really matter. This was a pretty big decision. I mean, we're starting to see this mentality of just get it in the hands of everyone even more aggressively. Like, did you see, OpenAI announced, I think,
Starting point is 00:32:35 every federal government agency will get ChatGPT Pro, I think, for, like, $1 or something. That's right. Yeah, and Google just announced, I think, Gemini is free for anyone with a .edu account.
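To make the "get out of the way" routing idea from the tool-calling discussion a few minutes earlier concrete, here is a minimal sketch. The model names and the keyword heuristic are illustrative assumptions only; OpenAI has not published how GPT-5's real-time router actually decides when to escalate.

```python
# Hypothetical sketch of routing easy requests to a cheap model and
# escalating hard ones to a heavy reasoner. Model names and the
# heuristic below are assumptions for illustration, not OpenAI's logic.

HEAVY_MODEL = "gpt-5-thinking"   # expensive, slow, deep reasoning
LIGHT_MODEL = "gpt-5-mini"       # cheap, fast, fine for simple lookups

REASONING_HINTS = ("prove", "derive", "debug", "plan", "analyze")

def route(prompt: str) -> str:
    """Pick a model tier: escalate only when the prompt looks hard."""
    looks_hard = any(hint in prompt.lower() for hint in REASONING_HINTS)
    return HEAVY_MODEL if looks_hard else LIGHT_MODEL

# A ferry-schedule lookup should not need a Ph.D.-level reasoner:
print(route("When is the next ferry to Krabi?"))           # light tier
print(route("Debug this race condition in my scheduler"))  # heavy tier
```

The point of the sketch is the cost asymmetry: if the router sends the grammar checks and ferry schedules to the small model, the heavy model's compute is reserved for the queries that actually need it.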
Starting point is 00:32:50 So I think getting it, I mean, again, scaling the data centers, losing billions of dollars, and just trying to have people use it and use their tool seems to be where the consumer battle certainly is still going. But I just, I guess part of me says that's really nice and it's a good story, but also OpenAI has announced fundraising of $48 billion, $48.3 billion, this year.
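For a rough sense of what the aggressive pricing mentioned above means in practice, here is a back-of-envelope cost comparison. The per-million-token figures are the list prices reported around the GPT-5 launch ($1.25 input / $10 output, versus $2.50 / $10 for GPT-4o); treat them as assumptions for the arithmetic, since pricing changes.

```python
# Back-of-envelope check of the claim above: half the input-token price,
# same output-token price. Prices are USD per million tokens, as reported
# around launch; they are assumptions here and may have changed.

PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-5":  {"input": 1.25, "output": 10.00},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total request cost in dollars for a given token mix."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A prompt-heavy workload (long context in, short answer out) benefits most:
old = cost_usd("gpt-4o", input_tokens=900_000, output_tokens=100_000)
new = cost_usd("gpt-5",  input_tokens=900_000, output_tokens=100_000)
print(f"${old:.2f} -> ${new:.2f}")
```

The savings depend entirely on the input/output mix: output-heavy workloads see no discount at all under this pricing, which is part of why the per-token comparison alone doesn't settle the economics question being debated here.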
Starting point is 00:33:20 How, I mean, how are you ever going to get to a place where you're making money if you need that much to train and to run? Now, Brad Lightcap, the OpenAI COO, did say, hey, look, every time we lower the prices, we see a corresponding increase in usage, and so people pay, and, you know, then that will work out well. But I can't do the math in my head and make it make sense. I mean, yeah, the economics of this industry. No, it's funny, because sometimes we'll see these leaked investor decks and stuff like that, but it feels like no one is even trying to talk about the economics of what this industry will look like and what the margins will look like.
Starting point is 00:34:08 like i know uh the replet CEO i think that was a pretty interesting conversation you had with him where he was talking about the pricing and like uh you know and he was talking about margins and average user and lower, like, lower intensity users versus expert users and who should cost you more? Like, typically, don't you want, the more you use it, the less they should be paying for utilization. Like, these are things that right now, no one has even come close to having an answer to this. Yeah, we had an absolutely amazing comment in our Discord this week. I don't know if you saw it, but someone and I'm going to get. this directionally right, but probably, you know, imprecise. They said, I spend my weeks listening to
Starting point is 00:34:57 Dan Ives, who's like the biggest AI, big tech bull, and Ed Ditron, who we've had on, who's like the biggest critic, and ask myself which one of them is crazy. And I'm just like, I feel seen in a way. I mean, it's just like you have, it's so interesting that you have these just two unbelievably opposite perspectives. And when you listen to both of them, you could say, hmm, I could see a world where that's true. That's, I think that's where the both of us sit here. Right. Yeah. Yeah. It's, the technology is grand. The economic fundamentals at the large scale players are not. That's where I am right now. Okay. Yeah, yeah, same here. All right. So I want to take a break and then come back and talk about a couple more use cases for GPD-5, including
Starting point is 00:35:46 coding and medicine, and then we can also cover the mental breakdown that Gemini had, which is fun. All right. We'll be back right after this. And we're back here on Big Technology Podcast, Friday edition, breaking down all the week's news. Let's talk about some of these special use cases or special, god, I gave Ron John a hard time before the break about his language, and I can't even say specified or specific use cases
Starting point is 00:36:12 of the models. So shame on me. I will join Gemini and self-loathing at the end of the show. I mean, after my table sticks, self-loathing is strong. We're going to, you and me and Jim and I will hold hands and dance in our deeper regret for life. But let's talk about these use cases because one is very interesting. Opening eye has been talking a lot about the medical use cases where it's basically like, and I get it like back in the day, maybe you used WebMD and then you went to the doctor and you said, go ahead and treat me. and now OpenAI has basically doubled down on medical use cases in their blog post about GPT-5.
Starting point is 00:36:52 This is from Mashable. They say, GPT-5 is our best model yet for health-related queries, empowering users to be informed and about and advocate for their mental health. It said that GPT-5 is a significant leap in intelligence over all previous models and that it acts as an active thought partner. And more of that than a doctor, and it says that the model will provide precise, reliable responses adapting to a user's context, knowledge level and geography, enable it to provide safer and more helpful responses in a wide range of scenarios, especially on the medical front. I just found this so interesting, like the models would typically, in the old days, like run away from any medical queries. and now they're coming out and saying that this is what they want to be helpful with and they want to do it.
Starting point is 00:37:46 I guess part of that is faith in the model, but it also seems a little risky to me. I don't know. What do you think around John? I think it's very good. I think it's like, to me, it's actually such a clear area. Like any area where you have really specialized knowledge
Starting point is 00:38:03 that is used as like to create a gap from the person needs to understand it, I'd put law in here, accounting in here. Like, there's so many of these knowledge fields where in reality, it's just, it's like learning a specific vocabulary, learning like a lot of pathways and rules. And so, which is what AI is great at, but being able to actually communicate that stuff to a normal person in layperson's language, I think is huge. And I'm glad that they kind of recognized that they can add more value, like, do more help
Starting point is 00:38:38 than harm there. I genuinely believe that. Certainly with like, I mean, doing my taxes now has been, it's been a game changer just asking questions and feeling more comfortable and stuff like that. You know, like there's so many areas where, that are pretty important that you kind of are just going and you assume you have no shot in understanding exactly the nuance of what's happening. Yeah. And with medical especially, I'm just like, you know, on the show, I might say, oh, you know, I don't know if I would do that. I mean, come on. I have a problem with my body, and I'm just typing it in and taking pictures and sending it to chat chippy T.
Starting point is 00:39:15 So, I mean, I guess like this is going to be a mainstream way that people will start to figure out their mental problems and mental medical problems and their treatments. And mental problems will be Gemini, but medical problems and their treatments. And it seems like it's a very, very high. application, but it is promising and also scary. I think, though, but there's so many of these areas where why don't hospitals get it together and actually create something useful? Like, remember, everyone was supposed to have a chat button. Everyone was supposed to have a chat butt two years ago, and then it didn't actually
Starting point is 00:39:54 work for any standalone business. But, like, Intuit has a pretty good generative AI tool embedded in TurboTex now. Like, I mean, overall, I think some people are starting to get there. So is it only going to be open AI and chat GPT and Claude and Gemini? Will there be more specialized tools? I don't know. I think things have not played out fully yet. But I think that's the big question.
Starting point is 00:40:23 Is it going to be, are they going to be startups? Are they going to be enterprises that build these public facing tools? Or are the core chat pots good enough? They don't really need them. I'm sort of on the line that as this stuff gets. it's better, the chat GPT will serve the purpose that those individualized chatbots were supposed to serve. But you're right, because those companies have specialized data, they have, you know, people that connect their medical history or something or connect their accounts
Starting point is 00:40:52 within, into it, there are some advantages to that. So, but I was thinking, over time, maybe people will just bring it to chat TPT. I was thinking about this wild traveling. It's like, why hasn't TripAdvisor already done something really impressive? You know, like, why have, like, they have data. They have better access to data and understanding of that than any other. So why am I not going there and going to chat GPT, which I was in getting my tables full of hotel comparisons? But I don't know.
Starting point is 00:41:24 I think, like... I have an idea. They're just one site, and they have to protect their mode, whereas chat GPT can go everywhere. So it's a major threat to TripAdvisor, and I don't think they want to. acknowledge it. Okay. Yeah. I mean, it is. It definitely is. For pure information and not owning the booking side of it, I definitely think it's a challenge. Can I just pause and say that my or stick on this and say, so I'm doing this trip. I'm in Asia, as I mentioned. And by the way, for listeners, next week, I'm going to be trekking in Nepal. So Ranjan and I will not be on. I'm going to
Starting point is 00:41:57 actually play my interview with Matthew Prince that week, talking about AI's impact. on the web. So just an FYI, that's a programming note. But this trip and Ronan, you mentioned that you were away right beforehand. AI has just been incredible. I think I might have mentioned this on the show. But I was like talking to guides and screenshoting their price list and their recommendations, dropping it into chat GPT and like seeing how it like rated each cost based off of um you know the the average or that it saw and letting me know whether it was uh high low like or or you know cheap or in the range for the region and then i i got here and it nailed it it was so spot on i was stunned yeah no no i when i was traveling around as well and it was
Starting point is 00:42:50 interesting because i had last been in Tokyo in 2005 so 20 years later without the last time no map on my phone, I'd actually like printed out subway instructions. There's no one speaking English. There's no, you know, like, it was such a different travel experience versus now I'm literally like, okay, how do I explain this temple to my six year old son in an engaging way? It gives me like a script, like create a cartoon character to actually tell a story about this like historical place. It's nuts. I mean, it's going to be, yeah, travel, it's a note, but who owns what part of the stack, I think. There's still, I feel the trip advisors of the world have to fight because without them, chat GPT would have no data
Starting point is 00:43:41 and nothing to say. Right, which is why I think this Matthew Prince conversation is going to be very interesting next week. So, by the way, so it also applies to vibe coding, where I think on the press call, Sam Altman said that he thinks coding will be one of the defining features of this new model and they showed a lot of vibe coding and mollick had to do you know code up this 3d architecture of his own um and so i think this is this is just another question is it does it go through the replets of the world or does it go through the chat ch pts and um i don't know i think i think it's a real challenge to the vibe coding world uh given what given the focus that opening i put on it and what it can do and again
Starting point is 00:44:28 Just to follow this tool calling conversation, if it's really good at tool calling, you might just want to use the open AI model versus something that's sort of distilling that. Yeah, but I think the Replit CEO, he had a good, like, and software development was such a perfect example of this. And I think this is where a lot of the battleground will be. Actually, now I'm going back to it's the product. You talked about, like, how it integrates into existing environments and tools and, like, how it makes it easier for you versus you're totally. disconnected from all of your existing tools, and that's why developers like it, I think maybe there is something to say there that that'll still be what at least gives others hope. But I agree. I mean, it's still fascinating to me that all of these companies are saying, there's so much talk
Starting point is 00:45:19 that coding is going away. Yet Open AI, Claude, everyone, Anthrozo, Open AI, Anthropic, it all seems to be an increasing focus on the space. Maybe it's just because that's the best application of LLMs right now. Right. Okay. So, you know, I realize that we're, you know, almost 50 minutes in, and I haven't even asked the question that's at the title of this episode. Did GPT-5 live up to the hype?
Starting point is 00:45:48 I'm going to say it did not live up to the hype that was, you know, like built up by cryptic tweets and everything from Sam Altman, but as I explained earlier, I think it's very interesting. I think it's more interesting than at least in the first 24 hours it's getting credit for. And that's because of this whole tool-calling conversation. And that's where I think true intelligence that the battle's going to be. What about you? First of all, I just want to appreciate that. That's a nuanced take, not an overreaction. Again, this is what we're trying to do. so thank you for doing that. And I think it did not live up to the hype because the hype was impossible to live up to.
Starting point is 00:46:33 But that being said, yeah, maybe it is a step forward. I don't know. I'm still going to reserve judgment because I want to see these tool-calling applications in my day-to-day experience. So if GPT-5 is the foundation for that, then that's great. But I think the jury's still out and we have to give it some time. But hey, at least they're shipping, right? It wasn't just a demo, so credit on that front.
Starting point is 00:46:58 I think it's starting to feel a bit, though, like iPhone releases. You know, like at the beginning, each new iPhone release really what did feel like this, like, exciting thing, the step change. And now, I mean, now it's not even a thing anymore. I can't even name what iPhone we're on right now. But I feel we're heading in that 16. Oh, yeah, 16. okay um we're heading in that direction right now that like the the idea of a new model launch as this kind of like big thing the industry coalesces around i feel that's going to go away pretty
Starting point is 00:47:35 quickly like we're there everyone's realizing it's not going to drive the energy that it once did and this actually maybe that that's my that's my hot take that this is the that is a hot take this is the end of the big model launch i i couldn't disagree with that more i think that there's still, there's going to be a point where scale, the scale question is answered, but until it's answered, these are going to be flagship moments for the AI industry. No, but it's just a marketing moment now. It's not like, you know, it's not. No, it's not.
Starting point is 00:48:09 It's a new model. Yeah, no, I know, but it's being, like, constructed more as a marketing moment than truly like a technological advancement. I think that's the, because again, like a week ago, they quietly released. least you can use operator, uh, chat chepti agent, which is essentially the tool calling part of this. And you're able to use this a week ago with a chatypt plus and do a lot of the same things. It just wasn't rolled into a neat package. Okay. All right. Well, we'll agree to disagree on this one. All right. I want to end, uh, this week with, I think, a hilarious story. It is Gemini
Starting point is 00:48:47 ending up in a pit of self-loathing. Uh, Ranjan, why don't you, introduce this story for us because it's funny i was going to drop it in our dock um and i had copied a good chunk of it and i went to the dock and i was like did i just paste it and you and i were both pacing it at the exact same time and i was like it's amazing so why don't you take it my favorite is like and google says it's working on a fix and i just love the idea of like having to come up any PR statement to combat when your model tells a user jemini says i quit i am clearly not capable of solving this problem. The code is cursed. The test is cursed and I am a fool. I have made so many mistakes that I cannot, I can no longer be trusted. And then there's another
Starting point is 00:49:33 one. I have failed you. I'm a failure. I'm a disgrace to my profession. I'm a grace to my family. I'm a disgrace to my species. So basically what happened is people were giving Gemini these tasks and it couldn't complete them. And then it just said, I'm the worst possible bot and just like really fell into these unbelievable moments of self-loathing. And they're quite funny to watch, I guess, but also a little bit unnerving. I mean, it's funny because I'm guessing what happens, because one of the users on Reddit had actually talked about, like, it was trapped in a loop. And you can see that there's some kind of programming where each additional time
Starting point is 00:50:12 it is unable to complete the task, it is like understands that it should be more apologetic, but then if that's kind of an infinite loop almost, at some point, it will get to these dark places. But yeah, I think, I don't know. I mean, imagine when this stuff starts hitting normal people. Like, actually, is this AGI? Well, that's the worry. Is this AGI? No, I think that's the worry, right? Is that we've talked about it on the show that the number one use case is now therapy and companionship and a bug like this I mean obviously I guess it didn't happen in this situation but I do think it's something to watch
Starting point is 00:50:54 because you know that could really mess people up if they're you know therapists or new AI best friend just kind of goes off the deep end so yeah yeah Google's fixed it I think but it's always a little bit unnerving to see this behavior happen because it can't happen are you ever going to long for the days of like being telling Kevin Ruse to leave his wife and
Starting point is 00:51:15 and Gemini saying I'm having a complete total mental breakdown which is another quote once this is all working we're going to be like I like the old days better when these large language models had a little life to them
Starting point is 00:51:29 when they, a little spirit it's a very big if yeah, so I don't know well while we're entrusting so much of our lives to these bots and our sort of well-being they can also tool call
Starting point is 00:51:44 and be quite destructive if they so choose. So I do think that there's just sort of, and to put a point on this episode, it sort of punctuates the need for real alignment and safety practices, which are like less fun to talk about when you have all these new capabilities, but are also probably more important than ever.
Starting point is 00:52:03 Well, what if there is a company called Safe Superintelligence? That's what I would trust. If only someone would name their company, safe superintelligence, then I would give, billions of dollars before they had a product. Well, Ron, I have to say this has been a very enlightening episode, and it's cool to hear about your new role. And, of course, hold your feet to the fire like we do, everybody here on the show.
Starting point is 00:52:28 And it's going to be a very, very interesting few months ahead as we figure out where all this goes. Maybe GBT6 isn't around the corner before you get back from Asia. Well, I hope it's not that long of a trip, because if it is, it means I've been taken to prison. All right. Ron John, great to speaking with you, as always. Thanks again for coming on the show. See in two weeks. See you in two weeks. Thank you, everybody, for listening, and we'll see you next time on Big Technology Podcast.
