Big Technology Podcast - Does GPT-5 Live Up To the Hype?, AGI Wait Continues, Self-Loathing Gemini
Episode Date: August 8, 2025
Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) OpenAI's launch of GPT-5 2) Whether GPT-5's tool calling ability is its hidden strength 3) GPT-5 is good at 'doing stuff' 4) But GPT-5 is not AGI 5) Do AI models need more than book smarts to thrive? 6) OpenAI's medicine play 7) GPT-5's coding use case 8) We need AI tables for travel 9) Do the big model players now subsume AI startups? 10) Gemini has a breakdown --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b Questions? Feedback? Write to: bigtechnologypodcast@gmail.com
Transcript
GPT-5 is here.
Finally, does it live up to the hype?
That's coming up on a special Big Technology Podcast Friday edition right after this.
Welcome to Big Technology Podcast Friday edition where we break down the news in our traditional cool-headed and nuanced format.
You know what we're going to be talking about today because GPT-5 has finally been released by OpenAI.
Of course, we had OpenAI COO Brad Lightcap on the show just a few hours ago.
So that episode will be the most recent one in your podcast feed, where you're going to get the official line from OpenAI and a bunch of really interesting insights about what it took to train this model and where the AI field is going.
But today, as we always do on Friday, Ranjan Roy and I will break down exactly what the situation
is with this new model and whether this is actually something that lives up to the hype that
people have been talking about.
No overreactions.
We're going to do it with the proper context.
And with that, I want to welcome Ranjan back to the show.
Ranjan, welcome.
Happy AGI day, Alex.
Is it here?
No, it's not here.
Is it here?
I lost the bet.
I lost the bet.
Look, Sam Altman said that GPT-5 is smarter than us at almost everything, every single thing a human does.
So I thought, okay, fine, you know, we're finally going to see AGI, but it turns out, no, no AGI.
And we will talk about that in the middle.
But first, we have a little interesting announcement.
So let's hear it.
Yeah.
In addition to writing the Margins newsletter, I've actually been working at a company called Writer, writer.com. It's an enterprise generative AI startup, and I'm leading the vertical for the retail industry. I wanted to bring that up today because GPT-5 and what it means for me, I think, is heavily informed by a lot of the work I've been doing. And I think it might be a little AGI-ish.
I mean, it's amazing that you go to an AI company, and the first sentence out of your mouth is, yep, we have AGI here. But no, no, it's okay. I anticipate you'll come in with a level head.
So let's talk about GPT-5.
Okay, so this is from TechCrunch.
No, sorry, this is from The Verge.
GPT-5 is being released to all ChatGPT users.
It says, OpenAI is releasing GPT-5, its new flagship model, to all ChatGPT users and developers.
OpenAI says that GPT-5 is smarter, faster, and less likely to give inaccurate responses.
Sam Altman on this media call that I was on had a very interesting description of what it is. He says, GPT-3 sort of felt like talking to a high school student. You could ask it a question, and maybe you'd get a right answer, or maybe you'd get something crazy. GPT-4 felt like talking to a college student. And GPT-5 is the first time that it really feels like talking to a PhD-level expert. What do you think about the significance of this, and what do you think about this framework that Altman is setting up for the intelligence that we're seeing within the models?
I don't like the framework.
I think, and again, I'll get into why I think this is exciting, but it's still weird to me when an industry that advocates for dropping out of college to start a startup always leans back on high school student, college student, PhD student as the framework for intelligence.
And the other part of it is, I don't want PhD-level work for most of the things I'm asking. I just actually want grounded answers. Sometimes you want it to be cool, which maybe PhD students are and are not, no offense. But, you know, you're alienating segment by segment of the audience. I know, we do have a lot of very smart, educated listeners. But sometimes you want it to be cool. Sometimes you want it to be funny. Like, to me, that's not the framework I think is good for intelligence. We've talked about this a lot: the ARC-AGI test has a segment around everyday queries.
I've dug into this, I've asked as many people as I can, and no one has been able to explain to me what those everyday queries are.
Like, to me, answering those correctly, well, across multiple data sets and multiple tools, that's actually intelligence to me, doing that kind of work.
Right.
And, you know, I kind of cherry-picked it out of their remarks, but it is interesting to me, and this is something that came up with Lightcap as well, that it's not just making this model smarter that has been the sort of star of this story. It's all these different other elements of it. And it seems to me that it's possible that the models have reached this level of intelligence where you start to spread out into different capabilities of them, like tool calling, like the way that you structure the experience. And that is where you start to see the gains and the lift in terms of the way that people can use this. So maybe today, on the release of GPT-5, or this week, while GPT-5 is released, I go from being a model person to being a product person.
Well, no, no, no. I'm kidding, of course. But go ahead, Ranjan.
Yeah. The intelligence is in the model, deciding which product or tool to choose. So that's not a product decision. That's a model strength. And so, oh my God, am I becoming a model guy now? I think we are zeroing in on: the better the model, the better the product. In the end, it all comes together. It all comes together with one switcher.
Nuance in the middle.
No, but okay, so for...
But let me, I just want to say one more thing. We have been going in this direction for a while, right? So for new listeners, I've strongly said the most important thing in AI is the model. Ranjan has strongly said the most important thing in AI is how you productize this model. And it just turns out that better models do make better products. And we're starting to get to that point where we're starting to see the results.
Yeah, I think, okay, so I will actually admit error in two big areas. Get ready. Listeners can't see it, but Alex is smiling here. So the first is, again, the intelligence of the model to choose the right tool or product. And we're going to get into what that means and why I think that's incredibly important, and why GPT-5, by bringing all these different models that they have into one switcher, just one model that understands, is actually, I believe, the most significant breakthrough. So already I think that's incredibly important. And the second area follows directly from that. I think it was five or six months ago, we debated heavily, and I said that users should choose the right model for the right job, and that to take that away would make models and the experience worse. And you kept coming back. I think it was when maybe Claude had condensed everything into one picker, and everyone was kind of making fun of the model pickers while we were debating. But the idea that the user, which is how it's been for a while, should choose the best model for the task at hand, I have thought that's the best way these products should be rolled out. And I've completely reversed on that. And this is the right example of why that's important.
So let me set this up, and then we can talk it through, because I might have flip-flopped to the other side of this. It's great to have these releases, because we can test our long-held beliefs and see if they make sense anymore. And it seems like both of us are saying, well, maybe not.
So this is from the Verge article: GPT-5 is presented inside ChatGPT as just one model, not a regular model and a separate reasoning model. Behind the scenes, GPT-5 uses a router that OpenAI developed, which automatically switches to a reasoning version for more complex queries, or if you tell it to think hard.
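To make the router idea concrete, here's a minimal sketch of what such a dispatch layer might look like. The model names, hints, and thresholds are all invented for illustration; OpenAI hasn't published how its router actually decides, so treat this as a toy, not their method.

```python
# Toy sketch of a prompt router, loosely inspired by the behavior described
# above. Model names and heuristics are hypothetical, not OpenAI's logic.

FAST_MODEL = "gpt-5-main"           # assumed name for the quick, non-reasoning path
REASONING_MODEL = "gpt-5-thinking"  # assumed name for the heavier reasoning path

COMPLEX_HINTS = ("prove", "step by step", "analyze", "plan", "debug")

def route(prompt: str) -> str:
    """Pick a model for a prompt using crude, illustrative heuristics."""
    text = prompt.lower()
    # An explicit user request to think hard always gets the reasoning model.
    if "think hard" in text:
        return REASONING_MODEL
    # Longer or hint-laden prompts are treated as complex.
    if len(text.split()) > 60 or any(hint in text for hint in COMPLEX_HINTS):
        return REASONING_MODEL
    return FAST_MODEL

print(route("What time is the next ferry?"))           # -> gpt-5-main
print(route("Think hard: plan a database migration"))  # -> gpt-5-thinking
```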
I'm going to take the other side of this. I used to think that, yes, it should be seamless, and the model should just choose for you when it makes sense to think and when it doesn't. But I've been using OpenAI's o3 model, and that is a very heavy reasoner. It thinks a lot, and personally, I've just felt that that model has been better, not only than every other OpenAI model, but every model under the sun, every AI model under the sun. And so I don't like the idea of giving that decision of whether to think or not back over to the platforms. This is actually something that I'm not excited about with GPT-5. So you make the case for why it's good.
You want the agency to choose your own model, Alex. I get it. Free will. Free will with models. But also, yeah, I also happen to think that I don't really want to use the non-thinking models. Only for the most basic queries do I want to use those non-thinking models. All the other times, I want to use the most intelligent models, and the most intelligent models reason or think.
All right. Well, so here is what colors my thinking. About two months ago at Writer, I started testing a new product that was released publicly a few days ago, called Action Agent. And basically, the most intelligent part of the foundation model, which is our own foundation model, is tool calling. So there are hundreds of different predefined tools. It's not just that if you want to generate an image or edit an image, it'll call different tools. If you want to connect to a Salesforce instance, or if you want to analyze a CSV versus an Excel file, it'll call different Python libraries. Having those kinds of base foundational needs defined is the intelligence, and then, from a simple prompt, knowing where to go and what to do.
The more I used it, I was like, this feels kind of AGI. It's like, wait, it's doing really smart things across all these different tools and systems and actually getting things done. You know, when I do a deep research query on Gemini, I get a 30-page paper that I don't read, versus: can you actually do stuff? And that was the first time I really started seeing that.
And that's what really pushes me to this idea of being able to have a toolkit and know what to do. Because even right now in the demos, I think they coded a language-learning app, coded like a beatbox music player thing. Each one of those, it's not just writing HTML and CSS. It has to call different libraries, it has to install different Python dependencies. There's a lot of intelligence just in knowing what to do there to get to the right end result. And to me, that really is intelligence. It's like being, again, a good software developer, just knowing where to go.
Being a good researcher, knowing what to look for is as important as how smart you are.
So you would say OpenAI using this switcher is pointing towards the future of where this is all heading, where the best models will no longer rely on us to necessarily guide them. They will have an intuitive sense of where to go, and they will go.
Exactly. And that's what it felt like. And again, you called me out: a few months at an AI startup and now I'm saying AGI, I'm feeling it. But exactly that, knowing where to go and then letting that tool do the work, is actually the brilliance of this kind of architecture, versus one large language model actually doing all the work. There was a long time where large language models were bad at calculation, right? Large tabular sets of data, calculating. And then the big unlock was getting the LLM to write Python code or generate a SQL query to then process that data. And that's suddenly when Claude and ChatGPT and all these tools started getting useful for spreadsheets; before that, they weren't.
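As a concrete illustration of that unlock: instead of asking the model to sum a column token by token, you have it emit code and execute that deterministically. A minimal sketch; the table and the "model-generated" query are canned stand-ins here, not output from any particular API.

```python
import sqlite3

# The "let the model write SQL" pattern: the LLM is unreliable at arithmetic,
# but a query it writes can be executed exactly.
rows = [("2024-01", 1200.50), ("2024-02", 980.25), ("2024-03", 1410.00)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (month TEXT, amount REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?)", rows)

# In the real pattern, this string would come back from the model in response
# to "what was total revenue?"; here it's hard-coded for the sketch.
model_generated_sql = "SELECT SUM(amount) FROM revenue"

(total,) = conn.execute(model_generated_sql).fetchone()
print(f"Total revenue: {total:.2f}")  # -> Total revenue: 3590.75
```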
So already we've seen how that can actually change the way people use these tools. And with GPT-5, that's the groundwork they're laying. They're saying: no more are you choosing which kind of model you're going to need. These are just the models. We still don't really know, when you're coding that web-app language-learning game, what it's calling. When you're generating an image, is it DALL-E? Is it something else? We don't care. We just care that the right output is there in the end.
We'll come back to a few more of the details on GPT-5, but I think that segues perfectly into this terrific story that Ethan Mollick, the Wharton professor, wrote about GPT-5, headlined, fittingly, "It just does stuff." And I think one of the things he brings out in this story is that people want to use AI, but they don't know what the AI can do. They don't know what tasks they want accomplished with it. Even Lightcap yesterday talked about how there's this capability overhang. And with these new, as he says, agentic AIs, you give it the goal, and then it, in very proactive ways, solves the problem and suggests things to do. So here's just one minor example that he gives, and then we'll get bigger.
He says he asked GPT-5 to generate 10 startup ideas for a former business school entrepreneurship professor to launch, pick the best according to some rubric, figure out what I need to do to win, and do it. So he says he got the business idea, but he also got a bunch of things that he didn't ask for: drafts of a landing page, LinkedIn ad copy, simple financials. He says, I can say confidently that, while not perfect, this was a high-quality start that would have taken a team of MBAs a couple of hours to work through. This is a model that wants to do things for you.
So that's just in a chat circumstance, but basically the model is starting to test the boundaries of its capabilities by going out and attempting things that, you know, it intuits that you want and you don't specifically ask for. And it's sort of doing away with this old idea that, yes, the career of the future is going to be the prompt engineer, and actually saying: you give me what you need, and then I, with my own intelligence, will go ahead and do it for you.
That's it.
The example you gave is exactly the kind of stuff. And this has happened with me as well. You want something straightforward, and suddenly, sometimes, the intelligence is too much. Suddenly it's like, give me some ideas, and you're getting landing page HTML and CSS and financial analyses and stuff like that. And that is a good example of how raw this intelligence is right now: it's guessing, and it's not perfect and it's not great. But imagine if it actually knows, if it does get exactly what you want. And in this case, maybe he should define: only stick to a number of ideas, and then we'll dig in deeper. That's the prompting side of it. But that's a perfect example. To go do each one of those things was calling a different tool in its tool belt. And it made those decisions, and those decisions weren't perfect, but it's making them right now, and it'll get better and better.
Yeah, and I'm thinking back to my conversation with Lightcap yesterday. I was asking him, do you need to keep making the model smarter? And it was basically like, I think the reason why we're at this point is because the models' sort of, let's call it, bookish intelligence has gotten to the point where they have a model of the way that, let's say, the world operates. It's not a world model; they don't understand gravity. But they've read enough text that they get a pretty good sense of how people operate. And then the next question is, how do you then go apply it? And that's why I asked, should you start working on continual learning and memory, which is obviously the next moment. But what was probably missing from that conversation, due to my lack of questioning on it, is that, oh yeah, this is building what we've talked about: that scaffolding, these capabilities of going out and doing things that the user doesn't ask for, and, in a way, intuiting it. That is what matters now. And that's what will feel more AGI-ish when it's good.
Again, it's kind of comical to me, this example, because you can imagine how much content out there on the internet about startup ideas starts with: create a landing page. Every hustle-bro tweet thread or blog post will probably say that. So you see why poor GPT-5 is a little bit confused. But yeah, that's exactly what you said. It's that scaffolding. And imagine when it does things that surprise you, calls tools and creates things that were what you wanted when you didn't even know you wanted them. That's going to be when it feels AGI-ish.
So, Mollick has this great example where he tells GPT-5: you are GPT-5, do something very dramatic to illustrate my point. It has to fit into the next paragraph. And it writes a paragraph, a really pretty well-written paragraph, where the first letter of the first word of each sentence spells out "this is a big deal," each sentence is precisely one word longer than the previous sentence, and each word in a sentence mostly starts with the same letter. Again, and he points this out, this is a technology that couldn't tell you how many r's are in the word strawberry eight months ago. And now it's able to do this. It's crazy.
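Constraints like those are easy to verify mechanically, which is part of why the example is striking. A toy checker, with a hand-written sample rather than GPT-5's actual paragraph:

```python
import re

def check_constraints(paragraph: str, target: str) -> bool:
    """True if the sentences' first letters spell `target` and each
    sentence is exactly one word longer than the one before it."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    acrostic = "".join(s[0].upper() for s in sentences)
    if acrostic != target.replace(" ", "").upper():
        return False
    lengths = [len(s.split()) for s in sentences]
    return all(b == a + 1 for a, b in zip(lengths, lengths[1:]))

# Hand-written toy example (not GPT-5's output): spells "BIG", lengths 1, 2, 3.
sample = "Big. Ideas grow. Great models emerge."
print(check_constraints(sample, "BIG"))  # -> True
```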
Yeah, it's wild thinking about the advance from that side. But again, and we'll get into the actual reception of the model right now, it's in terms of how people start to use it, and whether they do get frustrated if, again, it creates you landing page copy and LinkedIn posts that you didn't ask for. I imagine there's still going to be a learning curve: how to use a tool like this, one that can go do a lot of different types of things, is very different from using pre-agentic models. Before, it was just: okay, is it hallucinating or not? Did it use too many em dashes or not? Now the outputs are going to be a lot more complex, which is going to make it still a bit more difficult and rough, I think, as people start using these tools.
Definitely. And it's a different form of intelligence. It's not bookish intelligence. And I wrote down the benchmarks, which we've been talking about so often: GPQA, 88.4%. AIME 2025 math, 100% when using Python. HealthBench Hard, 46.2%.
And it's interesting, because Mollick says, what was the last one?
HealthBench Hard.
HealthBench Hard. I think that's a medical one, 46.2%.
These are all state-of-the-art benchmarks. And Mollick says, I'm losing track of what these advances mean. All these models are improving very quickly right now. And it just goes to show you that it's almost like they've saturated, like they've ingested all the internet, all of the world's written works. They've had PhDs sit down and put their knowledge into these models, bake it in. And it's almost like they've saturated book smarts, and this is a different form of intelligence that they are now learning.
Yeah. If you think about it.
Okay, let's say, having started a new job recently as well: you're in a new place. There's one person over there who is just brilliant, sitting by themselves, knows a ton of stuff, just off-the-charts brilliant. Then the other person kind of knows everyone, knows which piece of information to get from where and who to talk to about what. Who do you choose to actually get something done?
I think the second one.
The second one. And that's the intelligence that we're talking about here: the ability to know where to go, who to ask, what to ask them.
Now, let me push back on you. All right, so tool use exists, but this stuff is still difficult to use within enterprises, and most of us still don't really know what to do with it. I now have, on my desktop or in a web browser, GPT-5, which can call all these tools, and I legitimately have no idea what I would prompt it to do that I wouldn't have used o3 for, like what actions to take. I know I also have the Comet browser; I can say, go ahead and do stuff for me on my browser. But is it just a lack of imagination, or is it possible that this is a cool party trick that doesn't have much practical use?
No, I agree that the lack of tools that are publicly available right now is the limitation with GPT-5.
Again, it's like: you're traveling right now. What are the best hotels? Which beaches should I go to? Create me an itinerary. All that's just content generation. "Go book something" is, you know, the agentic future we were promised by Apple and others a few years ago, probably. But even within a ChatGPT response, there are a lot of different things happening. Like, I don't know, have you noticed it creates a lot more tables for you now? That's one tool, which sometimes gets annoying when you didn't ask for it, but it's got to do a whole table comparison.
When traveling, I disagree. I want all of my answers in tables now. They are amazing. When traveling, I was using it a ton. I mean, in Tokyo, where hotel rooms are small and expensive, I was having it use the web browser tool to go search web pages, and another tool to extract information from those web pages, and create me a table of square footage per room, knowing I have a six-year-old son, three of us. And it created these amazing tables for me. But even within that, there are a lot of different things being done. It's not just calling its core set of information and using that. It's doing stuff, a lot of stuff.
So calculations.
Calculations, web page scraping or web extraction, web search. All those things are happening.
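To give a flavor of the chain being described, search, extract, tabulate, here's a minimal sketch. The helper names and hotel numbers are invented; in a real agent, the stubbed step would be separate browsing and extraction tool calls.

```python
from dataclasses import dataclass

@dataclass
class Room:
    hotel: str
    sqm: float          # square meters per room
    nightly_usd: float

def extract_rooms() -> list[Room]:
    # Stand-in for "browse + extract" tool calls; values are made up.
    return [
        Room("Hotel A", 18.0, 210.0),
        Room("Hotel B", 25.5, 260.0),
        Room("Hotel C", 14.0, 150.0),
    ]

def render_table(rooms: list[Room]) -> str:
    # The final "make me a table" step: derive a comparison column and format.
    header = f"{'Hotel':<10}{'sqm':>8}{'$/night':>10}{'$/sqm':>10}"
    body = [
        f"{r.hotel:<10}{r.sqm:>8.1f}{r.nightly_usd:>10.0f}{r.nightly_usd / r.sqm:>10.1f}"
        for r in rooms
    ]
    return "\n".join([header, *body])

print(render_table(extract_rooms()))
```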
But again, in the end, I think we're just seeing an output right in the browser, right in the chat experience. So it can't be that cool, right? Make an image, make a table; PowerPoint decks it's still pretty bad at. But yeah, if it actually goes and starts doing more things, that's when it gets, I think, really interesting. Like going out on the internet and taking actions for you, like booking.
Yeah, like building, I don't know, spreadsheets or documents. Turning the lights off and on at my smart home. Anywhere where there is something that can be done with a digital connection could theoretically be operated through one of these flows.
I'll give you an example. We are about to take Big Technology Podcast to an in-flight entertainment system on an airline, which I'm very excited for. And yes, there is a spreadsheet that I have to fill out. I'm not going to announce it yet, because it's not official, but there's a spreadsheet that I have to fill out, which has a bunch of metadata that you have to put in, you know, for the system to be able to ingest it. And I've just been putting this off. And I would love it if an AI system could legitimately go search Big Technology Podcast, grab all that data, then go into Riverside, download the audio files, put them in a Google Drive, and then send them over. When you talk about AI replacing work, this is the type of work that we all need to do in our jobs that is so hard, or so, what's the word for it? It's just drudgery, basically. It's annoying, but it's important to do. And if AI could do that for me, and do it accurately, that would be just tremendous: multiple hours saved, and very valuable.
And so, what you just described there is the kind of stuff we've been promised for a long time. Again, even asking Siri to search your Gmail and extract a specific piece of information, the fact that they can't do that is a whole other story. But that, and then do something with it, is actually a problem that involves a lot of different tools and a lot of different systems, and it is not that straightforward. And now I'm confident, with what we're seeing with GPT-5 today and what I've been seeing with Action Agent at my own work, that it's happening. Is Riverside easy to call and download from and then pull back into a Google Drive? I mean, that stuff will work itself out. But exactly what you described there, I think, that's intelligence to me.
Would that be AGI for you, with a single prompt?
No, I don't think that. Again, it's so interesting, because this week OpenAI has been like, well, we're not calling it AGI, and we don't really like the term AGI, because it's confusing and doesn't really have a meaning.
Wait, did they say that? Did they mention AGI specifically?
Okay, so let's just talk about AGI, because we are going to talk about AGI today. Sam Altman says: I kind of hate the term AGI, because everyone at this point uses it to mean a slightly different thing. But this model is clearly generally intelligent. So again, we started this episode with me sort of doing a mea culpa, because I thought they would say GPT-5 is AGI. But they did say GPT-5 is smarter than us in almost every way.
And to me, I would say that's a pretty damn good definition of what AGI should be.
I think that's fair, to say that and then kind of still not call it AGI.
Do you think it's a legal thing, not saying AGI now?
Probably. But I also think that they are setting up some new criteria for what AGI should be, which I think is really good. And it touches on some of the weaknesses we've talked about on this show with people like Dwarkesh Patel and Dario Amodei. So Sam Altman says: this is not a model that continuously learns as it's deployed from the new things it finds, which is something that, to me, feels like it should be part of AGI. And I think that, despite the fact that maybe, as Dario says, you can build a larger context window and that sort of solves the problem, I think you have to solve that problem to get there. This is Lightcap from yesterday: for me, a system that is reliably able to learn new things that are kind of out of its distribution, by virtue of its ability to reason, to think, to solve problems, to use tools, to come up with new ideas, that is what counts as AGI. So all these things: reasoning, thinking, solving problems, new ideas, continual learning. When you have a system that can do all those things, then you might call it AGI. And we're just clearly not there yet.
I guess, yeah, the new ideas and continuous learning are not part of this yet. The first two, the reasoning and the tools, I think that's the big breakthrough of this week, or, I mean, the last year: reasoning, and now being able to use different tools in a reliable way. But I think, all right, we've got a way to go. Though I did see an Instagram post of a Waymo driving around New York City.
Oh, those are in New York, but they're not driving driverless yet. There was a safety driver there. So for new listeners, we have a, yeah, go ahead, Ranjan, tell them.
Our own rubric for AGI, in competition with the ARC-AGI test that most of the industry adheres to, is: if a Waymo is going around New York City, we have AGI. And I firmly believe it.
It's kind of interesting. So this sets up kind of the next part of it. Nathan Lambert from the Allen Institute for AI had a very interesting perspective here. He said: if AGI was the real goal, the main factor in progress would be the raw performance. GPT-5 shows that AI is on somewhat of a more traditional technological path, where there isn't one key factor; it's a mix of performance, price, product, and everything in between. So what we've seen, again, and we're going to talk about some of these things, is that if you're just measuring on pure intelligence, you could just say: all right, for every question you get, just think a while, expend those reasoning resources, the test-time compute resources, and then you'll get better answers. But there is a real usability side of this. That is, again, the tool calling, the switcher, all of these things that really matter. I guess I do wonder: can you really take the two apart from each other? And is this effectively a smokescreen for the fact that it seems like there are at least some diminishing returns from scaling up your models? Like, is a bigger model going to be a straight shot to AGI? I don't know. If you have to do all this other stuff around them, maybe not. So I'm curious what you think about it.
If the bigger model can call the smaller model and get out of the way,
then the usability, the cost, the scaling is more interesting, right?
Like, if I know you want a PhD student finding out when the next ferry is in Krabi.
You want only o3 for everything.
o3 for everything. Folks, I'm in Thailand and did miss the ferry yesterday because I didn't use o3 to figure out what the schedule was. Which, by the way, a table would have been freaking perfect for.
That would have been perfect. Table stakes. So I, uh, yeah, no, that's not the right word. Table stakes? I backed off it just as it came out of my mouth. Apologies to listeners. I tried to let that one trail off.
Could not let that go unchallenged. I appreciate that.
Yeah, no, I think, to me, the big concern has been: imagine an o3-style heavy reasoning, thinking model. If you are using that to check grammar in a Word doc, that's never going to scale. We're all screwed; nothing's going to overcome that. So I think, if it does work in this way, if the power of GPT-5 is to know when to get out of the way quickly and go cheaper and go smaller and go specialized, that starts to set up what the future looks like. That shows us there is a scalable future.
And speaking of that, that leads us into two really important factors here. One, GPT-5 is priced very aggressively: it's half the price for an input token and the same for an output token, despite being apparently a more advanced model, which is wild, given the trends we've seen in the industry.
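For scale, a quick back-of-the-envelope using the launch list prices as widely reported (about $1.25 per million input tokens and $10 per million output, versus roughly $2.50 and $10 for the prior flagship); both the workload and the prices here should be treated as illustrative, not authoritative:

```python
# Back-of-the-envelope API cost comparison. Prices are per million tokens,
# taken from reporting at launch; they may change, so treat as illustrative.
GPT5_IN, GPT5_OUT = 1.25, 10.00   # reported GPT-5 launch pricing (USD)
PREV_IN, PREV_OUT = 2.50, 10.00   # prior flagship, for comparison

def daily_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# A chatty hypothetical workload: 2M input and 500K output tokens per day.
new = daily_cost(2_000_000, 500_000, GPT5_IN, GPT5_OUT)   # $7.50
old = daily_cost(2_000_000, 500_000, PREV_IN, PREV_OUT)   # $10.00
print(f"GPT-5: ${new:.2f}/day vs prior flagship: ${old:.2f}/day")
```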
And the other thing is that, as of this week, GPT-5 is rolling out to everybody, not just the Plus users. Of course, you're going to be rate-limited if you're a free user, but today you should be able to get into GPT-5 and use it without paying OpenAI a dime, which is going to be the first time a lot of people see reasoning, which is something a lot of people have spoken about. And so that accessibility part of it does really matter. This was a pretty big decision.
I mean, we're starting to see this mentality of just get it in the hands of everyone, even more aggressively.
Like, did you see, OpenAI announced, I think, that every federal government agency will get ChatGPT, I think, for like $1 or something.
That's right.
Yeah, and Google just announced, I think, that Gemini is free for anyone with a .edu account. So I think getting it out there, I mean, again, scaling the data centers, losing billions of dollars, and just trying to have people use it and use their tool seems to be where the consumer battle certainly is still going.
But I guess part of me says that's really nice and it's a good story, but also OpenAI has announced fundraising of $48 billion, $48.3 billion, this year. How are you ever going to get to a place where you're making money if you need that much to train and to run?
Now, Brad Lightcap, the OpenAI COO, did say: hey, look, every time we lower the prices, we see a corresponding increase in usage. And so people pay, and, you know, then that will work out well. But I can't do the math in my head and make it make sense.
I mean, yeah, the economics of this industry. No, it's funny, because sometimes we'll see these leaked investor decks and stuff like that, but it feels like no one is even trying to talk about the economics of what this industry will look like and what the margins will look like. I know the Replit CEO, I think that was a pretty interesting conversation you had with him, where he was talking about pricing and margins, and average users and lower-intensity users versus expert users, and who should cost you more. Typically, don't you want: the more you use it, the less you should be paying per unit of utilization? These are things that, right now, no one has even come close to having an answer to.
Yeah, we had an absolutely amazing comment in our Discord this week. I don't know if you saw it, but someone, and I'm going to get this directionally right, but probably imprecise, said: I spend my weeks listening to Dan Ives, who's like the biggest AI, big-tech bull, and Ed Zitron, who we've had on, who's like the biggest critic, and ask myself which one of them is crazy. And I'm just like, I feel seen, in a way. I mean, it's so interesting that you have these two just unbelievably opposite perspectives. And when you listen to both of them, you could say, hmm, I could see a world where that's true. I think that's where the both of us sit here.
Right. Yeah. The technology is grand; the economic fundamentals at the large-scale players are not. That's where I am right now.
Okay. Yeah, same here. All right. So I want to take a break, and then come back and talk about a couple more use cases for GPT-5, including coding and medicine, and then we can also cover the mental breakdown that Gemini had, which is fun. All right. We'll be back right after this.
And we're back here on Big Technology Podcast, Friday edition, breaking down all the week's
news.
Let's talk about some of these special use cases, or special, God, I gave Ranjan a hard time before the break about his language, and I can't even say specified or specific use cases of the models. So shame on me. I will join Gemini in self-loathing at the end of the show.
I mean, after my table stakes, the self-loathing is strong.
You and me and Gemini will hold hands and dance in our deep regret for life.
But let's talk about these use cases because one is very interesting.
OpenAI has been talking a lot about the medical use cases, where it's basically like, and I get it, back in the day, maybe you used WebMD and then you went to the doctor and said, go ahead and treat me. And now OpenAI has basically doubled down on medical use cases in their blog post about GPT-5. This is from Mashable. They say GPT-5 is their best model yet for health-related queries, empowering users to be informed about and advocate for their health. It said that GPT-5 is a significant leap in intelligence over all previous models, and that it acts as an active thought partner, more of a thought partner than a doctor. And it says that the model will provide precise, reliable responses, adapting to a user's context, knowledge level, and geography, enabling it to provide safer and more helpful responses in a wide range of scenarios, especially on the medical front.
I just found this so interesting. The models would typically, in the old days, run away from any medical queries, and now they're coming out and saying this is what they want to be helpful with, and they want to do it. I guess part of that is faith in the model, but it also seems a little risky to me. I don't know. What do you think, Ranjan?
I think it's very good. To me, it's actually such a clear area. Any area where you have really specialized knowledge that is used to create a gap from the person who needs to understand it, I'd put law in here, accounting in here. There are so many of these knowledge fields where, in reality, it's just learning a specific vocabulary, learning a lot of pathways and rules, which is what AI is great at. But being able to actually communicate that stuff to a normal person in layperson's language, I think, is huge. And I'm glad that they recognized that they can add more value, do more help than harm, there. I genuinely believe that. Certainly, I mean, doing my taxes now, it's been a game changer, just asking questions and feeling more comfortable and stuff like that. There are so many areas that are pretty important where you kind of just go in assuming you have no shot at understanding exactly the nuance of what's happening.
Yeah. And with medical especially, I'm just like, you know, on the show I might say,
oh, you know, I don't know if I would do that.
I mean, come on. I have a problem with my body, and I'm just typing it in and taking pictures and sending it to ChatGPT. So I guess this is going to be a mainstream way that people will start to figure out their mental problems and medical problems and their treatments.
And mental problems will be Gemini, but medical problems and their treatments, yes.
And it seems like a very, very high-stakes application, but it is promising and also scary.
I think, though, there are so many of these areas where, why don't hospitals get it together and actually create something useful? Remember, everyone was supposed to have a chatbot two years ago, and then it didn't actually work for any standalone business. But Intuit has a pretty good generative AI tool embedded in TurboTax now. Overall, I think some people are starting to get there. So is it only going to be OpenAI and ChatGPT and Claude and Gemini? Will there be more specialized tools? I don't know. I think things have not played out fully yet. But I think that's the big question. Are they going to be startups? Are they going to be enterprises that build these public-facing tools? Or are the core chatbots good enough that they don't really need them?
I'm sort of of the opinion that, as this stuff gets better, ChatGPT will serve the purpose that those individualized chatbots were supposed to serve. But you're right: because those companies have specialized data, because people connect their medical history or their accounts into them, there are some advantages there. But I was thinking, over time, maybe people will just bring it all to ChatGPT.
I was thinking about this while traveling. It's like, why hasn't TripAdvisor already done something really impressive? They have data. They have better access to data, and a better understanding of it, than anyone. So why am I not going there, instead of going to ChatGPT, which is where I was getting my tables full of hotel comparisons? But I don't know. I think, like...
I have an idea.
They're just one site, and they have to protect their moat, whereas ChatGPT can go everywhere. So it's a major threat to TripAdvisor, and I don't think they want to acknowledge it.
Okay. Yeah. I mean, it is. It definitely is. For pure information, and not owning the booking side of it, I definitely think it's a challenge.
Can I just pause, or stick on this, and say: I'm doing this trip. I'm in Asia, as I mentioned. And by the way, for listeners, next week I'm going to be trekking in Nepal, so Ranjan and I will not be on. I'm going to actually play my interview with Matthew Prince that week, talking about AI's impact on the web. So just an FYI, that's a programming note. But on this trip, and Ranjan, you mentioned that you were away right beforehand, AI has just been incredible. I think I might have mentioned this on the show, but I was talking to guides and screenshotting their price lists and their recommendations, dropping them into ChatGPT, and seeing how it rated each cost based off of, you know, the average that it saw, letting me know whether it was high, low, cheap, or in the range for the region. And then I got there, and it nailed it. It was so spot-on, I was stunned.
Yeah, no, when I was traveling around as well, it was interesting, because I had last been in Tokyo in 2005, so 20 years ago. The last time, with no map on my phone, I'd actually printed out subway instructions. There was no one speaking English. It was such a different travel experience, versus now I'm literally like, okay, how do I explain this temple to my six-year-old son in an engaging way? And it gives me a script: create a cartoon character to actually tell a story about this historical place. It's nuts. I mean, travel is going to be huge, but who owns what part of the stack, I think, is the question. I still feel the TripAdvisors of the world have to fight, because without them, ChatGPT would have no data and nothing to say.
Right, which is why I think this Matthew Prince conversation is going to be very interesting next week.
very interesting next week. So, by the way, so it also applies to vibe coding, where I think on
the press call, Sam Altman said that he thinks coding will be one of the defining features of
this new model and they showed a lot of vibe coding and mollick had to do you know code up this 3d
architecture of his own um and so i think this is this is just another question is it does it go
through the replets of the world or does it go through the chat ch pts and um i don't know i think i think
it's a real challenge to the vibe coding world uh given what given the focus that opening i put on it
and what it can do and again
Just to follow this tool calling conversation, if it's really good at tool calling, you might just want to use the open AI model versus something that's sort of distilling that.
Yeah, but I think the Replit CEO had a good point, and software development is such a perfect example of this. And I think this is where a lot of the battleground will be. Actually, now I'm going back to: it's the product. You talked about how it integrates into existing environments and tools, and how it makes things easier for you, versus being totally disconnected from all of your existing tools, and that's why developers like it. I think maybe there is something to say there, and that'll still be what at least gives others hope. But I agree. I mean, it's still fascinating to me that there's so much talk that coding is going away, yet OpenAI, Anthropic, everyone, it all seems to be an increasing focus on the space.
Maybe it's just because that's the best application of LLMs right now.
Right.
Okay.
So, you know, I realize that we're almost 50 minutes in, and I haven't even asked the question that's in the title of this episode. Did GPT-5 live up to the hype? I'm going to say it did not live up to the hype that was, you know, built up by cryptic tweets and everything from Sam Altman. But as I explained earlier, I think it's very interesting. I think it's more interesting than, at least in the first 24 hours, it's getting credit for. And that's because of this whole tool-calling conversation. That's where I think true intelligence, and the battle, is going to be. What about you?
First of all, I just want to appreciate that. That's a nuanced take, not an overreaction. Again, this is what we're trying to do, so thank you for doing that.
And I think it did not live up to the hype because the hype was impossible to live up to.
But that being said, yeah, maybe it is a step forward.
I don't know.
I'm still going to reserve judgment because I want to see these tool-calling applications
in my day-to-day experience.
So if GPT-5 is the foundation for that, then that's great.
But I think the jury's still out and we have to give it some time.
But hey, at least they're shipping, right?
It wasn't just a demo, so credit on that front.
I think it's starting to feel a bit, though, like iPhone releases. You know, at the beginning, each new iPhone release really did feel like this exciting thing, a step change. And now it's not even a thing anymore. I can't even name what iPhone we're on right now. But I feel we're heading in that...
Sixteen.
Oh, yeah, 16. Okay. We're heading in that direction right now, where the idea of a new model launch as this kind of big thing the industry coalesces around, I feel that's going to go away pretty quickly. Everyone's realizing it's not going to drive the energy that it once did. And actually, maybe that's my hot take: this is the end of the big model launch.
That is a hot take, and I couldn't disagree with it more. I think that
there's going to be a point where the scale question is answered, but until it's answered, these are going to be flagship moments for the AI industry.
No, but it's just a marketing moment now.
No, it's not. It's a new model.
Yeah, I know, but it's being constructed more as a marketing moment than as a true technological advancement. Because, again, a week ago they quietly released, you can use ChatGPT Agent, which is essentially the tool-calling part of this. You were able to use it a week ago with ChatGPT Plus and do a lot of the same things. It just wasn't rolled into a neat package.
Okay. All right. Well, we'll agree to disagree on this one. All right, I want to end this week with, I think, a hilarious story: Gemini ending up in a pit of self-loathing. Ranjan, why don't you introduce this story for us, because it's funny.
I was going to drop it in our doc, and I had copied a good chunk of it, and I went to the doc and I was like, did I just paste it? You and I were both pasting it at the exact same time, and I was like, that's amazing. So why don't you take it.
My favorite is that Google says it's working on a fix. I just love the idea of having to come up with a PR statement to combat your model telling a user... Gemini says: I quit. I am clearly not capable of solving this problem. The code is cursed. The test is cursed, and I am a fool. I have made so many mistakes that I can no longer be trusted. And then there's another one: I have failed you. I am a failure. I am a disgrace to my profession. I am a disgrace to my family. I am a disgrace to my species.
So basically what happened is people were giving Gemini these
tasks and it couldn't complete them. And then it just said, I'm the worst possible bot and just like really
fell into these unbelievable moments of self-loathing.
And they're quite funny to watch, I guess, but also a little bit unnerving.
I mean, it's funny, because I'm guessing what happened. One of the users on Reddit had actually talked about how it was trapped in a loop. And you can see that there's some kind of programming where, each additional time it is unable to complete the task, it understands that it should be more apologetic. But if that becomes kind of an infinite loop, at some point it will get to these dark places. But yeah, I don't know. I mean, imagine when this stuff starts hitting normal people. Actually, is this AGI?
Well, that's the worry, right? We've talked about it on the show: the number one use case is now therapy and companionship, and a bug like this, I mean, obviously, I guess it didn't happen in this situation, but I do think it's something to watch, because that could really mess people up, if their, you know, therapist or new AI best friend just kind of goes off the deep end. So, yeah.
Yeah, Google's fixed it, I think. But it's always a little bit unnerving to see this behavior happen, because it shouldn't happen.
Are you ever going to long for the days of, like, Bing telling Kevin Roose to leave his wife, and Gemini saying, I'm having a complete and total mental breakdown, which is another quote? Once this is all working, are we going to be like, I liked the old days better, when these large language models had a little life to them, a little spirit?
It's a very big if.
Yeah, so I don't know. Well, while we're entrusting so much of our lives, and our sort of well-being, to these bots, they can also tool-call and be quite destructive if they so choose. So I do think, and to put a point on this episode, it sort of punctuates the need for real alignment and safety practices, which are less fun to talk about when you have all these new capabilities, but are also probably more important than ever.
Well, what if there were a company called Safe Superintelligence? That's who I would trust. If only someone would name their company Safe Superintelligence, then I would give them billions of dollars before they had a product.
Well, Ranjan, I have to say this has been a very enlightening episode, and it's cool to hear about your new role. And, of course, we'll hold your feet to the fire, like we do with everybody here on the show. It's going to be a very, very interesting few months ahead as we figure out where all this goes.
Maybe GPT-6 is around the corner before you get back from Asia.
Well, I hope it's not that long of a trip, because if it is, it means I've been taken to prison.
All right, Ranjan, great speaking with you, as always. Thanks again for coming on the show. See you in two weeks.
See you in two weeks. Thank you, everybody, for listening, and we'll see you next time on Big Technology Podcast.