Limitless Podcast - The Coding Model Wars: Claude Opus 4.6 vs GPT-5.3 Codex
Episode Date: February 7, 2026

Anthropic's Claude Opus 4.6 and OpenAI's Codex 5.3 have come out back to back, so we dive in and compare their shocking capabilities and implications for AI development. We compare Claude's orchestration skills against Codex's superior coding efficiency through live demos, revealing the potential impact on job automation in tech. Try them out, see which one you prefer, and let us know!

------

🌌 LIMITLESS HQ ⬇️

NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/

------

TIMESTAMPS

0:05 AI Showdown: Claude vs. Codex
0:43 Live Demo of Coding Models
4:13 Comparing Model Outputs
4:47 Codex vs. Claude Performance
6:15 Exploring the Models' Features
8:58 The Future of Work with AI
9:32 Building a Stock Analysis Tool
11:44 Technical Demos Unveiled
14:41 Self-Improving AI Models
17:19 Automating Complex Tasks
18:46 The Competitive Landscape
20:32 Investor Perspectives on AI
22:36 Major Updates from OpenAI
23:43 Real-Time Quality Assurance Testing
29:07 Creating a Stock Dashboard
35:45 Conclusion and Future Insights

------

RESOURCES

Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
48 hours ago, Anthropic dropped Claude Opus 4.6, the world's most powerful AI model.
And literally 20 minutes later, OpenAI dropped Codex 5.3, which is not only better, but also built itself.
Now, to say both of these models are powerful would literally be the understatement of the century.
By the time I'd eaten breakfast yesterday, one of the models had discovered 500 security flaws,
which no one else had discovered before.
And by lunchtime, a bunch of software stocks were down hundreds of billions of dollars out of fear that these models
would replace entire teams.
And it's actually already happened.
These models can replace a team of 50 software engineers,
rebuild Pokemon from scratch,
and so much more.
And in this episode,
we're going to be doing a live demo side by side
to show you which model is the best.
Yeah, this is pretty cool.
I wanted to spend a lot of time this episode,
kind of introducing people to these models,
what they could do, how they work,
through demos that we're going to perform ourselves.
These are definitely two frontier models,
but I think more importantly,
they're frontier coding models.
And when people hear that,
I think a lot of them get turned away because it seems like this complicated thing.
Like you need to be a developer in order to use them.
And we are here to tell you that is not the case.
From one non-technical person to another, I fed this model a prompt.
I fed it some assets.
And then I pressed play.
And what I got is a side-scrolling game, which was exactly what I asked for.
So on the screen now, you're seeing the one-shot prompt that I fed this model, asking it to create a side-scroller like Mario that we can actually play.
So it has coins, and I don't think the gravity quite works.
What you're saying is that it understands physics,
it is able to generate graphics,
and it plays like a pretty solid side scroller.
And I created this in five minutes with one prompt,
and it actually works.
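For anyone curious what "it understands physics" actually means in a game like this, platformer gravity usually boils down to a per-frame velocity update with a floor clamp. A minimal sketch in Python (hypothetical; this is not the code the model generated, and the constants are made up):

```python
# Minimal platformer gravity sketch (hypothetical; not the generated game
# code, and the constants are made up). Each frame: gravity accelerates the
# player, velocity moves them, and the floor clamps the fall.

GRAVITY = 2000.0   # pixels / s^2, tuned by feel
FLOOR_Y = 500.0    # y-coordinate of the ground (y grows downward)

def step(y: float, vy: float, dt: float) -> tuple[float, float]:
    """Advance the player's vertical position by one frame."""
    vy += GRAVITY * dt          # gravity pulls down
    y += vy * dt                # integrate position
    if y >= FLOOR_Y:            # landed: snap to the floor, stop falling
        y, vy = FLOOR_Y, 0.0
    return y, vy

def jump(strength: float = 700.0) -> float:
    """Return an upward launch velocity (negative = up on screen)."""
    return -strength

# Simulate one jump from the floor at 60 fps until the player lands again.
y, vy = FLOOR_Y, jump()
frames = 0
while True:
    y, vy = step(y, vy, 1 / 60)
    frames += 1
    if y == FLOOR_Y and vy == 0.0:
        break
```

Too-floaty or too-heavy gravity, like the hosts joke about later, is just this GRAVITY constant being mis-tuned relative to the jump strength.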
What was the prompt that you used, Josh?
So I'll pause playing this game to actually show you the prompt.
It was very simple.
It was this one paragraph.
I want you to make a game.
You can use Python or C++,
whatever you find the most convenient,
a 2D platformer that closely resembles Super Mario. Use the attached background image and sprites found in the asset folder, take into account that the sprites don't come with a transparent background, but a pink one,
so you need to fill it to the background. And for those who are watching, you can actually
see the sprites on my screen. They were just a series of assets; there was no context given
as to what each one of them was, but the model reasoned through it, it removed the background,
and it actually generated a pretty good representation of that. Now, this was built one shot on
Codex, which is the new OpenAI Mac application that just released this week. And I wanted to
compare it to Claude. So I have another instance here on the screen with Claude. This is using Opus
4.6, the newest frontier model that they just released this week. And I want to do an exact one-to-one
comparison. So I'm going to launch the same exact prompt. We're going to have that cook on Codex,
or we're going to have that cook in Claude Code. And in the meantime, maybe we can kind of talk
about more of what these models do and how they work. Well, before we do that, actually, as you
set this game up, I ran it on Claude Opus 4.6 as well, but with a slight twist.
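The pink-background step in Josh's prompt is a classic chroma-key operation, and it takes surprisingly little code. A minimal sketch in pure Python (hypothetical; pixels as tuples, with an assumed key color of (255, 0, 255) and a tolerance, since real sprite sheets vary slightly):

```python
# Chroma-key sketch (hypothetical): turn the sprites' solid pink background
# transparent, as the prompt asked the model to do. Input pixels are (R, G, B)
# tuples; output pixels are (R, G, B, A) with A=0 where the pink key matched.

PINK = (255, 0, 255)  # assumed key color; real sprite sheets vary slightly

def close(a: int, b: int, tol: int = 16) -> bool:
    return abs(a - b) <= tol

def key_out_pink(pixels):
    """Map RGB pixels to RGBA, making near-pink pixels fully transparent."""
    out = []
    for r, g, b in pixels:
        if close(r, PINK[0]) and close(g, PINK[1]) and close(b, PINK[2]):
            out.append((r, g, b, 0))    # background: transparent
        else:
            out.append((r, g, b, 255))  # sprite: opaque
    return out

sprite_row = [(255, 0, 255), (250, 8, 248), (34, 177, 76)]
keyed = key_out_pink(sprite_row)
# keyed -> first two pixels transparent (A=0), the green pixel opaque (A=255)
```

With Pillow, the same logic fits in a few lines over `Image.convert("RGBA")` and `putdata`.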
Okay, let's see your output. What do we have?
Okay.
I don't know if you can see my screen, but it is the exact game that you just created.
But I don't know if those characters look kind of familiar to you.
We have the hero protagonist character, which is my beautiful face, and my beautiful person, Ejaaz.
And we have, who's this enemy over here?
That looks a lot like the bear guy.
And listen, we can double jump here, Josh.
And I think, yep, I can crush you.
But, I mean, jokes aside, this is insane.
This took me like around three minutes to build end to end.
I used the exact same prompt that you gave me.
And we didn't have sprites ready made of ourselves, right?
We didn't have like cartoon images of ourselves.
So I uploaded an image that we had taken, I don't know, like six months ago and said,
hey, can you make game avatars out of this?
It did it in 20 seconds.
And then I said, could you add these to the game and replace the enemy with Josh and the protagonist with Ejaaz, and it did it in a minute. So here we go. It's pretty amazing. And these are just standard desktop applications. So what you're using right here, this was done in Claude Code, right? You just went to Claude, the Mac app,
you downloaded it, you put in the prompt, you shared some assets, and now it built this
amazing game in one single prompt. And we're actually going to experiment further in this episode
where we're going to create a trading room that does actual real-time stock analysis. So as I'm
curating the prompts and as we're getting ready for that second demo, maybe we could walk through what makes this model so exceptional. Yeah, well, you might
actually notice the first difference on screen right now. If you notice, if you look closely,
my avatar is kind of glitching out, right? And if you compare it to your Codex game that you just
coded up, there's no glitches. It runs super smoothly. And the main takeaway here is Codex 5.3 is a superior coding model to Anthropic's. And that's a sentence I never thought I would say, at least for the next couple of years, because Anthropic has held that prestige for so long. But since Code Red was initiated at OpenAI around three months ago,
Sam has devoted pretty much all his resources towards building the best coding model,
and the benchmarks don't lie. It is a full 12 points on the software engineering benchmark ahead
of Claude Opus 4.6. That's a pretty significant difference. So I've actually pulled up a more general
comparison between the two models here, and it summarizes it really well. So if we look at Claude's
model, Opus 4.6, what's good about it? Well, they've 5x'd the context window. So it's gone up to a million tokens that you can put in a single prompt, which, if you want to understand how powerful this is, means you can just put way more information into your initial prompt. It has much
better context and memory. So you can end up cooking up much better products overall, which is
very, very impressive and important to have. Number two, I would think about this as an orchestration model. So if you look at specific benchmarks, it has beaten OpenAI at GDPval. GDPval is a benchmark where they go out and test a model's performance at a really complex task versus a professional human who would normally do that task. And the decision is, would you use the AI model or would you use the human? And in this case, you would choose Claude 4.6 over humans way more often than you would choose OpenAI's latest model. So that's a really important thing. And the point around Claude's latest model is that while it doesn't code as well as Codex, it can orchestrate a bunch of agents and overall activity better than OpenAI's model.
Now, if you look at Codex, OpenAI's new model specifically, it wins on software engineering. It is simply a better software engineer than Claude is, which is a massive turnaround and a testament to how much resourcing and fine-tuning OpenAI has been able to achieve. And to the note on the quality of the models here, my prompt is done in Claude Code, the same one that we used in Codex.
And I'm going to run it here for the first time now.
You could see on screen.
And we'll see what it looks like.
So underneath we have our Codex version, which looks beautiful.
On top, we have our brand new version that was just made by Opus.
Now, I haven't tried this yet, so we're going to see what happens when I press space to start.
So it looks like Opus has failed to create a floor.
So I am just falling through the floor until the game ends.
Okay.
So just based on this one demo alone, this is a fairly significant difference where GPT's
Codex has created a beautiful side scroller.
It has gravity, maybe a little too much, and I could just ask it to lower it.
Opus doesn't even work at all.
And again, the test was just a one-shot problem.
So I'm going to get back to work, prompting it again to build this new application,
the trading application.
We'll follow up with that, but I think that's a funny kind of demo just to showcase that Codex actually is kind of superior, in this one use case at least. Yeah, I mean, you said it pretty clearly, which is Codex is the best coding AI model. And I can't emphasize that enough, because OpenAI for a long time was behind Anthropic, and by a massive margin. And in some way,
shape or form, they have been able to catch up. Now, what's interesting here is both companies
have focused on each other's goals. So while Anthropic was typically meant to be the leading frontier model in coding, it has now decided to focus on what OpenAI was really good at, which is overall orchestration and being a better generalized model, right? Yeah, exactly. And OpenAI has decided to eat Anthropic's lunch and say, okay, we've got
the generalized stuff sorted out. Let's try and figure out the coding-specific niche, highly defined,
professionalized functions. And it's produced the best coding model. So it's kind of a weird win-win
for both labs.
And what's awesome about this is
they both now have really well-rounded
but also very specialized models.
And the reason why this is important is,
and this is like kind of maybe my hot take,
I don't think the coding models matter, Josh.
I actually don't think the generalized models matter either.
I think they're both going off to something much bigger,
which is creating the operating system for the future of work.
They know that AI models and AI agents
are going to automate a ton of different industries, and those industries are only going to pick you if you can do both generalized work and hyper-specific work really well. That is coding and orchestration and managing your data. And now we have two amazing models, dropped within 20 minutes of each other, that do exactly that at the highest performance we've ever seen.
They're pretty exceptional. So now for this next demo, I have it queued up here. What we're going
to do is what I did, which is ask the model itself to build me a prompt for this. So I wanted it to create me an AI stock portfolio war room, and I asked, hey, I want to create this, create me a fully fleshed-out prompt that should solve this problem in one shot. So what I did is I loaded it up here in our Claude Code app, and then I also loaded it up into the Codex app. I created its own project folder,
and now I'm going to hit Send. So both of these things are thinking in real time, we will check
back in once their outputs are done, and we'll compare again the second version, which is more of a
robust one. I mean, you'll see on the Claude screen, it has this whole list of to-dos that it wants
to do. It has an entire plan. There's nine different panels that it's going to build. It's going to do
risk analysis matrix and portfolio action bars and all this stuff. So we'll let that cook. And let's get
back to what separates these, what people have been freaking out about on the internet more as these
things get going. Could I take three minutes, show you some wild demos, please? Yeah, let's see what the internet's been demoing while we wait for ours to cook. Okay, cool. Like, listen, our 2D Mario-inspired
game was cool. But imagine if I told you you could recreate the entire Pokemon game, including
levels, cities, characters, and creatures that you fight from scratch in about an hour and 30 minutes.
That's what we're looking at right now. Wow. It even has the fighting. Yeah, yeah, yeah. And buttons and the
multimodal gameplay. And obviously, this looks like it's been made by a child image-wise, but it's probably
going to take you, what, another couple of hours to make a really high-fidelity game that you could probably run on a Nintendo Switch or whatever. It is just so impressive that we can do these
things. Anyone can do these things with no previous background, just upload a few images or generate a
few images, and you can create childhood nostalgic games that are worth billions of dollars, which is
just super cool to see. Yeah, one of the cool things that I think it's really important to note is how
approachable this is. Like for the recent example that we're having run right now on my screen,
all I did was tell it what I wanted and ask it to develop the prompt with me. So even if it feels
overwhelming like you don't really know how to code, you don't know how to prompt things,
you can actually just ask the model to help you generate the prompt, help explain to you how it works.
And it's a really easy way to build basically anything you can imagine.
It's not just games. It's productivity tools. It's CRM tracking. It's whatever you want it to be.
So I think that's really interesting. But it also goes much more technical, right?
I saw another crazy example with the compiler.
Okay. So for the tech nerds out there, that's spent a lot of time coding.
You are going to be wowed by this. For one of their flagship demos for Opus 4.6, the Anthropic team decided to task the model with building a C compiler, which is an incredibly complicated piece of software that is required to build some of the craziest types of apps. And they just walked away. They just kind of looked at it,
monitored it, made sure that it wasn't going awry. And in two weeks, let me emphasize that.
Two whole weeks, 14 days, it coded nonstop and built this compiler. Now, you might
think two weeks is quite a long time. I want my thing done in an hour and a half. Well, let me harken back
to history where previously, if you wanted to create something like this, in today's world,
it would take a team of around 50 or so humans, and it would take them a few months to build from
scratch. That's today. But back in the day, it would technically have taken them around a decade to build
and like thousands of people. So we have just kind of condensed the timeline to create really complicated
tools in a matter of hours or weeks in this case.
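To give a sense of what "building a compiler" involves at the smallest possible scale: a compiler parses source text into a tree, then emits instructions a machine can run. The toy sketch below does this for arithmetic with only + and *; the C compiler described above is many orders of magnitude beyond this, so treat it purely as illustration:

```python
# Toy illustration of what a compiler does: parse source text into a tree,
# then emit instructions for a stack machine. This handles only + and * on
# integers, with * binding tighter than +.

import re

def tokenize(src):
    return re.findall(r"\d+|[+*]", src)

def parse(tokens):
    """Grammar: expr := term (+ term)* ; term := NUM (* NUM)*"""
    def term(i):
        node = ("num", int(tokens[i])); i += 1
        while i < len(tokens) and tokens[i] == "*":
            node = ("mul", node, ("num", int(tokens[i + 1]))); i += 2
        return node, i
    node, i = term(0)
    while i < len(tokens) and tokens[i] == "+":
        rhs, i = term(i + 1)
        node = ("add", node, rhs)
    return node

def compile_expr(node, out):
    """Emit postfix instructions for a tiny stack machine."""
    if node[0] == "num":
        out.append(("push", node[1]))
    else:
        compile_expr(node[1], out)
        compile_expr(node[2], out)
        out.append((node[0], None))
    return out

def run(program):
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
    return stack[0]

program = compile_expr(parse(tokenize("1 + 2 * 3")), [])
result = run(program)  # 7, because * binds tighter than +
```

Every real compiler, including the one the model built, is this same parse-then-emit pipeline scaled up to a full language, optimizations, and real machine code.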
Now, the second thing I want to point out is the fact that these models can go untouched
for two weeks is just insane.
There was another stat that was released yesterday by OpenAI with 5.2, I think, 5.2 high, I believe, where it can hit pretty much a 50% success rate at a 6.6-hour time horizon. So that means if you gave it a complicated coding task of that length, 50% of the time it would get it done, completely done, and nail it, which is just such an impressive track record. When you look back a year, that time was, what was it, like 30 minutes, maybe an hour. So every iteration, we see this
thing double. It's just so insane. Yeah, it's really, it's unbelievable and almost like
intimidating how capable and competent it is, even for someone who is a novice at writing code.
It's not about writing code. It's about being able to generate whatever you want. So if you think of it, in a way it abstracts the code away and allows you to just speak plain English and get what you want, in a way that you understand, and it will help walk you through it. One of the things that I love about Claude
in particular is the plan mode where if you leave a lot of things out of your prompt, it'll actually
just continue to prompt you with additional questions to understand what you want. And one of the
most fascinating things that I read about GPT's 5.3 codex in particular is like you mentioned in the
intro, it helps build itself. And I don't think that can be overstated because this is the first
model in the history of OpenAI that has helped with the building and construction of itself.
And what happens as that starts to ramp up, right? Like, if you think of each model iteration as a
flywheel, what is the constraint? The two constraints are the speed at which a developer can
actually build it and then create the tests for it and make sure that it's safe and ready to deploy.
And then it's the hardware that's required to actually train the model. What we're seeing
with Codex and Opus, which I really believe started with Sonnet, is the incremental improvements.
Now, for the incremental improvements that don't require an entirely new training run,
the real constraint is the actual software and what you could squeeze out of it. And when you
have a model that's helping you build this software that can think for six, 12, 24 hours at a time,
even longer, it kind of creates this self-fulfilling loop, right? Where the labs use the new models to make the future models stronger, more powerful, and better. And I thought it was a really interesting thing to note that this is the first self-propagating model, where it ran a lot of the tests for itself. It introduced new code that made
itself better. And as we continue to see that, you can start to imagine that vertical, that like
exponential progress line going pretty close to vertical and things getting really good, like really, really
quick. I think what most people listening to this might ask is, well, what was different before? Well, previously, models would just kind of work in a very analog mode. You would just
point it at a problem and it would just understand what the problem was and then solve it. But it
lacked that awareness and wider context as to like what the wider vision and goal was to achieve
and then figuring out stuff for itself. You always had to kind of handhold it. But now with its
ability to kind of like understand what it's trying to do and look internally and say, huh,
I made that mistake because of this error in my code. I'm going to now rewrite my code, and then I'll be better at it. It kind of functions similarly to a human. Now, I actually
saw a great analogy. I forgot who wrote it, but it's fantastic. If you imagine yourself standing on a sidewalk, right, and a Bugatti Veyron drives super fast by you at, let's say, 200 miles an hour, you'll be like, wow, that's kind of fast. And then two minutes later, another Bugatti drives by you at 300 miles an hour, you'll be like, wow, that's kind of fast. But you don't really notice that 100-mile-an-hour difference, right? If you were in the car, strapped in, you would notice it is significantly faster. And that's how software engineers feel
right now. Now, if you're someone that doesn't code all the time, you're not necessarily going to
understand these impacts, but it's really important for those of you listening to this to figure out
that this is massively impactful and will change the way that a lot of things are happening today.
I mean, just take a look at this, right? This is a direct quote from someone who is building at a
major tech company, Rakuten.
And the quote here says, Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a 50-person organization across six repositories.
Josh, do you know who else is
responsible for doing that?
An entire team of product managers
that each get paid a quarter
of a million dollars in compensation
minimum per year.
At least, yeah.
Their jobs are automated now.
Well, one of the earlier moments in which I realized this was pretty profound is Claude Co-work. They said they built it with, what, just, like, four people over the course of 10 days, and it was 100% built by the current model of Claude, which was Opus 4.5 at the time. Like, the amount of leverage
from these tools is so high, but it cuts both ways. It's like if you can design and develop
a product in 10 days, then that means another company can probably do that in five. And it starts
to lower the competitive threshold for these companies to catch up, and it starts to raise the bar
of what is possible. Like, if you could build something that profound in 10 days, what can you
build over the course of six months? Like, can you really build something fantastic that has a
moat that actually delivers on the total power that you have by leveraging this AI? It's going to
be interesting to see because, I mean, what we're finding, even with the Codex and Opus dual
launch is that these companies are right next to each other. And if one publishes something profound, or something that attracts a lot of users, the other is just a few days and a few prompts away from copying it.
And that's like a pretty difficult thing to compete against on the software front.
Well, that's why if we look at the stock market over the last couple of days,
like it's down trillions of dollars.
And I'm not exaggerating.
If you look at Microsoft over the last two weeks, the stock is down 20%.
It's trading like a meme stock, which is just insane.
And the reason why that is, is a lot of investors are anticipating
that these models, specifically Opus 4.6 and Codex 5.3, will just recreate, in a couple of seconds, the tools that these billions of dollars' worth of SaaS companies have built their entire valuations on, just as you described. Now, the counterargument to this, Josh, is, and Jensen Huang actually went live at a conference and spoke about this and made this point,
if you're an AI agent or AI model that is capable of building these tools, right?
Why would you rebuild the tool every single time you do a function?
Surely you would just access the best tool and use it.
So there's a bit more nuance where AI models aren't just going to recreate your entire
software stack if you are at a Fortune 500 company.
That kind of doesn't make any sense.
There are a bunch of tools that are hyper-optimized to do that.
But what it will do is it will connect all of these tools and silos in a much more effective way.
And maybe that requires rebuilding parts of it.
Maybe it requires kind of connecting different ways, but not rebuilding the entire tools.
And whatever operating system that ends up becoming will be the most sticky and valuable
company ever. Now, that could be Salesforce or it could be someone completely different, a startup
that we haven't even heard of. And I think that's really important to understand.
But people are experimenting. And if you look at this graph right here, which is,
may not look insane to some, but is insane to me at least: 4% of daily GitHub commits are now Claude Code. That was, I think, 5% of what it is today two months ago. So the ascent has just
been insane. These companies are adopting it and they are using it. Yeah, the number is just going to
keep going up. And there's no reason why it wouldn't. It's such a testament to, one, the speed. Like, it feels like we're strapped in that car and now we're flying. To an outsider it might not look like it, but it certainly feels like that on the inside. And I think a lot of people are starting to notice this
and get a little nervous about it too.
Like, look at this example on the screen right now.
This is a prompt from GPT 5.3 Codex,
which basically created an entire Minecraft clone
in a single prompt.
And it looked awesome.
And it works really fast, and it was super lightweight.
And it says, I also tried on Opus 4.6,
but for some reason it got stuck.
But you can build anything that you want very, very quickly,
like very cheaply as well.
What Opus 5.3, or, I'm getting them all mixed up, what GPT 5.3 Codex offered is double the rates, double the token rates, for the next couple of months.
So you actually have the freedom for their $20 a month plan
to go and build whatever you want.
Can I maybe deliver a hot take, Josh?
Yeah, what do you got?
I think the most exciting part about these model releases
aren't the models themselves.
Largely, I think the models are kind of similar in capabilities.
They are around the same coding benchmarks,
and they can roughly do the same things.
They can spin up a bunch of agents and orchestrate themselves.
The bigger picture, which I think a lot of people missed, was that both companies, Anthropic and OpenAI, are at war with each other.
And they're trying to basically build and own the operating system for work, which isn't just a model.
It's a software suite.
So this week alone, OpenAI didn't just release this new model.
They released the Codex app, which is a desktop Mac app, which is kind of like a command line interface, which makes the coding experience way better.
and they also launched an enterprise platform called Frontier,
which allows Fortune 500 companies
to basically take this magical model
and give it to non-coders and let them do magical things.
Now, all of these products together
create a very sticky experience
where it starts to make sense for software engineers
and non-software engineers to use these products.
And it becomes incredibly sticky,
which results in billion-dollar contracts, right?
Anthropic has done the same thing.
Over the last two weeks,
they released Claude Co-work.
They released agent teams this week.
And then they released this new model.
They're going after the same thing,
which it kind of makes sense
why they're releasing Super Bowl ads
that are kind of shitting on each other now.
It makes all the sense.
And so the point is,
if they can own this operating system,
this future of work,
they will basically be the most valuable company.
And I think it's going to be winner-takes-most.
I have to interrupt you here.
We have some developments on our prompts
that we've been working on,
our AI Stock War Room.
Let's go.
That I'm going to have to share
on the screen right now.
So currently what it's doing is it's asking to do some quality assurance testing.
So you'll see it's actually taken over control of my browser, and it's making prompts on the screen.
So you can see all of this that you're seeing right here is generated live.
And it's doing an actual real-time debug of the product that it made.
It's clicking around, it's resizing things, it's going through the links, and it's running real quality assurance testing on the actual product.
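The loop being described, try an interaction, capture the result, and turn failures into the next round of fixes, can be sketched as a simple check-runner. This is a hypothetical skeleton, not Claude's actual harness; in the real demo the checks are live browser actions whose screenshots the model inspects:

```python
# Hypothetical skeleton of an agentic QA pass: run each interaction check,
# record pass/fail, and return the failures as the model's follow-up to-do
# list. In the real demo the checks are browser actions (click, resize,
# screenshot) whose results the model inspects visually.

from typing import Callable

def run_qa(checks: list[tuple[str, Callable[[], None]]]):
    """Each check raises AssertionError (or any exception) on failure."""
    report, todo = [], []
    for name, check in checks:
        try:
            check()
            report.append((name, "pass"))
        except Exception as exc:
            report.append((name, f"fail: {exc}"))
            todo.append(name)       # feed back into the next coding pass
    return report, todo

# Dummy stand-ins for the dashboard checks seen in the demo.
def chart_renders():      pass                        # ok
def search_works():       pass                        # ok
def sentiment_widget():   raise AssertionError("gauge missing")

report, todo = run_qa([
    ("NVDA chart renders", chart_renders),
    ("ticker search works", search_works),
    ("sentiment widget shows", sentiment_widget),
])
# todo -> ["sentiment widget shows"]
```

The key design idea is the feedback edge: whatever lands in the to-do list goes straight back into the model's next editing pass, which is what makes the loop self-correcting.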
It's really amazing to see.
Like, this was all just built, all these visual charts, and they're all accurate.
So right now we're looking at Nvidia.
We have a chart, and I'm not going to mess with it because it's doing the real-time manipulation
to do quality assurance checks.
But it's actually clicking through.
It's making sure the stats are accurate.
It's making sure all of the widgets work.
And look, it has these amazing graphs already.
It has sentiment analysis.
85% of people are bullish on Nvidia.
It has recent signals from the news.
It has the assessment, a risk assessment matrix where it shows the export controls and chip
controls. It has revenue and earnings every single quarter charted, competitive modes. It has
sector comparisons. It's like, this is unbelievable. And it just generated this in a single prompt.
And I just find it really funny that we can actually watch this do it in real time. So you'll see
in this prompt, it's clicking through. It's taking screenshots of what it's seeing. And then it's
digesting, analyzing, and understanding what it made, what it messed up and what it actually still
has left to finish. And it generated everything. All of this in real time, as we're recording this
episode. So fascinating. Wow. It reminds me of some of the research platforms at the former
companies that I used to work at. And they would pay, I'm not joking, millions of dollars a year to get
access to these types of platforms that would give them analysis like what you're showing on the
screen right now. And you just built it from scratch. From scratch. And look, it's doing this.
I'm not even touching my keyboard. It just searched for Apple. And now I'm sure if I go over to the
prompt, it's taking screenshots of Apple. It says Apple dashboard looking great. Let me scroll to see the
three column button row layout, and it's checking the button rows. And it's really unbelievable.
Like, we have the investment thesis, the bull case for it, the bear case for it, catalysts and timelines.
It has WWDC built in. It has the iPhone 18 launch, props set up for September. It's like so cool.
It's absolutely unbelievable. And now this is a real tool that I'll be able to use to type in
whatever stock I want to look at and actually get some analysis on it. Now, I'll go over to Codex over here, and it looks like Codex is taking its sweet time. It's still zero out of six tasks completed, so it might take a little while for us to get a visual on that. But it's just amazing to watch this happen in real time as, at least, Claude Code and Opus 4.6 do some quality assurance testing live by taking over my browser and running it themselves. I just think this is amazing. It's magic. Something I just noticed in your Opus chatbot screen: when it's going through
its thinking, it seems to have spun up a few different agents, or instances of itself, to pull this off. Like, I think if you scroll up, I saw a few prompts that suggested that's what it was doing, which I think underscores a very important point about what both of these models can do, which is they can spin up multiple versions of the same model
and task it with different things to run in parallel. What this means is you can get a really
complicated product, like what you're seeing on the screen right now in a matter of minutes,
because it's running in parallel. So imagine having a bunch of computer science geniuses
that you can just duplicate immediately and run for a fraction of the cost, the cost of electricity, the cost of inference. And now you start to see why all these Nvidia chips and stuff are worth so much,
because you want to do cool stuff like this. This is insane. It's actually incredible. Okay, so
now I want to test it on Tesla. Someone chose Tesla, so let's see if it can actually do it in a non-controlled environment.
So cool. It's very pretty. What the hell? This looks great. Okay, so here we have Tesla. It has the charts, and we're going to click through them: the one-week chart, the one-month chart, the three-month chart. That looks fairly accurate. It has the price-to-earnings ratio, the 52-week high, the 52-week low. So it looks like at one point it was trading at 488; now it's trading at 389. The bull case for Tesla: Robotaxi and FSD licensing could unlock $500 billion in revenue by 2030. It has the Robotaxi service launch in Austin that it's preparing for. And let's see, the sector comparison. So it's comparing it to Rivian,
Baidu, Toyota, Ford. It has the competitive moat section, where it says Tesla is strongest in brand power, IP patents, and cost advantages. You can see the revenue and the estimated earnings per share.
Sentiment is much worse on Tesla than it was on Apple. It's at 52% right now. And it looks like
as it relates to the risk assessment, the valuation and competition and execution are all very
high risk. And that's probably an accurate assessment, although I'm not sure the competition is really
a problem. The execution is certainly going to be an issue. But it's just amazing to see how well it
does. And it even gives a verdict. So the AI verdict on Tesla is: it's a hold. Tesla's optionality is enormous, but the current valuation already prices in multiple moonshots. Execution on Robotaxi will be the key catalyst. That sounds about right. And it's amazing that we just built this
with a single prompt without any oversight from me. And it works. It actually works. It's really
just unbelievable how capable these things are. And now I have a dashboard that anytime I want to
make a decision, I can type in the ticker and get all this optionality. It even has menus that work.
Look at this. Profit margins, P/E ratios, market cap. Wow. Pretty unbelievable. It's a reactive, real-time Bloomberg terminal for the modern age. Oh, wait, there's another feature here: it looks like you could compare stocks. Let's see if this actually works. So if I type in, let's say, Apple's ticker and I hit go, will that compare the two? And it looks like that doesn't work very well.
Oh my God, but it has moving average lines and everything. This is pretty robust. It's like the trader and investor's dream. Just crazy. Kind of a side note on this, but the fact that Tesla's down and everyone's kind of bearish on this company, even though they're rumored to be merging and stuff like that. The point is,
There's an asymmetry between what the market is seeing
and what these inventors and builders are seeing.
These AI labs have created what they define as pretty much a low form of AGI. You literally have an AI model that is building the next version of itself. That, by definition, is like a super genius, and it's only limited as a function of energy and compute, right?
And then investors are looking at this and saying,
huh, Amazon and Google are about to spend a combined $500 billion worth of capex this year. Kind of bearish. That's a lot of money. So there is a real investment opportunity here to really
understand the difference of what these things can actually do. And that might lead to a lot of
opportunities to invest. I don't know, but I know that I'm buying Tesla today and a bunch of Google stock.
Yeah. I mean, look at this Google valuation. One, this chart looks absolutely gorgeous. But two, the AI verdict is a buy. Even the AI thinks Google is a buy, because, as it puts it, Alphabet offers the best value in mega-cap tech: dominant AI capabilities, diversified growth, and a cheap valuation if the search moat holds.
Give me the week.
Let's see the weekly chart here.
Do you want some moving average lines as well?
Because we could drop those in.
Please.
Let's see.
I'm actually super.
Yeah, look, see, it's had a slight dip.
The market is so reactive. Crazy.
Yeah.
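For anyone curious about those moving average lines on the charts, here's a quick sketch of what they actually compute: a rolling mean of closing prices. The prices below are made-up illustration values; a real dashboard would pull closes from a market-data API.

```python
def moving_average(closes: list[float], window: int) -> list[float]:
    """Mean of each trailing `window` of closing prices."""
    return [
        sum(closes[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(closes))
    ]

closes = [100.0, 102.0, 101.0, 105.0, 107.0]
print(moving_average(closes, 3))  # first value is (100 + 102 + 101) / 3 = 101.0
```

That's all a moving average overlay is: each point on the line is the average of the last N closes, which is why it smooths out the day-to-day noise.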
And I think, to the point of the capex, markets are viewing that as a scary, high-risk commitment.
But while that's true, I also think it's a testament to the fact that scaling laws are going to keep working,
and the largest companies in the world are betting on the continuation of them working.
And the fact that all of these large-cap companies have decided to spend record capex this year
is a testament to the fact that things are only going to go faster.
They believe that the more money they put in, the more output they will get,
and they're going to continue to put their foot on the gas.
So I think any question that anyone had about whether these scaling laws could continue to hold up,
and whether we can continue on the path to whatever AGI looks like and beyond,
was answered this week through these earnings reports.
The overwhelming answer is: yes, it's true.
It is likely that this is going to happen, and everyone is betting their entire company on it.
I think we have done a great job, if I can pat us on the back virtually, Josh,
of showing what these models are capable of.
And remember, it's been less than 48 hours that these models have been alive.
In fact, I think it's been more like 36 hours.
So if any of you are interested in trying these out,
I cannot urge you enough to go out and try these things.
Try to solve a problem that you're finding at work
or try to solve a problem that you're finding
just in your casual leisure time
to code up a hobby or a project in a matter of seconds.
It's so, so easy.
And it'll put you at an advantage to understand
how these tools work and why they're really changing the world
as we see it around us,
why stocks are dumping, why some stocks are pumping.
But yes, go demo it.
Let us know what you actually end up building.
Josh and I are trying to give you more live demos in a lot of the episodes that we put out.
And with every other model release and feature that drops,
we are going to be trying and testing these things.
So we can bring to you exactly what these things can do
and show you kind of like the benefits and disadvantages,
what's real and what's really not.
Yeah, and I can't stress this enough.
The best way to stay on top of things,
the best way to feel like you're not being left behind,
is just to use the tools as they come out and to understand them
and what makes them different.
And for a single subscription to ChatGPT or to Claude, you can access tools just like this and build stuff just like this. This wasn't an incredibly difficult technical challenge. You just ask it what you want and you ask it to help you, and it will actually walk you through the process and build whatever you want.
So the most important thing for anyone listening is just to train that muscle and get familiar with these tools and these skills, so that you're able to leverage them to your advantage, however they may best fit in your life. And that's kind of what we wanted to share with this. Like, it's simple: you download the app, you log into your account, and you're on your way. It's really not as difficult as I think a lot of people make it seem.
And I mean, this beautiful dashboard is a testament to that. Okay, so Ejaaz, it also looks like our Codex output has finished. So here on the screen we have Opus, which we saw, which is really a lovely dashboard, but it seems like Codex now has its own version that we
could quickly compare. So maybe we'll try, we'll go to our favorite Google, we'll type Google in,
and we'll click analyze and kind of see how this compares. I find it funny how they've converged on the same type of design style.
Yeah.
Oh, okay, this is interesting.
This is different.
So it has the moving average to select.
Oh, is that?
Okay, yeah, so it has the charts.
Is that accurate?
It has the P/E ratio.
Yeah, that's what I was looking at.
Let's go to that one week chart and see.
I have some questions about this.
It looks pretty right.
Okay.
That looks very wrong.
Yeah, the one year I'm a little confused about.
Let's compare it to Claude here.
Let's go to Google and we'll analyze that.
While it thinks we can look at the rest.
So it looks like it emulated it pretty well.
It has the verdict.
It has the same stats.
The risk assessment matrix is good, but you can see some of the text is hard to read because it's black on black.
But nonetheless, pretty interesting.
They both succeeded.
Yeah, I mean, as we said before, these models are pretty equally capable. And maybe it's just the way that you prompt something or the way that some of these things work, but largely they achieved the same goal and the same quality. And listen, we're talking about minor discrepancies here.
I can't wait to see what we will build with this.
Like, this is insane.
It's amazing.
Both of these were one-shot prompts. We didn't touch anything, and here we are.
I do think that Google one-year chart is wrong; I think Claude got that one right.
But I mean, overall, both succeeded in the mission.
Both look great and both are just excellent models.
Amazing.
Okay.
Well, that's it.
Wherever you're listening to this, if it is on YouTube and you're watching our lovely faces,
or if you're listening to us on Spotify, Apple Music,
or wherever you listen to us,
please subscribe, give us a rating,
leave us some comments.
We love your feedback,
and we respond to pretty much every single comment
because we're trying to figure out how to make this show better
and bring you the content that you guys deserve and want.
Turn on notifications because we are releasing more and more videos every week
on the hottest topics as they come out.
We also have the sickest newsletter ever,
where one of us will either write an essay
or give you the five top highlights of the week.
So if you don't want to watch any of these videos,
you can just read and digest that
and you'll know everything that you need to know
in AI and Frontier Tech.
Thank you for listening,
and we will see you on the next one.
See you in the next one.
Peace.
