Limitless Podcast - The Coding Model Wars: Claude Opus 4.6 vs GPT-5.3 Codex

Episode Date: February 7, 2026

Anthropic's Claude Opus 4.6 and OpenAI's Codex 5.3 have come out back to back, so we dive in and compare their shocking capabilities and implications for AI development. We compare Claude's orchestration skills against Codex's superior coding efficiency through live demos, revealing the potential impact on job automation in tech. Try them out, see which one you prefer, and let us know!

------

🌌 LIMITLESS HQ ⬇️
NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/

------

TIMESTAMPS
0:05 AI Showdown: Claude vs. Codex
0:43 Live Demo of Coding Models
4:13 Comparing Model Outputs
4:47 Codex vs. Claude Performance
6:15 Exploring the Models' Features
8:58 The Future of Work with AI
9:32 Building a Stock Analysis Tool
11:44 Technical Demos Unveiled
14:41 Self-Improving AI Models
17:19 Automating Complex Tasks
18:46 The Competitive Landscape
20:32 Investor Perspectives on AI
22:36 Major Updates from OpenAI
23:43 Real-Time Quality Assurance Testing
29:07 Creating a Stock Dashboard
35:45 Conclusion and Future Insights

------

RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:00 48 hours ago, Anthropic dropped Claude Opus 4.6, the world's most powerful AI model. And literally 20 minutes later, OpenAI dropped Codex 5.3, which is not only better, but also built itself. Now, to say both of these models are powerful would literally be the understatement of the century. By the time I'd eaten breakfast yesterday, one of the models had discovered 500 security flaws which no one else had discovered before. And by lunchtime, a bunch of software stocks were down hundreds of billions of dollars out of fear that these models would replace entire teams. And it's actually already happened.
Starting point is 00:00:33 These models can replace a team of 50 software engineers, rebuild Pokemon from scratch, and so much more. And in this episode, we're going to be doing a live demo side by side to show you which model is the best. Yeah, this is pretty cool. I wanted to spend a lot of time this episode,
Starting point is 00:00:46 kind of introducing people to these models, what they could do, how they work, through demos that we're going to perform ourselves. These are definitely two frontier models, but I think more importantly, they're frontier coding models. And when people hear that, I think a lot of them get turned away because it seems like this complicated thing.
Starting point is 00:01:02 Like you need to be a developer in order to use them. And we are here to tell you that is not the case. From one non-technical person to another: I fed this model a prompt. I fed it some assets. And then I pressed play. And what I got is a side-scrolling game, which was exactly what I asked for. So on the screen now, you're seeing the one-shot prompt that I fed this model to ask it to create a side-scroller that was like Mario, that we can actually play.
Starting point is 00:01:26 So it has coins. And I don't think the gravity quite works. What you're saying is that it understands physics, it is able to generate graphics, and it plays like a pretty solid side scroller. And I created this in five minutes with one prompt, and it actually works. What was the prompt that you used, Josh?
Starting point is 00:01:45 So I'll pause playing this game to actually show you the prompt. It was very simple. It was this one paragraph: I want you to make a game. You can use Python or C++, whatever you find the most convenient: a 2D platformer that closely resembles Super Mario. Use the attached background image and sprites found in the asset folder,
Starting point is 00:02:02 take into account that the sprites don't come with a transparent background but a pink one, so you need to fill it to the background. And for those who are watching, you can actually see the sprites on my screen. They were just a series of assets, with no context given as to what each one of them was, but the model reasoned through it, removed the background, and actually generated a pretty good representation of that. Now, this was built one-shot on Codex, which is the new OpenAI Mac application that just released this week. And I wanted to compare it to Claude. So I have another instance here on the screen with Claude. This is using Opus 4.6, the newest frontier model that they just released this week. And I want to do an exact one-to-one
Starting point is 00:02:36 comparison. So I'm going to launch the same exact prompt. We're going to have that cook on Codex, or we're going to have that cook in Claude Code. And in the meantime, maybe we can kind of talk about more of what these models do and how they work. Well, before we do that, actually, as you set this game up, I ran it on Claude Opus 4.06 as well, but with a slight twist. Okay, let's see your output. What do we have? Okay. I don't know if you can see my screen, but it is the exact game that you just created. But I don't know if those characters look kind of familiar to you.
Starting point is 00:03:07 We have the hero protagonist character, which is My Beautiful Face and My Beautiful Person, EJaz. And we have, who's this enemy over here? That looks a lot like the bear guy. And listen, we can double jump here, Josh. And I think, yep, I can crush you. But every time, I mean, this kind of jokes aside, this is insane. This took me like around three minutes to build end to end. I used the exact same prompt that you gave me.
Starting point is 00:03:36 And we didn't have sprites ready made of ourselves, right? We didn't have like cartoon images of ourselves. So I uploaded an image that we had taken, I don't know, like six months ago and said, hey, can you make game avatars out of this? It did it in 20 seconds. And then I said, could you add these to the game and replace the enemy with Josh and the protagonist with EJA, and I did it in a minute. So here we go. It's pretty amazing. And these are really, these are just using standard desktop applications. So what you're using right here,
Starting point is 00:04:03 this was done in Claude code, right? You just went on to Claude, the Macbook, the Mac app, you downloaded it, you put in the prompt, you shared some assets, and now it built this amazing game in one single prompt. And we're actually going to experiment further in this episode where we're going to create a trading room that does actual real-time stock analysis. So as I'm curating the prompts and as we're getting ready for that second. demo. Maybe we could walk through what makes this model so exceptional. Yeah, well, you might actually notice the first difference on screen right now. If you notice, if you look closely, my avatar is kind of glitching out, right? And if you compare it to your Codex game that you just
Starting point is 00:04:39 coded up, there's no glitches. It runs super smoothly. And the main takeaway here is Codex 5.3 is a superior coding model to Anthropic. And that's a sentence I never thought I would say, at least for the next couple of years, because Anthropic has held that prestige entire. for so long. But since Code Red was initiated in open air around three months ago, Sam has devoted pretty much all his resources towards building the best coding model, and the benchmarks don't lie. It is a full 12 points on the software engineering benchmark ahead of Claude Opus 4.6. That's a pretty significant difference. So I've actually pulled up a more general comparison between the two models here, and it summarizes it really well. So if we look at Claude's
Starting point is 00:05:20 model, Opus 4.6, what's good about it? Well, they've 5x the context window. So it's gone up to a million tokens or rather characters that you can put in a single prompt, which if you want to understand how powerful this is, you can just put way more information into your initial prompt. It has much better context and memory. So you can end up cooking up much better products overall, which is very, very impressive and important to have. Number two, it is, I would think about this as an orchestration model. So if you look at like specific benchmarks, it is beaten open AI at GDP Eval. GDP Eval is a benchmark where they go out and they test a models performance at a really complex task versus a professional human that would normally do that task. And the decision is,
Starting point is 00:06:06 would you use the AI model or would you use the human? And in this case, you would choose Claude 4.6 over humans way more than you would choose Open AI's latest model. So that's a really important thing. And the point around Claude's latest model is that it can not only, it doesn't code as well as Codex, but it can orchestrate a bunch of agents and overall activity better than Open AI. Now, if you look at Codex and Open AI's new model specifically, it wins on the software engineering. It is simply a better software engineer than Claude is, which is a massive flip around and shows that it's a testament to how much resources and fine-tuning that OpenAIs been able to achieve. And to the note on the quality of the models here, my prompt is done in ClaudeCodecode that I use, the same one that we used in Codex. And I'm going to run it here for the first time now.
Starting point is 00:06:53 You could see on screen. And we'll see what it looks like. So underneath we have our Codex version, which looks beautiful. On top, we have our brand new version that was just made by Opus. Now, I haven't tried this yet, so we're going to see what happens when I press space to start. So it looks like Opus has failed to create a floor. So I am just falling through the floor until the game ends. Okay.
Starting point is 00:07:17 So just based on this one demo alone, this is a fairly significant difference where GPT's Codex has created a beautiful side scroller. It doesn't have gravity, but I could just ask it to, or it has gravity, it's a little too much. I could ask it to lower it. Opus doesn't even work at all. And again, the test was just a one-shot problem. So I'm going to get back to work, prompting it again to build this new application, the trading application.
Starting point is 00:07:38 We'll follow up with that, but I think that's a funny kind of demo just to showcase that. actually is kind of superior in the other in this one use case at least. Yeah, I mean, you said it pretty clearly, which is Codex is the best coding AI model. And I have to like, I can't emphasize that enough because Open AI for a long time was behind Anthropic and by a massive margin. And in some way, shape or form, they have been able to catch up. Now, what's interesting here is both companies have focused on each other's goals. So when Anthropic was typically meant to be be the leading frontier model in coding. It now has decided to focus on what OpenAI was really good at, which is overall augustration and being a better generalized model, right?
Starting point is 00:08:23 Open AI. Yeah, exactly. Open AI has decided to eat Anthropics lunch and say, okay, we've got the generalized stuff sorted out. Let's try and figure out the coding-specific niche, highly defined, professionalized functions. And it's produced the best coding model. So it's kind of a weird win-win for both labs. And what's awesome about this is they both now have really well-rounded but also very specialized models. And the reason why this is important is,
Starting point is 00:08:51 and this is like kind of maybe my hot take, I don't think the coding models matter, Josh. I actually don't think the generalized models matter either. I think they're both going off to something much bigger, which is creating the operating system for the future of work. They know that AI models and AI agents are going to automate a ton of different industry. industries, and the industries are only going to pick you if you can do both generalized work
Starting point is 00:09:14 and hyper-specific work really well. That is coding and augustration and managing your data. And now we have two amazing models dropped within 20 minutes of each other that does exactly that to the highest performance metric that we've ever seen before. They're pretty exceptional. So now for this next demo, I have it queued up here. What we're going to do is what I did is ask the model itself to build me a prompt for this. So I wanted it to create me an AI stock portfolio world. room and I asked, hey, I want to create this, create me a fully flushed out prompt that kind of should solve this problem with one shot. So what I do is I loaded it up here in our
Starting point is 00:09:50 Claude code app, and then I also loaded it up into the Code app. I created its own project folder, and now I'm going to hit Send. So both of these things are thinking in real time, we will check back in once their outputs are done, and we'll compare again the second version, which is more of a robust one. I mean, you'll see on the Cloud screen, it has this whole list of to dos that it wants to do. It has an entire plan. There's nine different panels that it's going to build. It's going to do risk analysis matrix and portfolio action bars and all this stuff. So we'll let that cook. And let's get back to what separates these, what people have been freaking out about on the internet more as these things get going. Could I take three minutes, show you some wild demos, please? Yeah, let's see what
Starting point is 00:10:27 the internet's been demoing while we wait for hours to cook. Okay, cool. Like, listen, our 2D Mario-inspired game was cool. But imagine if I told you you could recreate the entire Pokemon game, including levels, cities, characters, and creatures that you fight from scratch in about an hour and 30 minutes. That's what we're looking at right now. Wow. It even has the fighting. Yeah, yeah, yeah. And buttons and the multimodal gameplay. And obviously, this looks like it's been made by a child image-wise, but it's probably going to take you what? Another couple of hours to make a really high fidelity game that you could, you could probably run a new Nintendo switch or whatever. It is just so impressive that we can do these things. Anyone can do these things with no previous background, just upload a few images or generate a
Starting point is 00:11:11 few images, and you can create childhood nostalgic games that are worth billions of dollars, which is just super cool to see. Yeah, one of the cool things that I think it's really important to note is how approachable this is. Like for the recent example that we're having run right now on my screen, all I did was tell it what I wanted and ask it to develop the prompt with me. So even if it feels overwhelming like you don't really know how to code, you don't know how to prompt things, you can actually just ask the model to help you generate the prompt, help explain to you how it works. And it's a really easy way to build basically anything you can imagine. It's not just games. It's productivity tools. It's CRM tracking. It's whatever you want it to be.
Starting point is 00:11:46 So I think that's really interesting. But it also goes much more technical, right? I saw another crazy example with the compiler. Okay. So for the tech nerds out there, that's spent a lot of time coding. You are going to be wowed by this. For one of their flagship demos for, Opus 4.6, the Anthropic team decided to task the model with building a C compiler, which is an incredibly complicated execution tool that is required to code up some of the most craziest types of apps. And they just walked away. And they just kind of like looked at it, monitored it, made sure that it wasn't going awry. And in two weeks, let me emphasize that.
Starting point is 00:12:27 Two whole weeks, 14 days, it coded nonstop and built this compiler. Now, you might think two weeks is quite a long time. I want my thing done an hour and a half. Well, let me harken back to history where previously, if you wanted to create something like this, in today's world, it would take a team of around 50 or so humans, and it would take them a few months to build from scratch. That's today. But back in the day, it would technically have taken them around a decade to build and like thousands of people. So we have just kind of condensed the timeline to create really complicated tools in a matter of hours or weeks in this case. Now, the second thing I want to point out is the fact that these models can go untouched
Starting point is 00:13:10 for two weeks is just insane. There was another stat that was released today by Open AI with, sorry, yesterday with Open AI is 5.2, I think, 5.2 high, I believe, where it can go pretty much 50% hit rate for 6.6 hours at time horizon. So that means if you gave it any kind of complicated coding task, 50% percent. 50% of the time, in 6.6 hours, it would get that done, completely done, and it would nail it, 50% of the time, which is just such an impressive track record. When you look back a year, and that time was, what was it, like 30 minutes, maybe an hour. So every iteration, we see this
Starting point is 00:13:47 thing double. It's just so insane. Yeah, it's really, it's unbelievable and almost like intimidating how capable and competent it is, even for someone who is a novel at writing code. It's not about writing code. It's about being able to generate whatever you want it. too. So like if you think of it, you kind of in a way it abstracts the code away and allows you to just speak the English language and get what you want from speaking English in a way that you understand and it will help walk you through the way. One of the things that I love about Claude in particular is the plan mode where if you leave a lot of things out of your prompt, it'll actually just continue to prompt you with additional questions to understand where you want. And one of the
Starting point is 00:14:23 most fascinating things that I read about GPT's 5.3 codex in particular is like you mentioned in the intro, it helps build itself. And I don't think that can be overstated because this is the first model in the history of Open AI that has helped with the building and construction of itself. And what happens as that starts to ramp up, right? Like, if you think of each model iteration as a flywheel, what is the constraint? The two constraints are the speed at which a developer can actually build it and then create the test for it and make sure that it's safe to ready to deploy. And then it's the hardware that's required to actually train the model. What we're seeing with Codex and Opus, which I really believe was kind of sonnet, is the incremental improvements.
Starting point is 00:15:06 Now, for the incremental improvements that don't require an entirely new training run, the real constraint is the actual software and what you could squeeze out of it. And when you have a model that's helping you build this software that can think for six, 12, 24 hours at a time, even longer, and that is, it kind of creates this self-fulfilling loop, right? Where the models use the new models to make the new models, the future models stronger and more powerful and better. And I thought that was a really interesting thing to note is that this is the first self-propagating model where it ran a lot of the test for itself. It introduced new code that made itself better. And as we continue to see that, you can start to imagine that vertical, that like
Starting point is 00:15:43 exponential progress line going pretty close to vertical and things getting really good, like really, really quick. I think what most people listening to this might think is that, well, what was different before. Well, previously, models would just kind of work in a very analog mode. You would just point it at a problem and it would just understand what the problem was and then solve it. But it lacked that awareness and wider context as to like what the wider vision and goal was to achieve and then figuring out stuff for itself. You always had to kind of handhold it. But now with its ability to kind of like understand what it's trying to do and look internally and say, huh, I made that mistake because of this error in my code. I'm going to now like rewrite.
Starting point is 00:16:25 my code and then I'll be better at it, it kind of functions similarly to a human. Now, I actually saw a great analogy. I forgot who wrote it, but it's fantastic, where if you imagine yourself standing on a sidewalk, right, and a Bugatti Varon drives super fast by you at, let's say, 200 miles an hour, you'll be like, wow, that's kind of fast. And then two minutes later, another Bugatti drives by you at 300 miles an hour, you'll be like, wow, that's kind of fast. But you really notice the difference between that 100 mile an hour difference, right? But if you were in the car strapped in, you would notice it is significantly improved. And that's how software engineers feel right now. Now, if you're someone that doesn't code all the time, you're not necessarily going to
Starting point is 00:17:10 understand these impacts, but it's really important for those of you listen to this to figure out that this is massively impactful and will change the way that a lot of things are happening today. I mean, just take a look at this, right? This is a direct quote from someone who is building at a major tech company, Racutern. And the quote here says, Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single
Starting point is 00:17:35 day managing a 50-person organization across six repositories. Josh, do you know who else is responsible for doing that? An entire team of product managers that each get paid a quarter of a million dollars in compensation
Starting point is 00:17:49 minimum per year. At least, yeah. Their jobs are automated now. Well, one of the earlier moments in which I realize this was pretty profound is when Claude co-work, they said they built it with what? Just a hint, like four people over the course of 10 days, and it was 100% built by the current model of Claude, which was Opus 4.5 at the time. Like the amount of leverage from these tools is so high, but it cuts both ways. It's like if you can design and develop
Starting point is 00:18:18 a product in 10 days, then that means another company can probably do that in five. And it starts to lower the competitive threshold for these companies to catch up, and it starts to raise the bar of what is possible. Like, if you could build something that profound in 10 days, what can you build over the course of six months? Like, can you really build something fantastic that has a moat that actually delivers on the total power that you have by leveraging this AI? It's going to be interesting to see because, I mean, what we're finding, even with the Codex and Opus dual launch is that these companies are right next to each other. And if one publishes something, profound or something that attracts a lot of users,
Starting point is 00:18:58 they're just a few days and a few prompts away from copying it. And that's like a pretty difficult thing to compete against on the software front. Well, that's why if we look at the stock market over the last couple of days, like it's down trillions of dollars. And I'm not exaggerating. If you look at Microsoft over the last two weeks, the stock is down 20%. It's trading like a meme stock, which is just insane. And the reason why that is, is a lot of investors are anticipating
Starting point is 00:19:25 that these models, specifically Opus 4.6 and Codex 5.3, will just create the tools that these billions of dollars' worth of SaaS companies have spent or valued their entire lives on in a couple of seconds, just as you described. Now, the counter argument to this, Josh, is, and Jets of Juan actually kind of went live at a conference and spoke about this and made this point, if you're an AI agent or AI model that is capable of building these tools, right? Why would you rebuild the tool every single time you do a function? Surely you would just access the best tool and use it. So there's a bit more nuance where AI models aren't just going to recreate your entire
Starting point is 00:20:08 software stack if you are at a Fortune 500 company. That kind of doesn't make any sense. There are a bunch of tools that are hyper-optimized to do that. But what it will do is it will connect all of these tools and silos in a much more effective way. And maybe that requires rebuilding parts of it. Maybe it requires kind of connecting different ways, but not rebuilding the entire tools. And whatever operating system that ends up becoming will be the most sticky and valuable company ever. Now, that could be Salesforce or it could be someone completely different, a startup
Starting point is 00:20:37 that we haven't even heard of. And I think that's really important to understand. But people are experimenting. And if you look at this graph right here, which is, may not look insane to some, but is insane to me at least, 4% of daily GitHub commits are now clothed code. That was, I think, 5% of what it is today two months ago. So the ascent has just been insane. These companies are adopting it and they are using it. Yeah, the number is just going to keep going up. And there's no reason why it wouldn't. It's such a testament, one, the speed. Like, it feels like we're strapped in that car and now we're flying. To an outsider might not look like it. It certainly feels like that on the inside. And I think a lot of people are starting to notice this
Starting point is 00:21:17 and get a little nervous about it too. Like, look at this example on the screen right now. This is a prompt from GPT 5.3 Codex, which basically created an entire Minecraft clone in a single prompt. And it looked awesome. And it works really fast, and it was super lightweight. And it says, I also tried on Opus 4.6,
Starting point is 00:21:37 but for some reason it got stuck. But you can build anything that you want very, very quickly, like very cheaply as well. What Opus 5.3, or Opus 5.3. I'm getting them all mixed up. What GPT 5.3 Codex offered is double the rates, the double the token rates for the next couple of months. So you actually have the freedom for their $20 a month plan
Starting point is 00:21:58 to go and build whatever you want. Can I maybe deliver a hot take, Josh? Yeah, what do you got? I think the most exciting part about these model releases aren't the models themselves. Largely, I think the models are kind of similar in capabilities. They are around the same coding benchmarks, and they can roughly do the same things.
Starting point is 00:22:16 They can spin up a bunch of agents and orchestrate themselves. The bigger picture, which I think a lot of people missed, was both companies, Anthropic and Open AI, are at war with each other. And they're trying to basically build and own the operating system for work, which isn't just a model. It's a software suite. So this week alone, Open AI didn't just release this new model. They released the Codex app, which is a desktop Mac app, which is kind of like a command line interface, which makes the coding experience way better. and they also launched an enterprise platform called Frontier, which allows Fortune 500 companies
Starting point is 00:22:51 to basically take this magical model and give it to non-coders and let them do magical things. Now, all of these products together creates a very sticky experience where it starts to make sense for software engineers and non-software engineers to use these products. And it becomes incredibly sticky, which results in billion-dollar contracts, right?
Starting point is 00:23:11 Anthropic has done the same thing. Over the last two weeks, They released Claude Co-work. They released agent teams this week. And then they released this new model. They're going after the same thing, which it kind of makes sense why they're releasing Super Bowl ads
Starting point is 00:23:24 that are kind of shitting on each other now. It makes all the sense. And so the point is, if they can own this operating system, this future of work, they will basically be the most valuable company. And I think it's going to be when it takes most. I have to interrupt you here.
Starting point is 00:23:38 We have some developments on our prompts that we've been working on, our AI Stock War Room. Let's go. That I'm going to have to share on the screen right now. So currently what it's doing is it's asking to do some quality assurance testing. So you'll see it actually used a, it's taking over control of my browser, and it's asking to make prompts on the screen.
Starting point is 00:23:55 So you can see all of this that you're seeing right here is generated live. And it's doing an actual real-time debug of the product that it made. It's clicking around, it's resizing things, it's going through the links, and it's running real quality assurance testing on the actual product. It's really amazing to see. Like, this was all just built, all these visual charts, and they're all accurate. So right now we're looking at Nvidia. We have a chart, and I'm not going to mess with it because it's doing the real-time manipulation to do quality assurance checks.
Starting point is 00:24:21 But it's actually clicking through. It's making sure the stats are accurate. It's making sure all of the widgets work. And look, it has this amazing graphs already. It has sentiment analysis. 85% of people are bullish on Nvidia. It has recent signals from the news. It has the assessment, a risk assessment matrix where it shows the export controls and chip
Starting point is 00:24:41 controls. It has revenue and earnings every single quarter charted, competitive modes. It has sector comparisons. It's like, this is unbelievable. And it just generated this in a single prompt. And I just find it really funny that we can actually watch this do it in real time. So you'll see in this prompt, it's clicking through. It's taking screenshots of what it's seeing. And then it's digesting, analyzing, and understanding what it made, what it messed up and what it actually still has left to finish. And it generated everything. All of this in real time, as we're recording this episode. So fascinating. Wow. It reminds me of some of the research platforms at the former companies that I used to work at. And they would pay, I'm not joking, millions of dollars a year to get
Starting point is 00:25:23 access to these types of platforms that would give them analysis like what you're showing on the screen right now. And you just build it from scratch. From scratch. And look, it's doing this. I'm not even touching my keyboard. It just searched for Apple. And now I'm sure if I go over to the prompt, it's taking screenshots of Apple. It says Apple dashboard looking great. Let me scroll to see the three column button row layout, and it's checking the button rows. And it's really unbelievable. Like, we have the investment thesis, the bulk case for it, the bear case for it, catalyst and timelines. It has WWDC built in. It has the iPhone 18 launch, props set up for September. It's like so cool. It's absolutely unbelievable. And now this is a real tool that I'll be able to use to type in
Starting point is 00:26:04 whatever stock I want to look at and actually get some analysis on it. Now, I'll go over to codex over here and it looks like codex is taking its sweet time it's still zero out of six tasks completed so it might take a little while for us to get a visual on that but it's just amazing to watch this happen in real time as at least Claude Code and Opus 4.6 does some quality assurance testing live by taking over my browser and running it for itself I just think this is like this is amazing it's magic something I just noticed in your Opus chatbot screen when it's going through its thinking, it seems to have like spun up a few different agents or instances of its own self to pull this off. Like I think if you scroll up, like I saw a few kind of like prompts that like
Starting point is 00:26:47 suggested that that's what it was doing, which I think is underscores a very important point that both of these models can do, which is they can spin up multiple versions of the same model and task it with different things to run in parallel. What this means is you can get a really complicated product, like what you're seeing on the screen right now in a matter of minutes, because it's running in parallel. So imagine having a bunch of computer science geniuses that you can just duplicate immediately and run at a fraction of the cost of electricity, the cost of inference. And now you start to see why all these Nvidia chips and stuff are worth so much, because you want to do cool stuff like this. This is insane. It's actually incredible. Okay, so
Starting point is 00:27:27 now I want to test it on Tesla. Someone choose Tesla and see if it actually can do it in a non-controlled environment. So cool. It's very pretty. What the hell? This looks great. Okay, so here we have Tesla. It has the charts. We're going to click through the charts. It has the one-week chart, the one-month chart, the three-month chart. That looks fairly accurate. It has the price-to-earnings ratio, the 52-week high, the 52-week low. So it looks like at one point it was trading at 488. Now it's trading at 389. The bull case for Tesla: Robotaxi and FSD licensing could unlock $500 billion in revenue by 2030. It has the Robotaxi service launch in Austin that it's preparing for. That's a bull case. And let's see, the sector comparison. So it's comparing it to Rivian, Baidu, Toyota, Ford. It has the competitive moat, where it says it's most strong in brand power, IP patents, and cost advantages. You can see the revenue, the estimated earnings per share. Sentiment is much worse on Tesla than it was on Apple. It's at 52% right now. And it looks like, as it relates to the risk assessment, the valuation and competition and execution are all very high risk. And that's probably an accurate assessment, although I'm not sure the competition is really
Starting point is 00:28:37 a problem. The execution is certainly going to be an issue. But it's just amazing to see how well it does. And it even gives it a verdict. So the AI verdict on Tesla is, it's a hold. Tesla's optionality is enormous, but the current valuation already prices in multiple moonshots. Execution on Robotaxi will be the key catalyst. That sounds about right. And it's amazing that we just built this with a single prompt without any oversight from me. And it works. It actually works. It's really just unbelievable how capable these things are. And now I have a dashboard where, anytime I want to make a decision, I can type in the ticker and get all of this analysis. It even has menus that work. Look at this. Profit margins, P/E ratios, market cap. Wow. Pretty unbelievable. It's a reactive
Starting point is 00:29:22 real-time Bloomberg terminal for the modern age. Oh, wait. There's another feature here that looks like you could compare stocks. Let's see if this actually works here. So if I type in, let's say, Apple's ticker and I hit go, will that compare the two? And it looks like that doesn't work very well. Oh my God, but it has moving average lines and everything. This is pretty robust. I know, it's like the trader and investor's dream. Just crazy. Kind of a side note on this, but the fact that Tesla's down and everyone's kind of bearish on this company, even though they're rumored to be merging and stuff like this. The point is, there's an asymmetry between what the market is seeing
Starting point is 00:30:02 and what these inventors and builders are seeing. These AI labs have created what they define as pretty much a low form of AGI. You literally have an AI model that is building the next version of itself. That, by that description, is like a super genius, and it's only limited as a function of energy and compute, right? And then investors are looking at this and saying, huh, Amazon and Google are about to spend a combined $500 billion worth of capex this year. Kind of bearish. That's a lot of money. So there is a real investment opportunity here to really
Starting point is 00:30:37 understand the difference in what these things can actually do. And that might lead to a lot of opportunities to invest. I don't know, but I know that I'm buying Tesla today and a bunch of Google stock. Yeah. I mean, look at this Google valuation. One, this chart looks absolutely gorgeous. But two, the AI verdict is a buy. Even the AI thinks Google is a buy, because, as it says, Alphabet offers the best value in mega-cap tech: dominant AI capabilities, diversified growth, and a cheap valuation if the search moat holds. Give me the week. Give me the week.
Starting point is 00:31:05 Let's see the weekly chart here. Do you want some moving average lines as well? Because we could drop those in. Please. Let's see. Let's see. I'm actually super. Yeah, look, see, it's had a slight dip.
Starting point is 00:31:14 Markets is so reactive, crazy. Yeah. And I think to the point of the CAPEX, markets are viewing that as a scary, high-risk statement. But while that's true, I also think it's a testament. to the fact that scaling laws are going to work, and the largest companies in the world are betting on the continuation of them working. And the shared consensus between all of these large-cap companies
Starting point is 00:31:37 to spend record capex this year is a testament to the fact that things are only going to go faster, and they believe that the more money they put in, the more output they will get. And they're going to continue to put their foot on the gas. So I think any question anyone had about whether these scaling laws could continue to hold up, and whether we can continue to be on the path to whatever AGI looks like and beyond,
Starting point is 00:31:58 I think that was answered this week through these earnings reports, and the overwhelming answer is: yes, it's true. It is likely that this is going to happen, and everyone is betting their entire company on it. I think we have done a great job, if I may pat us on the back virtually, Josh, of showing what these models are capable of. And remember, it's been less than 48 hours that these models have been alive.
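To make the parallel sub-agent pattern from earlier in the demo a bit more concrete: one orchestrator fans tasks out to multiple copies of the same worker and gathers the results. This is a toy sketch with a stubbed-out model call, not either lab's actual agent API; the task names are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task):
    # Stand-in for a model call; a real orchestrator would hit an LLM API here
    return f"done: {task}"

# Hypothetical sub-tasks, like the ones Opus appeared to split the build into
tasks = [
    "build the chart component",
    "fetch the fundamentals",
    "write the risk matrix",
    "run QA in the browser",
]

# Fan out to parallel workers, then gather results in task order
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_subagent, tasks))
```

The point of the pattern is the wall-clock win: four tasks that would run sequentially finish in roughly the time of the slowest one, which is why a complicated product can appear in minutes.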
Starting point is 00:32:18 In fact, I think it's been like 36 hours. So if any of you are interested in trying these out, I cannot urge you enough to go out and try these things. Try to solve a problem that you're finding at work, or try to solve a problem that you're finding just in your casual leisure time, to code up a hobby or a project in a matter of seconds. It's so, so easy.
Starting point is 00:32:39 And it'll put you at an advantage to understand how these tools work and why they're really changing the world as we see it around us, why stocks are dumping, why some stocks are pumping. But yes, go demo it. Let us know what you actually end up building. Josh and I are trying to give you more live demos. in a lot of the episodes that we put out.
Starting point is 00:32:56 And with every other model release and feature that drops, we are going to be trying and testing these things. So we can bring to you exactly what these things can do and show you kind of like the benefits and disadvantages, what's real and what's really not. Yeah, and I can't stress this enough. The best way to stay on top of things, the best way to feel like you're not being left behind,
Starting point is 00:33:14 is just to use the tools as they come out and to understand them and what makes them different. And for a single subscription to ChatGPT or to Claude, you can access tools just like this and build stuff just like this. I mean, this wasn't an incredibly difficult technical challenge. You just ask it what you want, and you ask it to help you. And it will actually walk you through the process and build whatever you want. So the most important thing for anyone listening is just to train that muscle and to get familiar with these tools and skills, so that you're able to leverage them to your advantage.
Starting point is 00:33:45 However it may best fit in your life. And that's kind of what we wanted to share with this. Like, it's simple: you download the app, you log into your account, and you're on your way. It's really not as difficult as I think a lot of people make it seem. And I mean, this beautiful dashboard is a testament to that. Okay, so Ejaaz, it also looks like our Codex output has finished. So here on the screen we have Opus, which we saw, which is really a lovely dashboard, but it seems like Codex now has its own version that we could quickly compare. So maybe we'll go to our favorite, Google; we'll type Google in, and we'll click analyze and kind of see how this compares. I find it funny how they
Starting point is 00:34:22 have converged on the same type of design style. Yeah. Oh, okay, this is interesting. This is different. So it has the moving average to select. Oh, is that? Okay, yeah, so it has the charts. Is that accurate?
Starting point is 00:34:36 It has the P/E ratio. Yeah, that's what I was looking at. Let's go to that one-week chart and see. I have some questions about this. It looks pretty right. Okay. That looks very wrong. Yeah, the one-year I'm a little confused about.
Starting point is 00:34:50 Let's compare it to Claude here. Let's go to Google and we'll analyze that. While it thinks, we can look at the rest. So it looks like it emulated it pretty well. It has the verdict. It has the same stats. The risk assessment matrix is good, but you can see some of the text you can't really read
Starting point is 00:35:07 because it's black on black. But nonetheless, pretty interesting. They both succeeded. Yeah, I mean, as we said before, these models are pretty equally capable. And maybe it's just the way that you prompt something or the way that some of these things work, but largely they achieved the same goal and the same quality.
Starting point is 00:35:27 And like, listen, we're talking about minor discrepancies here. I can't wait to see what we will build with this. Like, this is insane. It's amazing. Both of these were one-shot prompts; we didn't touch anything, and here we are. I do think that Google one-year chart is wrong. I think Claude got that one right. But I mean, overall, both succeeded in the mission.
Starting point is 00:35:43 Both look great and both are just excellent models. Amazing. Okay. Well, that's it. Wherever you're listening to this, if it is on YouTube and you're watching our lovely faces, or if you're listening to us on Spotify, Apple Music, or wherever you listen to us, please subscribe, give us a rating,
Starting point is 00:35:59 leave us some comments. We love your feedback, and we respond to pretty much every single comment because we're trying to figure out how to make this show better and bring you the content that you guys deserve and want. Turn on notifications because we are releasing more and more videos every week on the hottest topics as they come out. We also have the sickest newsletter ever,
Starting point is 00:36:18 where one of us will either write an essay or give you the five top highlights of the week. So if you don't want to watch any of these videos, you can just read and digest that, and you'll know everything that you need to know in AI and Frontier Tech.
Starting point is 00:36:31 See you in the next one. Peace.
