Limitless Podcast - Everyone Needs to Use OpenAI Codex... Until Claude Mythos Comes Out

Episode Date: May 5, 2026

Let's examine the fierce competition between AI coding tools: Anthropic's Claude and OpenAI's Codex. As Codex emerges with robust updates, we discuss user experiences and showcase demos comparing game development and dashboard creation. Highlights include Codex's superior interface and innovative features like auto-review and Chronicle. We also explore the broader implications for AI integration in coding tasks.

------

🌌 LIMITLESS HQ ⬇️

NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/

------

TIMESTAMPS
0:00 Claude vs Codex
3:25 Image Generation Capabilities
4:52 Long Horizon Autonomy
8:55 Chronicles
10:19 Demo
16:49 Dashboard Creation Challenge
20:30 The AI Model Harness Explained
24:27 The Future of AI Tools
26:20 Claude Mythos
27:26 Verdicts

------

RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:00 A few months ago, we told you to use Claude. Now, we're telling you to switch back. For those of you who aren't familiar, over Christmas break there was a major vibe shift, where AI coding went from this, like, fun tool to something that developers actually use when they're shipping code. And even if you're not a developer, the use cases and applications that were created around that time were really strong. And since then, Anthropic has gone on this generational run of shipping these incredible
Starting point is 00:00:21 products seemingly every single day that have turned Claude Code into this supercharged super app, the place that, Ejaaz, I know you've gone to, I've gone to too, in order to get all of our AI work done. Any work that we have, we've gone to Claude Code. Now, OpenAI has woken up, and over the last few weeks, Codex has shipped more features than most companies ship in a year, and I bet, I guarantee, that you haven't heard of some of these features that we're going to talk about in this episode. The pendulum has fully swung back, or at least I believe so, because I'm totally Codex-pilled. And in this episode, we're going to kind of walk through the differences between these two and why the model you're using today
Starting point is 00:00:55 probably won't be the model you're using tomorrow. And I don't think we're going to convince you, but maybe we could show you why you might want to consider using something else here. I just want to talk through some of the crazy stats here, because the script has genuinely flipped. A few months ago, Claude Code was all anyone could talk about: every software engineer was using Claude Code, every enterprise was installing it. It was crazy. But just over the last couple of weeks, specifically by the end of April, GPT 5.5 was released, and that was plugged into the coding AI model.
Starting point is 00:01:26 It's all one and the same. And OpenAI went on this code-red run, where they focused on nothing but building the best coding AI model and the best LLM. And the numbers show that it's worked. Over the last week, Codex has been downloaded or installed over 46 million times. Claude Code, under 500,000 times.
Starting point is 00:01:44 Now, that is crazy to say, because if you look at the historical data, Claude Code downloads and installs have absolutely dwarfed Codex, but something changed over the last couple of weeks. That something was OpenAI putting out just a better model. You mentioned that you were Codex-pilled, Josh. I think so am I. I've spent the last couple of days playing around with Codex.
Starting point is 00:02:04 This morning we prepped a bunch of really cool demos, and it has just completely flipped the script. But it's one thing saying it. It's another thing actually showing the direct comparison. So we created this visual artifact to kind of give you the scoreboard. And you can see it at the top here. It's OpenAI Codex at 11, Anthropic Claude at 2. But let me explain why.
Starting point is 00:02:23 Okay. So, number one, computer use. Codex and Claude Code can use your computer. It can take over your desktop and it can move your cursor around. Now, Claude pioneered it. They were the first ones there.
Starting point is 00:02:36 But it was super slow. It kind of runs into a bunch of obstacles, and you have to kind of hand-hold it and prompt it to do a bunch of different things. Codex is not only quicker than me. It's quicker than the average person. In fact, I can actually see the cursor move around so quickly that it's like a superhuman using a computer, and it can run pretty much 24/7 at this point.
Starting point is 00:02:52 Long horizon autonomy. Codex can work for longer in a much more intelligent manner versus Claude Code, which is, again, crazy to say, because literally a month ago it was the inverse of this. Claude right now can run for a decent amount of time, but not as long as Codex can.
Starting point is 00:03:14 And then the last two that I want to talk about here: browser use. So Codex can take over your browser. It can do a lot more intentional things. It understands what it's looking at, very importantly. Previously, it could not do that. Claude can do the same, but not as intelligently. And then finally, ChatGPT Images 2.0 got released, what was it, like two weeks ago now. Oh, it's so good.
Starting point is 00:03:35 Yeah, it's the image generation model from OpenAI, and it is absolutely astounding. In fact, it beat all the other models, including Google's, what is it, Nanobanana 2.0 Pro, which previously held the lead. It beat it across every single benchmark. Anthropic, on the other hand, doesn't even have an image gen model. So, so far, it's crushing. Yeah, I think a lot of the best is now bundled into Codex. The image gen, for anyone who does any sort of visual work, is unbelievable. And being able to use that directly in your software is awesome. One thing that you mentioned is the long horizon autonomy. I think that needs a double
Starting point is 00:04:11 click, because it's really impressive how well it works. Traditionally, there's been this thing called a Ralph loop that we use. It's actually named after the character from The Simpsons, who is very persistent. And it's basically a planning mode where you give the AI a goal and it will continue to iterate toward that goal until it accomplishes it. So, let's say you want to build a Lego car or something and you give it the exact parameters: it will go and go and go until it solves that problem and gives you exactly what you want, in a way that other AI models haven't. Codex did that. And this is the only native implementation that you can get of this long horizon thinking, where it actually will go for days on end. I've seen screenshots of it thinking
Starting point is 00:04:46 for as long as 36 hours to accomplish the goal. So if you have really difficult tasks, Codex is going to be really good at solving those. Now, continue to scroll down. There was another feature that was just released this week called auto-review. A huge pain for people who are writing code or working on complex projects, whatever it may be, is that you're constantly having to sit there and approve things, because the permission system's a little finicky, right? You don't want to give it full access to your computer, but you also don't want to sit there approving every time it wants to use Chrome or every time it wants to access your files. So Codex created auto-review, and they rolled it out last week. The agent is kind of smart. It knows which
Starting point is 00:05:21 things are going to possibly be systemic existential threats and which approvals aren't, and it will just automatically approve all the things that aren't going to get you in a lot of trouble. It creates a much easier user interface where you can just kind of walk away from the computer for a little while and come back and things get done. Memory in context is pretty strong. I'd say the one thing, and we haven't mentioned many Claude winners, the place where Claude wins currently is on their OpenClaw capability, funny enough because OpenAI bought OpenClaw. But Dispatch is the mobile app feature for Claude in which you can actually, engage with Claude Code remotely, that doesn't currently exist on Codex. And while the team has
Starting point is 00:05:55 promised to ship that, you don't actually have that currently today. Claude has that. Also, in terms of the personality and UI, Cloud is just so much better. I think we're going to get into our personal takes, but whenever you're using an LLM versus an actual tool set or a harness, Clod is pretty great and the UI is very warm. So there's some kind of instances in which Claude is better, but for the most part, Codex is really just kind of crushing it. And I've really enjoyed using it. One of the fun things is pets. I mean, just recently they released pets. And Cloud also released pets, but these pets are a little bit different. This is an example of Angry Dario we're seeing on the screen. And it's fun, because you have this persistent character that exists throughout your computer use. And as you're
Starting point is 00:06:35 engaging with Codex, it'll just kind of chat with you in the background so you could see your progress, see where you're at. It's fun, it's playful. And it just shows that they kind of care about the user experience. Now, one feature I would guarantee most people don't know is Chronicle, EJAS. And you were just telling me about Chronicle and how cool it is, how it kind of monitors your screen. green as you go. This seems like novel technology that we haven't seen yet. Yeah. So one of the earliest episodes that we did here on Limitless was an interview with the folks at OpenAI that created something called, what was it called, Josh? Do you remember? It was like agent mode or personal mode, something like that. Yes. It thought overnight for you, right? Yes. It basically took all the
Starting point is 00:07:12 conversations that you had with chat GPT the night before or the day before or the week before, and it created important context around you in the form of something called memories. This is where AI memory was birthed from Open AI themselves from the Open AI team. And what it would do is it would feed you a report in the morning that would update you on information that it thought you would be interested to read about.
Starting point is 00:07:35 So say, for example, you were interested in the stock market, it will give you an update on a bunch of advancements that had happened overnight or over the last week or whatever it might be, right? Now, fast forward today, memory is embedded across every single AI model and tool. The reason why is context is so important. It's one thing a user asking for something explicitly and directly.
Starting point is 00:07:54 It's a complete other thing for an AI to actually understand what you mean, the nuance in the sentence that you've created, and even better to predict what you want. But there was still an obstacle, which was you needed to feed it the context and say, Hey, Claude, hey, chat, JPD, can you remember this? OpenAI recently released a feature called Chronicle, where it observes what you scroll through, what you click on, what you type,
Starting point is 00:08:21 and it builds its own context and memories around you without you needing to feed it, which actually led to a really cool prompt that you pointed out, Josh, or that you found, which was, what have I been doing very inefficiently on my computer,
Starting point is 00:08:34 according to Chronicle, which is this new memory feature, make some recommendations, be direct, tell me what I need to hear. That's pretty awesome. Yeah, so this is alpha,
Starting point is 00:08:43 because I don't think a lot of people recognize that this is a possibility because Codex and OpenAI didn't do a good job of explaining this. When they released Chronicle, they said it's a way of the system to review your code as you've gone because it's been taking sequential screenshots. But the reality is that it's much bigger than this. And I suspect they didn't market it this way because it could be a bit of a privacy issue. But it's essentially constantly monitoring your screen and taking screenshots of what's happening on your screen and interpreting it so it understands your habits, the way that you work, the thing that you do. And then you can ask it, what have I been doing very inefficiently?
Starting point is 00:09:14 on my computer, according to Chronicle, make some recommendations, be direct, tell me what I need to hear. And it'll actually evaluate how you've been using your computer, how long you've been scrolling on Twitter, perhaps, how long you haven't been doing the things you're supposed to be working on, or just generally how to improve your workflow and give you real feedback based on your actual actions that it's seen. And I think this is a super powerful thing currently only available to pro members. So if you pay for the $100, $200 a month subscription, you get access to this. But I suspect this is the early signs of a very important feature they're going to roll out, which is that entire computer monitoring system to improve your system
Starting point is 00:09:48 and also probably train the models to get better at engaging with your system. But I found Chronicle to be one of those kind of secret features that not a lot of people know about, but has a lot of upside if you use it to your advantage and let it monitor what you're doing and improve your workflow on a day-to-day basis. Yep. So the point is from both of these companies, Anthropic and Open AI, we are getting feature releases every single week, in fact, every single day. And it's becoming, I'm being bombarded by this.
Starting point is 00:10:13 and it's hard to keep track with all of this. So what is the number one litmus test for both of these models and products and companies? It's to actually use the thing. It's to build the thing. And we have two special demos that we have prepared for you that we're about to jump into. Now, Josh, can you guess what my first demo is about the theme?
Starting point is 00:10:31 First one's a game. We're gamers, man. I want to play a game. I want to see how well it does on a game. I know we did this demo in the past months ago. It left a lot to be desired, so I'm curious to see the current up-to-date status as it relates to cloud code versus codex. Who's winning on the one shot game prompt? Indeed.
Starting point is 00:10:46 Okay, so I am a nostalgic kind of guy. And so I was like, back in the day, I loved Mario. So I want you, both of these models, to create the best Mario type or inspired game, a side scroller, but make it futuristic. Maybe add a little bit of neon, sprinkle a bit of neon in there,
Starting point is 00:11:04 create levels. I want game design. I want there to be enemies. I want there to be pitfalls. And I also want there to be a scoreboard. and also tell me how to do this thing. Give me the whole package, basically. I fed this prompt or idea into chat GPT and Claude,
Starting point is 00:11:17 and I said, can you create a detailed prompt that I can then feed into your coding models? I then set each of the coding models to their highest settings. So what you're about to see is the best of the best for the most detailed prompt that they came up with, and let's see what they did. So step number one or example number one is Claude Opus 4.7. So this is called code at the highest setting
Starting point is 00:11:39 with their latest model. Okay, it took the prompt pretty literally. It's titled this Neon Plummer Moonbase Run, which is obviously Mario Inspired, and it said, hey, this is a demo edition, by the way. This is not production ready. What I like about this is it's giving me the instructions, but how does the game actually play out? Let's see. It looks good.
Starting point is 00:11:57 Can you see me here, Josh? I can. Yes, I can. And it looks like... The animations are pretty good. I'm jumping around. I think I'm like a little robot. I can see my feet pit-pittering.
Starting point is 00:12:07 Now, I'm guessing this thing is about to kill me. So let's see if I can jump. Oh, I can jump. There we go. That's awesome. One bit, can I kill this guy. Oh, yes, I can. Now, one bit of feedback I've noticed is I can't double jump.
Starting point is 00:12:21 And it told me in the menu that I could double jump. So that's weird. So the physics hasn't really paid off. Can I die? Oh, it certainly looks like you could die. I can die. Great. Okay.
Starting point is 00:12:32 So that is Claude's attempted it. What's your feedback on this, Josh? I think the graphics are pretty good. The graphics are great. For one shot, I mean, granted, this is only one single prompt. So for one prompt, it created great graphics. It had sound design that actually sounds pretty accurate to what you would expect in the game. It has similar principles.
Starting point is 00:12:48 It's following gaming principles. You kind of understand what looks dangerous, what doesn't. You knew that those spikes were going to hurt you, and they hurt you. The logic seems to be a little bit flawed. I think it's having problems with gravity or at least that double jump functionality, because it looks like those coins that you probably want to collect, you can't actually reach because you can't do the double jump. So in terms of logic, not so.
Starting point is 00:13:08 hot in terms of visuals, aesthetics, in terms of, I mean, how good this game is from one shot, very impressive. Yeah, I think it's important to understand that I started from zero. It literally asked me to give it a folder to build in, and the folder was completely empty. So all the visual renderings, all the graphics, the animation style, the scoring system, the way that the avatar moves and looks was created from scratch from a bunch of characters from this AI model. So this is Claude Codd's current best attempt, and it is way better than what we tested out and honestly demoed on this show about a month ago. But now let's see what OpenAI's chat GPT 5.5 codex at the highest possible setting cooked up. Okay. And this is using the same prompt. So you just fed the model
Starting point is 00:13:52 the same prompt and now we're going to see the output. Identical. All right. Oh, God, I'm excited. I hope codex did well because now that I'm a fan, I'm gassing it up, it better perform here. Okay, so this is GPT 5.5's attempt. Now you might notice that this isn't the entire browser. That's because Codex has a very unique feature, which is not only can it do all the coding in a single app for you, but it has an in-app browser. So it can live test the thing in the app without you needing to go to Google Chrome or whatever.
Starting point is 00:14:18 But anyway, we have the starting screen here. It has also called it neon plumber moonbase run. It looks a little more rudimentary from the start, but I do like the background animation, Josh. We didn't get this in the previous one, or at least not this side-scrolling thing. Well, let's... Oh, oh, this is nice.
Starting point is 00:14:35 This is nice I think this has good logic Wait but this is no music There's no music I can't double jump Might be a skill issue Might be a prompt issue Let's have a look
Starting point is 00:14:47 Did it say you can double jump That's a good question actually I mean this is looking This is a fully playable game Yes this is and I like that it's like zoomed in There's like Oh we got the boost I can jump on the platforms
Starting point is 00:14:59 Let's see if I can kill this guy Yes Nice okay and can I jump to the gap There's a scoring system. And you could see your hearts. Oh, dude. This is way better. Power up.
Starting point is 00:15:09 Wait, oh my God, I want the power up. I'm still going to go back. I can't double jump. No, you can. You could go back. Go back to the last platform. Oh, God, I died. I'm going.
Starting point is 00:15:18 I'm going to the last platform. Here we go. It looks like they're sequentially gaining height, which is interesting. Oh, but okay, so if I'm comparing these two, I'm actually, I'm not feeling very let down. This is good. Aside from the music not existing, which we may not have explicitly asked. It looks like the logic plays better.
Starting point is 00:15:32 The actual gameplay is usable. this is a full, I don't know if it's glitching or if this is you glitching. No, no, that is, it's glitching. It's glitching a bit. Okay. So it's still, there are some edge case errors. Yeah. But this is different in the sense that you have your hearts clearly projected.
Starting point is 00:15:46 You have a score system that's clearly in place. You're able to get these powerups. They work. They function. I mean, this is a very clean and functional game. So I would give this to Codex. I think the experience, perhaps the design of Claude was better. And perhaps the music, I mean, music was definitely better versus none.
Starting point is 00:16:03 But Klaudex in terms of just, or Kodex in terms of just coding logic and making a better game. I give this codex. Do you have a take? Yeah. So on the build side of things, I had a much more pleasant experience using Kodex as well. So I think Kodx wins on this. I one-shotted it in the true sense where I just gave it a single prompt and Kodix didn't ask for any permissions. It just kind of went on and did the thing.
Starting point is 00:16:27 I saw it, it's thinking. and at points where it was unsure, it thought amongst itself and then made the decision to progress forwards. Whereas with Claude Code, it would come to me. Now, that might just be a developer engineer's preference, right? Like, if you're building a production ready app for, like, I don't know, a big company that you work for,
Starting point is 00:16:47 you probably want to have more hands-on involvement. Whereas if you're just building a game like we did today where I don't really care what it ends up looking like or what it does, then the hands-off preference is probably something that you would use codex for. But I think Codex wins this. So for our second demo, we have this handwritten piece of paper that I actually wrote and took a picture of. I didn't. It's GPT.
Starting point is 00:17:08 I'm a shot 2.0. But it looks like it's handwritten. The handwriting was too nice, Josh. That was the giveaway. Yeah, my handwriting is far sloppier than this. But the idea is that you can even write things on the back of a napkin and you could turn that into an application. So what we did here is we just asked for it to create a generic limitless dashboard application on the back of a piece of paper, fed it into the model.
Starting point is 00:17:26 And this is what we got. So it looks like it did a pretty good. good job. I could tell this is Claude before you even tell me which model is because it has the standard design principles. Clod design is so basic. And it's so predictable where like, okay, I've seen this dashboard before. It looks like it was a mission success. There's a lot of text on this page and a lot of stuff going on. But I give it a lot of credit for kind of inferring what we would want to be seeing from something like this where we have a proper trip budget. I don't think we ask for a trip budget. But okay. I think.
Starting point is 00:17:59 It looks like it made it did a lot of inferring, right? Like it kind of made a lot of assumptions, but in the end of the day, it did take what we had on the napkin and it turned it into a pretty generic dashboard of sorts based on a very limited information that we gave it. I think the issue with this is we asked for something completely different.
Starting point is 00:18:16 It created a dashboard, but we asked it for it to be based around the limitless podcast and it created a travel planning board. So I don't know whether that was a prompt issue or whether we just fed it the wrong image. But here we go. Here is where we're at. Now, let's take a look at what OpenAI did.
Starting point is 00:18:35 Okay, so here we have the same prompt fed into GPD 5.5. And it's funny, I can instantly tell this is GPD 515 because it's cleaner and it's not neon and it's not trying to go for some futuristic spin. It looks very simplistic. This is actually a website or app that I would probably be more inclined to engage with. It's also more visually perceptive to me, right? like what do I have at the front here? It's this five-day trip that, you know, I want to go on.
Starting point is 00:19:03 It's giving me the basic information that I need to know at the start. It has a bunch of different tabs as well. But again, it isn't what I specified on the napkin. So I think this might be a skill of sure on our side, Josh. But otherwise, like, look at these graphics. They're like really good. One thing I've noticed is stylistically, although both models create very different looking things,
Starting point is 00:19:22 the animation style looks the same. Have you noticed that? even with the game previously that we just demoed, the avatar looked the same. It was given the same sort of title and the objects interacted in the same way. We're seeing this here. So maybe it's just a change in quality.
Starting point is 00:19:39 I actually prefer GPT 5.15 on this one. Yeah, this is crazy. I'm just going to suspect there was a prompt issue there. Yeah. Like, clearly we asked for something that we didn't actually want. But here it is, I think if you're just comparing them apples to apples, chat GPT and Codex is like no-brainer 10 times better. I far prefer this.
Starting point is 00:19:58 If you look at the original napkin photo, this is much more accurate to what the design look like on that original piece of paper. And then if you also just compare the general design, this is far easier to understand. It's just a lot less dense,
Starting point is 00:20:09 it's designed better. I wouldn't even say this is really a fair comparison. It seems like Codex just like completely crush this. And it has all the functionality built in. It looks good. I am giving another win to Codex here. That's two for two.
Starting point is 00:20:21 Wow, look, I've got like a reoptimization toggle at the top and it actually updated. I wonder where it's pulling that data from. And it's already hooked into data. Look at that. Yeah, impressive stuff. Very, very cool. Now, one major reason why both of these models have advanced so rapidly over the last
Starting point is 00:20:37 couple of months is something known as the AI model harness. Now, you have the AI model, which is something that you and I have interacted with quite a lot. It's via chat GPT or Claude itself. But there's an added layer that you can put on top of this model, which comes in the form of prescripted prompts that are engineered to make the model act in a particular way. But it's also the environment that the model works in. It's also the policies that you set to make sure that the model acts and behaves and sounds in a particular way.
Starting point is 00:21:08 That's why we talked about Claude's personality earlier being better than chat chibati. It all plays into the product experience. And what we figured out was it's an entirely new product category on its own. In fact, Cursor had some news over the last couple of days where they made their harness, Cursor SDK, available via API. And the reason why this is such a big deal is critics criticized Cursor for being an AI rapper, which meant that Cursor doesn't have a model of its own. It would just create this harness, a set of prompts and environments around, say, Claude or ChatGPET. And so people would say, Cursor isn't actually special.
Starting point is 00:21:47 Turns out the wrapper or the harness actually made these models way more intelligent. In fact, if you added cursus harness on top of GBT 5.5 and Claude Opus 4.7 right now, you end up with a smarter, more intelligent, more efficient model than the actual base models themselves. Now remember, AI Lab spent hundreds of millions of dollars to train these models and to create the best thing and put their best foot forward. And still, you have a startup which is worth, what is it now, $10 billion right now, potentially being acquired by XAI for $60 billion, creating a better model on top.
Starting point is 00:22:19 So the harness in the AI model are arguably one and the same at this point. And it's just a valuable mode to point out that these models aren't just better at coding because of the base model itself. It's because of this thing known as a harness. Yeah. And the harness is the difference maker when it comes to building this super app. It's like every single company is trying to build the super app, the all in one application that kind of serves as your operating system. Anytime you need to engage with AI, this is the place that you could do it. and it's all encompassing. It's all in one.
Starting point is 00:22:47 Now, one of the best applications we've seen for this in the early days has been something like OpenClaw, where it's this extension of what an operating system could look like, starting with AI at the foundation. And OpenClaught did a really amazing job of that. Now, in some news this week, you can now use your chat UPT account to generate tokens with OpenClau. So previously you had to use the API, whether you were using Anthropic or OpenAI or any of the other models, and it was pretty expensive. It costs a lot of money. Now, thanks to Sam Altman this week announcing, you can actually use your account connected with it.
Starting point is 00:23:17 And I think this is the beginning of a multi-step plan to really integrate OpenClaught directly into Codex in a way that Anthropic can't. Because if you'll remember, OpenAI owns OpenClaw. They bought Peter and Granted OpenClaught will stay open source forever, but they have the ability to actually integrate directly into their products, and I suspect that's what we're going to see. In fact, we even got some confirmation from another post
Starting point is 00:23:38 from one of the Codex developers who replied to a post that was saying, Codex only needs a native editor, an iOS app, a full browser, and open claw, and the developer, Tebow, said, all of this and more is coming, to which Sam Altman retweeted it. So we are indeed getting open claw inside of Codex. We're getting a mobile iOS app so that you can access it remotely. And soon, there's going to be no reason to really use a different app because it's going to be all-encompassing. Now, are there still downfalls? Yes. Computer use, 20% faster on Codex, but yesterday I was playing around with it. I told it to increase the volume of my music. And it took 10 minutes to do. do it because it tried to increase the slider on Spotify, even though it was max, without actually
Starting point is 00:24:17 increasing my system audio. So it's still a little dumb, but it is getting better. And I think this leads me to this post that I really love the vanilla maxing post we have to talk about, which starts by saying, you should 100% be vanilla maxing. Just use the tools as they're handed to you. That's it. Because a lot of people, and I've found this personally, and in fact, I've been caught by this personally, is that you try to get caught up and using all these different repos and these skills and these plugins, when the reality is, is if you just wait, the AI labs are shipping fast enough, they'll just integrate it into your own native application. So I'm vanilla maxing, you, Jess. I'm totally vanilla maxing as well, dude. Like, listen, OpenClaw, when it was hyped up,
Starting point is 00:24:56 was incredibly impressive, and still is incredibly impressive. It opened up an entirely new product market and segment. That's why OpenAI acquired them. But something's majorly changed over the last couple of months, which is OpenClaw's kind of fallen off. No one talks about it anymore. People who were complaining about the errors and bugs they were facing have kind of gone silent, because they've just grown bored and they don't want to put their energy and effort into it. And the reason why is because although these tools are very frontier-level, they can't actually be scaled to a practical use. You don't feel safe integrating OpenClaw into your desktop where you have personal files. I've seen horror stories where it accessed credit card data and exposed that, or where it
Starting point is 00:25:36 deleted old wedding photos and the wife was super angry, a bunch of stuff like that. If you are given access to a tool that comes under a branded reputation, such as ChatGPT, Codex, or Claude Co-work, where it kind of, like, takes over your computer, but in a sandboxed environment, and I know that Nvidia also released NemoClaw, which is like the enterprise-grade, secure version of OpenClaw, you're vanilla maxing. That is the way to do it, and there's no need to rush ahead and lose all your data as a consequence. So that's basically it for the episode. We wanted to give you a comprehensive guide and insight into Codex GPT 5.5 versus Claude Opus 4.7. There's a lot of numbers in there, but basically the best coding models from both sides, to see which is better.
Starting point is 00:26:21 And the truth is, there isn't a clear winner right now. I would say it's probably Codex GPT 5.5, but the narrative switched so recently that maybe, maybe Claude can still catch up. And the only reason why I say that, Josh, is there's a model that we haven't discussed or demonstrated yet, because we can't. It's called Claude Mythos. It was kind of pseudo-released a few weeks ago, and on all benchmarks, it is technically better than 5.5. But the reason why we can't demo it is we can't get access to it. And the reason cited by Anthropic was because it's too dangerous. It's a cybersecurity risk. In fact, it wasn't just Anthropic saying it. It was Peter Hesketh of the U.S. Department of War also saying this, right? So there's concerns around that.
Starting point is 00:27:05 OpenAI has created a Mythos-level type model here, but has made it available to everyone. And so the argument could be made that it's just because Anthropic doesn't have enough compute. So there's a lot of rumors around this, but I'm excited to get my hands on the best models from each of these and compare them directly. Yeah, and the compute's actually been degrading. So I think I want to wrap this up on, like, what do you actually currently use? What is the Limitless production stack? How are we using these AI models? And for me, at least, it's not even close. I'm codex-pilled. I'm fully switched over. For me, it's Codex superiority and domination. It's going to be the month of Codex. Maybe Anthropic will have a comeback, but that's not happening until at least
Starting point is 00:27:39 June or July, because this month is Codex month. So I've been using Codex for basically everything, all of the difficult tasks that I need. What I have found is that GPT 5.5 as an LLM, as a language model, as a chatbot, is a little bit inferior to Opus 4.7, which I believe to be the better model if you're just chatting with an AI. I like its personality, it's warmer, it's more precise, and it normally gets the idea of what I want. So if I am building a complex project, Opus 4.7 is the orchestrator and Codex is the actual implementer, the executor of this code, of this plan. I've also noticed that Opus 4.7 is a bit inferior to 4.6 at a few things, and I think this is another piece of alpha here. I actually use Opus 4.6 whenever I'm doing anything relating to writing
Starting point is 00:28:22 or word ingestion. So one of the projects I've been doing recently is from Andrej Karpathy; he created this, like, wiki for your own person, where it ingests files and it kind of writes these summaries for you, and it creates a personal knowledge wiki. I use Opus 4.6 exclusively for that, because Opus 4.7, I think, is far inferior at summarizing and kind of rewriting these topics that I use in my Obsidian. So that's kind of my stack. I use Opus for LLM chat, Codex for everything else. So what are you currently optimizing for? What do you vibe with here? So it's two things. My stack is actually way more diverse when it comes to just, like, the research side of things, only because I'm using the AI that's, like,
Starting point is 00:29:00 readily available wherever I am, right? So if I'm on X a lot and I see breaking news, I'm just tapping Grok, because honestly, it's a recent model. I think it's, like, Grok, what is it, 4.3 at this point, which is actually pretty good. And they have multiple agents that are kind of, like, running at this point, right? But for the core bulk of the work, I've started shifting towards GPT 5.5 for the research, because with 5.5 research, it thinks for so much longer and it has a much more in-depth discussion. In fact, I tested it out today, because I was curious about the AI power stack and what stocks I should be investing in to get exposure to the power grid lines that are currently constraining AI data centers, right? And I was like, all right, I gave a detailed prompt
Starting point is 00:29:40 to both Claude Opus 4.7 and GPT 5.5, and 5.5 completely cooked 4.7. It gave good reasoning why, whereas 4.7 did not allow me to kind of, like, ask it more questions. So all in all, I think 5.5 is my preference right now. I still use 4.7 because of the personality. It's, like, less of an AI type of voice versus GPT 5.5. But again, I feel like OpenAI is on a generation one right now, and they might just kind of fix this in the next couple of hours at this point. Yeah, it's coming, it's coming quick. And I think now is a good time to kind of get familiar with Codex, to understand the way it works. And as they implement these features, you'll be able to adopt them within the hour, within the day. It's pretty amazing. And it's been fun to
Starting point is 00:30:21 just experiment. It's been fun to try something new. And, again, competition is just better for everyone, so the end winner of this is the user. Because for as little as $20 a month, you get access to all this frontier intelligence, all these capabilities, and it's really been unbelievable to watch. So that is the comparison, Codex versus Opus. If you have not tried both of them, I encourage you to give it a try; test the prompts against one another if you have any type of work that you need. If you're working on a computer at all, chances are you can use AI to help you do your job even better. Or you could just use it to help you do hobbies and side projects that you've always wanted to do. So give it a try. Let us know
Starting point is 00:30:56 your preference codex cloud code which one is it going to be um i think that's probably it for the episode thank you guys so much for watching if you enjoyed it please don't forget to share with your friends let them know which model they picked and also don't forget to rate a five stars on your favorite podcast listening platform any final thoughts each guys before we we go no that's it thank you so much for listening and we'll see you on the next one