Limitless Podcast - Everyone Needs to Use OpenAI Codex... Until Claude Mythos Comes Out
Episode Date: May 5, 2026

Let's examine the fierce competition between AI coding tools: Anthropic's Claude and OpenAI's Codex. As Codex emerges with robust updates, we discuss user experiences and showcase demos comparing game development and dashboard creation. Highlights include Codex's superior interface and innovative features like auto-review and Chronicle. We also explore the broader implications for AI integration in coding tasks.

------
🌌 LIMITLESS HQ ⬇️
NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/
------
TIMESTAMPS
0:00 Claude vs Codex
3:25 Image Generation Capabilities
4:52 Long Horizon Autonomy
8:55 Chronicles
10:19 Demo
16:49 Dashboard Creation Challenge
20:30 The AI Model Harness Explained
24:27 The Future of AI Tools
26:20 Claude Mythos
27:26 Verdicts
------
RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
A few months ago, we told you to use Claude.
Now, we're telling you to switch back. Because for those of you who aren't familiar,
well, over Christmas break, there was a major vibe shift,
where AI coding went from this, like, fun tool to things that developers actually use when
they're shipping code.
And even if you're not a developer, the amount of use cases and applications that were
created around that time were really strong.
And since then, Anthropic has gone on this generational run of shipping these incredible
products seemingly every single day that has turned Claude Code into this supercharged
super app. It's the place that, Ejaaz, I know you've gone to, I've gone there too,
in order to get all of our AI progress done; any work that we have, we've gone to Claude
Code. Now, OpenAI has woken up, and over the last few weeks, Codex has shipped more features
than most companies ship in a year, and I bet, I guarantee that you haven't heard of some
of these features that we're going to talk about in this episode. The pendulum has fully swung back,
or at least I believe it has, because I'm totally codex-pilled. And in this episode, we're going to
kind of walk through the differences between these two and why the model you're using today
probably won't be the model you're using tomorrow. And I don't think we're going to convince you,
but maybe we could show you why you might want to consider using something else here.
I just want to talk through some of the crazy stats here because the script has genuinely flipped.
A few months ago, Claude Code was all anyone could talk about,
and every software engineer was using Claude Code, every enterprise was installing it.
It was crazy.
But just over the last couple of weeks, specifically by the end of April,
GPT 5.5 was released, and that was plugged into the coding AI model.
It's all one and the same.
And OpenAI went on this code-red run,
where they focused on nothing but building the best coding AI model
and the best LLM.
And the numbers showed that it's worked.
Over the last week,
Codex has been downloaded or installed over 46 million times.
Claude Code, under 500,000 times.
Now, that is crazy to say,
because if you look at the historical data,
Claude Code downloads and installs have absolutely dwarfed Codex's,
but something changed over the last couple of weeks.
That something was OpenAI putting out just a better model.
You mentioned that you were codex-pilled, Josh.
I think so am I.
I've spent the last couple of days playing around with Codex.
This morning we prepped a bunch of really cool demos,
and it has just completely flipped the script.
But it's one thing saying it.
It's another thing actually showing the direct comparison.
So we created this visual artifact to kind of give you the scoreboard.
And you can see it at the top here.
It's OpenAI Codex at 11, Anthropic Claude at 2.
But let me explain why.
Okay.
So, number one: computer use.
Codex and Claude Code can use your computer.
It can take over your desktop
and it can move your cursor around.
Now, Claude pioneered it.
They were the first ones there.
But it was super slow.
It kind of runs into a bunch of obstacles,
and you have to kind of, like, handhold it
and prompt it to do a bunch of different things.
Codex is not only quicker than me.
It's quicker than the average person.
In fact, I can actually see the cursor move around so quickly,
and it's like a superhuman using a computer,
and it can run pretty much 24/7
at this point.
Long Horizon Autonomy.
Codex can work for longer in a much more intelligent manner versus Claude code,
which is, again, crazy to say because literally a month ago,
it was the inverse of this.
Claude right now can run for a decent amount of time,
but not as long as Codex can.
And then the last two that I want to talk about here is browser use.
So Codex can take over your browser.
It can do a lot more intentional things.
It understands what it's looking at very importantly.
Previously, it could not do that.
Claude can do the same, but not as intelligently.
And then finally, ChatGPT Images 2.0 got released, what was it, like two weeks ago now.
Oh, it's so good.
Yeah, it's the image generation model from OpenAI, and it is absolutely astounding.
In fact, it beat all the other predecessors, including Google's, what is it, Nanobanana 2.0 Pro, which previously held the lead.
It beat it across every single benchmark.
Anthropic, on the other hand, doesn't even have an image gen model.
So so far, it's crushing.
Yeah, I think a lot of the best stuff is now bundled into Codex. The image gen, for anyone who does any
sort of visual work, is unbelievable. And being able to use that directly in your software is
awesome. One thing that you mentioned is the long horizon autonomy. I think that needs
double-clicking on, because it's really impressive how well it works. Traditionally, there's been this thing
called a Ralph loop that we use. It's actually named after the character from the Simpsons,
who is very persistent. And it's basically a planning mode where you give the AI a goal and it
will continue to iterate towards that goal until it accomplishes it. So like, let's say you want to
build a Lego car or something and you give it the exact parameters, it will go and go and go
until it solves that problem and gives you exactly what you want in a way that other AI models
haven't. Codex did that. And this is the only native implementation that you can get of this long
horizon thinking where it actually will go for days on end. I've seen screenshots of some thinking
for as long as 36 hours to accomplish the goal. So if you have really difficult tasks,
Codex is going to be really good at solving those. Now, continue to scroll down. There was another
feature that was just released this week called auto review. And a huge pain in the ass for people
who are writing code or working on complex projects, whatever it may be, is you're constantly
having to sit there and approve things because the permission system's a little finicky, right? You
don't want to give it full access to your computer, but you also don't want to sit there approving
every time it wants to use Chrome or every time it wants to access your files. So Codex created
auto review, and they rolled it out last week where the agent is kind of smart. It knows which
things are going to possibly be systemic existential threats and which approvals aren't,
and it will just automatically approve all the things that aren't going to get you in a lot
of trouble. It creates a much easier user interface where you can just kind of walk away from the
computer for a little while and come back and things get done. Memory and context are pretty strong.
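As an aside, the "Ralph loop" pattern described a moment ago, where the agent just keeps re-attempting until the goal is met, can be sketched in a few lines of Python. This is a minimal illustration of the idea only; `agent_step` and `goal_met` are hypothetical stand-ins for whatever planning pass and success check a real harness would use:

```python
def ralph_loop(goal, agent_step, goal_met, max_iters=100):
    """Iterate an agent toward a goal until it's met, or give up after max_iters."""
    state = {"goal": goal, "history": []}
    for _ in range(max_iters):
        result = agent_step(state)       # one plan/act pass by the agent
        state["history"].append(result)
        if goal_met(state):              # persistence: stop only when the goal checks out
            return state
    return state
```

The point of the pattern is the outer loop's persistence, not any single step: the agent keeps re-attempting with its accumulated history until the goal check passes, which is what lets runs stretch over many hours.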
I'd say the one thing, and we haven't mentioned many Claude winners: the place where Claude wins
currently is on their OpenClaw capability, funnily enough, because OpenAI bought OpenClaw. But Dispatch
is the mobile app feature for Claude in which you can actually
engage with Claude Code remotely; that doesn't currently exist on Codex. And while the team has
promised to ship that, you don't actually have that today. Claude has that. Also, in terms
of the personality and UI, Claude is just so much better. I think we're going to get into our personal
takes, but whenever you're using an LLM versus an actual tool set or a harness, Claude is pretty great
and the UI is very warm. So there are some instances in which Claude is better, but for the most
part, Codex is really just kind of crushing it. And I've really enjoyed using it. One of the fun
things is pets. I mean, just recently they released pets. And Claude also released pets, but these pets are
a little bit different. This is an example of Angry Dario we're seeing on the screen. And it's fun,
because you have this persistent character that exists throughout your computer use. And as you're
engaging with Codex, it'll just kind of chat with you in the background so you could see your progress,
see where you're at. It's fun, it's playful. And it just shows that they kind of care about the user
experience. Now, one feature I would guarantee most people don't know is Chronicle, Ejaaz. And you were just
telling me about Chronicle and how cool it is, how it kind of monitors your screen
as you go. This seems like novel technology that we haven't seen yet. Yeah. So one of the earliest
episodes that we did here on Limitless was an interview with the folks at OpenAI that created
something called, what was it called, Josh? Do you remember? It was like agent mode or personal mode,
something like that. Yes. It thought overnight for you, right? Yes. It basically took all the
conversations that you had with ChatGPT the night before or the day before or the week before,
and it created important context around you
in the form of something called memories.
This is where AI memory was birthed
from OpenAI themselves, from the OpenAI team.
And what it would do is it would feed you a report
in the morning that would update you on information
that it thought you would be interested to read about.
So say, for example, you were interested in the stock market;
it would give you an update on a bunch of advancements
that had happened overnight or over the last week
or whatever it might be, right?
Now, fast forward today, memory is embedded
across every single AI model and tool.
The reason why is context is so important.
It's one thing for a user to ask for something explicitly and directly.
It's a completely different thing for an AI to actually understand what you mean,
the nuance in the sentence that you've created,
and even better to predict what you want.
But there was still an obstacle, which was you needed to feed it the context and say,
Hey, Claude, hey, ChatGPT, can you remember this?
OpenAI recently released a feature called Chronicle,
where it observes what you scroll through,
what you click on, what you type,
and it builds its own context and memories around you
without you needing to feed it,
which actually led to a really cool prompt that you pointed out, Josh, or that you found, which was:
"What have I been doing very inefficiently on my computer, according to Chronicle, which is this new memory feature? Make some recommendations, be direct, tell me what I need to hear."
That's pretty awesome.
Yeah,
so this is alpha,
because I don't think a lot of people
recognize that this is a possibility because Codex and OpenAI didn't do a good job of explaining
this. When they released Chronicle, they said it's a way for the system to review your code as you
go, because it's been taking sequential screenshots. But the reality is that it's much bigger than
this. And I suspect they didn't market it this way because it could be a bit of a privacy issue.
But it's essentially constantly monitoring your screen and taking screenshots of what's happening
on your screen and interpreting it so it understands your habits, the way that you work, the things that
you do. And then you can ask it: what have I been doing very inefficiently
on my computer, according to Chronicle, make some recommendations, be direct, tell me what I need to
hear. And it'll actually evaluate how you've been using your computer, how long you've been
scrolling on Twitter, perhaps, how long you haven't been doing the things you're supposed to be
working on, or just generally how to improve your workflow and give you real feedback based on
your actual actions that it's seen. And I think this is a super powerful thing currently only
available to pro members. So if you pay for the $100, $200 a month subscription, you get access
to this. But I suspect this is the early signs of a very important feature they're going to roll out,
which is that entire computer monitoring system to improve your system
and also probably train the models to get better at engaging with your system.
But I found Chronicle to be one of those kind of secret features that not a lot of people know about,
but has a lot of upside if you use it to your advantage and let it monitor what you're doing
and improve your workflow on a day-to-day basis.
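For intuition, a Chronicle-style watcher can be thought of as a loop that periodically snapshots the screen and folds each snapshot into a running memory. This is just a minimal sketch of the idea (`capture_screen` and `summarize` are hypothetical stand-ins, not real Codex APIs):

```python
import time

def chronicle_loop(capture_screen, summarize, memory, interval_s=60.0, max_ticks=None):
    """Periodically snapshot the screen and distill each shot into memory."""
    tick = 0
    while max_ticks is None or tick < max_ticks:
        shot = capture_screen()             # e.g. a raw screenshot of the desktop
        memory.append(summarize(shot))      # distilled habit/context note
        tick += 1
        time.sleep(interval_s)              # wait before the next observation
    return memory
```

The key design point is that context accumulates passively: the user never has to say "remember this", because every tick of the loop adds another distilled observation to memory.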
Yep.
So the point is from both of these companies, Anthropic and Open AI,
we are getting feature releases every single week, in fact, every single day.
And it's becoming... I'm being bombarded by this,
and it's hard to keep track of all of this.
So what is the number one litmus test
for both of these models and products and companies?
It's to actually use the thing.
It's to build the thing.
And we have two special demos that we have prepared for you
that we're about to jump into.
Now, Josh, can you guess the theme of my first demo?
First one's a game. We're gamers, man.
I want to play a game. I want to see how well it does on a game.
I know we did this demo months ago.
It left a lot to be desired,
so I'm curious to see the current up-to-date status
as it relates to Claude Code versus Codex.
Who's winning on the one shot game prompt?
Indeed.
Okay, so I am a nostalgic kind of guy.
And so I was like,
back in the day, I loved Mario.
So I want you, both of these models,
to create the best Mario type or inspired game,
a side scroller, but make it futuristic.
Maybe add a little bit of neon,
sprinkle a bit of neon in there,
create levels.
I want game design.
I want there to be enemies.
I want there to be pitfalls.
And I also want there to be a scoreboard.
And also tell me how to do this thing.
Give me the whole package, basically.
I fed this prompt or idea into ChatGPT and Claude,
and I said, can you create a detailed prompt
that I can then feed into your coding models?
I then set each of the coding models to their highest settings.
So what you're about to see is the best of the best
for the most detailed prompt that they came up with,
and let's see what they did.
So step number one or example number one is Claude Opus 4.7.
So this is Claude Code at the highest setting
with their latest model.
Okay, it took the prompt pretty literally.
It's titled Neon Plumber Moonbase Run, which is obviously Mario-inspired, and it said,
hey, this is a demo edition, by the way.
This is not production ready.
What I like about this is it's giving me the instructions, but how does the game actually play out?
Let's see.
It looks good.
Can you see me here, Josh?
I can.
Yes, I can.
And it looks like...
The animations are pretty good.
I'm jumping around.
I think I'm like a little robot.
I can see my feet pit-pittering.
Now, I'm guessing this thing is about to kill me.
So let's see if I can jump.
Oh, I can jump.
There we go.
That's awesome.
One bit, can I kill this guy.
Oh, yes, I can.
Now, one bit of feedback I've noticed is I can't double jump.
And it told me in the menu that I could double jump.
So that's weird.
So the physics hasn't really paid off.
Can I die?
Oh, it certainly looks like you could die.
I can die.
Great.
Okay.
So that is Claude's attempt at it.
What's your feedback on this, Josh?
I think the graphics are pretty good.
The graphics are great.
For one shot, I mean, granted, this is only one single prompt.
So for one prompt, it created great graphics.
It had sound design that actually sounds pretty accurate to what you would expect in the game.
It has similar principles.
It's following gaming principles.
You kind of understand what looks dangerous, what doesn't.
You knew that those spikes were going to hurt you, and they hurt you.
The logic seems to be a little bit flawed.
I think it's having problems with gravity or at least that double jump functionality,
because it looks like those coins that you probably want to collect,
you can't actually reach because you can't do the double jump.
So in terms of logic, not so hot;
in terms of visuals, aesthetics, in terms of, I mean, how good this game is from one shot,
very impressive. Yeah, I think it's important to understand that I started from zero. It literally
asked me to give it a folder to build in, and the folder was completely empty. So all the visual
renderings, all the graphics, the animation style, the scoring system, the way that the avatar
moves and looks was created from scratch from a bunch of characters from this AI model. So this is
Claude Code's current best attempt, and it is way better than what we tested out and honestly
demoed on this show about a month ago. But now let's see what OpenAI's GPT 5.5 Codex at the highest
possible setting cooked up. Okay. And this is using the same prompt. So you just fed the model
the same prompt and now we're going to see the output. Identical. All right. Oh, God, I'm excited.
I hope codex did well because now that I'm a fan, I'm gassing it up, it better perform here.
Okay, so this is GPT 5.5's attempt. Now you might notice that this isn't the entire browser. That's
because Codex has a very unique feature,
which is not only can it do all the coding in a single app for you,
but it has an in-app browser.
So it can live test the thing in the app
without you needing to go to Google Chrome or whatever.
But anyway, we have the starting screen here.
It has also called it Neon Plumber Moonbase Run.
It looks a little more rudimentary from the start,
but I do like the background animation, Josh.
We didn't get this in the previous one,
or at least not this side-scrolling thing.
Well, let's...
Oh, oh, this is nice.
This is nice.
I think this has good logic.
Wait, but there's no music.
There's no music.
I can't double jump.
Might be a skill issue.
Might be a prompt issue.
Let's have a look.
Did it say you can double jump?
That's a good question, actually.
I mean, this is looking...
This is a fully playable game.
Yes, this is, and I like that it's, like, zoomed in.
There's like...
Oh, we got the boost.
I can jump on the platforms.
Let's see if I can kill this guy.
Yes!
Nice, okay, and can I jump the gap?
There's a scoring system.
And you could see your hearts.
Oh, dude.
This is way better.
Power up.
Wait, oh my God, I want the power up.
I'm still going to go back.
I can't double jump.
No, you can.
You could go back.
Go back to the last platform.
Oh, God, I died.
I'm going.
I'm going to the last platform.
Here we go.
It looks like they're sequentially gaining height, which is interesting.
Oh, but okay, so if I'm comparing these two, I'm actually, I'm not feeling
very let down.
This is good.
Aside from the music not existing, which we may not have explicitly asked for.
It looks like the logic plays better.
The actual gameplay is usable.
this is a full, I don't know if it's glitching or if this is you glitching.
No, no, that is, it's glitching.
It's glitching a bit.
Okay.
So it's still, there are some edge case errors.
Yeah.
But this is different in the sense that you have your hearts clearly projected.
You have a score system that's clearly in place.
You're able to get these powerups.
They work.
They function.
I mean, this is a very clean and functional game.
So I would give this to Codex.
I think the experience, perhaps the design of Claude was better.
And perhaps the music, I mean, music was definitely better versus none.
But Codex, in terms of just coding logic and making a better game...
I give this to Codex.
Do you have a take?
Yeah.
So on the build side of things, I had a much more pleasant experience using Codex as well.
So I think Codex wins on this.
I one-shotted it in the true sense, where I just gave it a single prompt and Codex didn't ask for any permissions.
It just kind of went on and did the thing.
I saw its thinking,
and at points where it was unsure,
it thought amongst itself
and then made the decision to progress forwards.
Whereas with Claude Code, it would come to me.
Now, that might just be a developer engineer's preference, right?
Like, if you're building a production ready app
for, like, I don't know, a big company that you work for,
you probably want to have more hands-on involvement.
Whereas if you're just building a game like we did today
where I don't really care what it ends up looking like or what it does,
then the hands-off preference is probably something that you would use codex for.
But I think Codex wins this.
So for our second demo, we have this handwritten piece of paper that I actually wrote and took a picture of.
I didn't.
It's ChatGPT Images 2.0.
But it looks like it's handwritten.
The handwriting was too nice, Josh.
That was the giveaway.
Yeah, my handwriting is far sloppier than this.
But the idea is that you can even write things on the back of a napkin and you could turn that into an application.
So what we did here is we just asked it to create a generic Limitless dashboard application
on the back of a piece of paper, fed it into the model.
And this is what we got.
So it looks like it did a pretty good job. I could tell this is Claude before you even tell me which
model it is, because it has the standard design principles. Claude design is so basic. And it's so
predictable, where, like, okay, I've
seen this dashboard before. It looks like it was a mission success. There's a lot of text on this
page and a lot of stuff going on. But I give it a lot of credit for kind of inferring what we would
want to be seeing from something like this where we have a proper trip budget. I don't think we
asked for a trip budget. But okay.
It looks like it did a lot of inferring, right?
Like it kind of made a lot of assumptions,
but at the end of the day,
it did take what we had on the napkin
and it turned it into a pretty generic dashboard of sorts
based on the very limited information that we gave it.
I think the issue with this is we asked for something completely different.
It created a dashboard,
but we asked for it to be based around the Limitless podcast
and it created a travel planning board.
So I don't know whether that was a prompt issue
or whether we just fed it the wrong image.
But here we go.
Here is where we're at.
Now, let's take a look at what OpenAI did.
Okay, so here we have the same prompt fed into GPT 5.5.
And it's funny, I can instantly tell this is GPT 5.5 because it's cleaner and it's not neon
and it's not trying to go for some futuristic spin.
It looks very simplistic.
This is actually a website or app that I would probably be more inclined to engage with.
It's also more visually perceptive to me, right?
like what do I have at the front here?
It's this five-day trip that, you know, I want to go on.
It's giving me the basic information that I need to know at the start.
It has a bunch of different tabs as well.
But again, it isn't what I specified on the napkin.
So I think this might be a skill issue on our side, Josh.
But otherwise, like, look at these graphics.
They're like really good.
One thing I've noticed is stylistically,
although both models create very different looking things,
the animation style looks the same.
Have you noticed that?
even with the game previously that we just demoed,
the avatar looked the same.
It was given the same sort of title
and the objects interacted in the same way.
We're seeing this here.
So maybe it's just a change in quality.
I actually prefer GPT 5.5 on this one.
Yeah, this is crazy.
I'm just going to suspect there was a prompt issue there.
Yeah.
Like, clearly we asked for something that we didn't actually want.
But here it is. I think if you're just comparing them apples to apples,
ChatGPT and Codex is like a no-brainer, 10 times better.
I far prefer this.
If you look at the original napkin photo,
this is much more accurate
to what the design looked like
on that original piece of paper.
And then if you also just compare
the general design,
this is far easier to understand.
It's just a lot less dense,
it's designed better.
I wouldn't even say this is really
a fair comparison.
It seems like Codex just, like, completely crushed this.
And it has all the functionality built in.
It looks good.
I am giving another win to Codex here.
That's two for two.
Wow, look, I've got like a reoptimization toggle at the top
and it actually updated.
I wonder where it's pulling that data from.
And it's already hooked into data.
Look at that.
Yeah, impressive stuff.
Very, very cool.
Now, one major reason why both of these models have advanced so rapidly over the last
couple of months is something known as the AI model harness.
Now, you have the AI model, which is something that you and I have interacted with quite a lot.
It's via chat GPT or Claude itself.
But there's an added layer that you can put on top of this model, which comes in the form
of prescripted prompts that are engineered to make the model act in a particular way.
But it's also the environment that the model works in.
It's also the policies that you set to make sure that the model acts and behaves and sounds
in a particular way.
That's why we talked about Claude's personality earlier being better than ChatGPT's.
It all plays into the product experience.
And what we figured out was it's an entirely new product category on its own.
In fact, Cursor had some news over the last couple of days where they made their harness, Cursor SDK, available via API.
And the reason why this is such a big deal is critics criticized Cursor for being an AI wrapper,
which meant that Cursor doesn't have a model of its own.
It would just create this harness, a set of prompts and environments around, say, Claude or ChatGPT.
And so people would say, Cursor isn't actually special.
Turns out the wrapper or the harness actually made these models way more intelligent.
In fact, if you added Cursor's harness on top of GPT 5.5 and Claude Opus 4.7 right now,
you end up with a smarter, more intelligent, more efficient model than the actual base models themselves.
Now remember, AI labs spent hundreds of millions of dollars to train these models
and to create the best thing and put their best foot forward.
And still, you have a startup which is worth, what is it now, $10 billion right now,
potentially being acquired by XAI for $60 billion,
creating a better model on top.
So the harness and the AI model are arguably one and the same at this point.
And it's just valuable to point out that these models aren't just better at coding because of the base model itself.
It's because of this thing known as a harness.
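To make the idea concrete, here's a minimal sketch of what a harness is in code terms: a wrapper that combines a base model with pre-scripted prompts and output policies. The `Harness` class and the callable model are illustrative only, not Cursor's actual SDK:

```python
class Harness:
    """Wrap a base model with a system prompt and output policies."""

    def __init__(self, model, system_prompt, policies=()):
        self.model = model                  # callable: prompt str -> completion str
        self.system_prompt = system_prompt  # pre-scripted instructions
        self.policies = policies            # callables that rewrite or veto output

    def run(self, user_prompt):
        # The harness controls what the model actually sees...
        raw = self.model(f"{self.system_prompt}\n\n{user_prompt}")
        # ...and post-processes what the user actually gets back.
        for policy in self.policies:
            raw = policy(raw)
        return raw
```

The same base model behind two different harnesses can feel like two different products, which is the "wrapper" argument above: the value sits in the prompts, environment, and policies around the model, not only in the weights.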
Yeah.
And the harness is the difference maker when it comes to building this super app.
It's like every single company is trying to build the super app, the all in one application that kind of serves as your operating system.
Anytime you need to engage with AI, this is the place that you could do it.
and it's all encompassing. It's all in one.
Now, one of the best applications we've seen for this in the early days has been something like OpenClaw,
where it's this extension of what an operating system could look like, starting with AI at the foundation.
And OpenClaught did a really amazing job of that.
Now, in some news this week, you can now use your chat UPT account to generate tokens with OpenClau.
So previously you had to use the API, whether you were using Anthropic or OpenAI or any of the other models,
and it was pretty expensive. It costs a lot of money.
Now, thanks to Sam Altman this week announcing,
you can actually use your account connected with it.
And I think this is the beginning of a multi-step plan
to really integrate OpenClaught directly into Codex
in a way that Anthropic can't.
Because if you'll remember, OpenAI owns OpenClaw.
They bought Peter and Granted OpenClaught will stay open source forever,
but they have the ability to actually integrate directly into their products,
and I suspect that's what we're going to see.
In fact, we even got some confirmation from another post
from one of the Codex developers who replied to a post that was saying,
Codex only needs a native editor, an iOS app, a full browser, and open claw, and the developer, Tebow, said,
all of this and more is coming, to which Sam Altman retweeted it. So we are indeed getting open claw
inside of Codex. We're getting a mobile iOS app so that you can access it remotely. And soon,
there's going to be no reason to really use a different app because it's going to be all-encompassing.
Now, are there still downfalls? Yes. Computer use, 20% faster on Codex, but yesterday I was playing
around with it. I told it to increase the volume of my music. And it took 10 minutes to do.
do it because it tried to increase the slider on Spotify, even though it was max, without actually
increasing my system audio. So it's still a little dumb, but it is getting better. And I think this
leads me to this post that I really love the vanilla maxing post we have to talk about, which starts
by saying, you should 100% be vanilla maxing. Just use the tools as they're handed to you. That's it.
Because a lot of people, and I've found this personally, and in fact, I've been caught by this
personally, is that you try to get caught up and using all these different repos and these skills and
these plugins, when the reality is, is if you just wait, the AI labs are shipping fast enough,
they'll just integrate it into your own native application. So I'm vanilla maxing, you, Jess.
I'm totally vanilla maxing as well, dude. Like, listen, OpenClaw, when it was hyped up,
was incredibly impressive and still is incredibly impressive. It opened up an entirely new product
market and segment. That's why Open Air acquired them. But something's majorly changed over
the last couple of months, which is open claw's kind of fallen off. No one talks about it.
anymore. People who were complaining about the errors and bugs that we're facing have kind of gone
silent because they've just grown bored and they don't want to put their energy and effort into it.
And the reason why is because although these tools are very frontier level, they can't actually be
scaled to a practical use. You don't feel safe integrating open claw into your desktop where you have
personal files. I've seen horror stories where they access credit card data and exposed that or where they
deleted old wedding photos and the wife was super angry, bunch of the stuff like that. If you are
able to get given or access to a tool that comes under a branded reputation, such as chat
GPT, Codex or Claude Co-work where it kind of like takes over your computer, but in a sandboxed
environment, I know that Nvidia also released NemoClaw, which is like the enterprise-grade,
secure version of OpenClau, you're vanilla maxing. That is the way to do it, and there's no
need to rush ahead and lose all your data as a consequence. So that's basically it for the episode.
We wanted to give you a comprehensive guide and insight into Codex GPT 5.5 versus Claude Opus 4.7.
There's a lot of numbers in there, but basically the best coding models from both sides to see which is better.
And the truth is there isn't a clear winner right now.
I would say it's probably Codex GPT 5.5, but the narrative switched so recently that maybe, maybe Claude can still catch up.
And the only reason why I say that, Josh, is there's a model that we haven't discussed or demonstrated yet, because we
can't. It's called Claude Mythos. It was kind of pseudo-released a few weeks ago,
and on all benchmarks it is technically better than 5.5. But the reason why we can't demo it is
we can't get access to it. And the reason cited by Anthropic was that it's too dangerous.
It's a cybersecurity risk. In fact, it wasn't just Anthropic saying it. It was Peter Hesketh
of the U.S. Department of War also saying this, right? So there's concerns around that.
OpenAI has created a Mythos-level model here, but has made it available to
everyone. And so the argument could be made that it's just because Anthropic doesn't have enough compute.
So there's a lot of rumors around this, but I'm excited to get my hands on the best models
from each of these and compare them directly. Yeah, and the compute's actually been degrading.
So I think I want to wrap this up on, like, what do you actually currently use? What is the limitless
production stack? How are we using these AI models? And for me, at least, it's not even
close. I'm Codex-pilled. I'm fully switched over. It's full Codex domination. It's going
to be the month of Codex. Maybe Anthropic will have a comeback, but that's not happening until at least
June or July, because this month is Codex month. So I've been using Codex for basically everything,
all of the difficult tasks that I need. What I have found is that GPT 5.5 as an LLM, as a language model,
as a chatbot, is a little bit inferior to Opus 4.7, which I believe to be the better model.
If you're just chatting with an AI, I like its personality, it's warmer, it's more precise.
It normally gets the idea of what I want. So if I am building a complex project, Opus 4.7 is the
orchestrator and Codex is the actual implementer, the executor of this code, of this plan.
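The planner/executor split Josh is describing can be sketched in a few lines. This is just an illustration: the `plan_with_opus` and `execute_with_codex` functions below are hypothetical stubs standing in for real API calls to each model, not part of any actual SDK.

```python
# Sketch of a two-model workflow: a "planner" model breaks a task into
# steps, and an "executor" model carries each step out in order.
# Both functions are stubs; in a real setup each would make an API
# request (e.g. Opus for planning, Codex for implementation).

def plan_with_opus(task: str) -> list[str]:
    # Stub planner: a real call would return model-generated steps.
    return [f"design {task}", f"implement {task}", f"test {task}"]

def execute_with_codex(step: str) -> str:
    # Stub executor: a real call would return code or a diff.
    return f"done: {step}"

def build(task: str) -> list[str]:
    # Orchestrate: plan once, then execute each step in sequence.
    return [execute_with_codex(step) for step in plan_with_opus(task)]

print(build("login page"))
```

The design point is simply that the planning model never touches the code; it only emits steps that the executor model consumes one at a time.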
I've also noticed that Opus 4.7 is a bit inferior to 4.6 at a few things. And I think this is
another piece of alpha here. I actually use Opus 4.6 whenever I'm doing anything relating to writing
or word ingestion. So one of the projects I've been doing recently is from Andrej Karpathy; he created
this, like, wiki for your own person, where it ingests files and kind of writes these summaries
for you, and it creates a personal knowledge wiki. I use Opus
4.6 exclusively for that, because Opus 4.7, I think, is far inferior at summarizing and kind of
rewriting these topics that I use in my Obsidian. So that's kind of my stack. I use Opus for LLMs,
Codex for everything else. It's just: what are you currently optimizing for? What are you playing with
here? So it's two things. My stack is actually way more diverse when it
comes to just, like, the research side of things, only because I'm using the AI that's
readily available wherever I am, right? So if I'm on X a lot and I see breaking news, I'm just
tapping Grok, because honestly, it's a recent model. I think it's like Grok, what is it,
4.3 at this point, which is actually pretty good. And they have multiple agents that are kind of
running at this, right? But for the core bulk of the work, I've started shifting towards
GPT 5.5 for the research, because 5.5's research thinks for so much longer and it has a much
more in-depth discussion. In fact, I tested it out today, because I was curious about the AI power
stack and what stocks I should be investing in to get exposure to the power grid lines that are
currently constraining AI data centers, right? So I gave a detailed prompt
to both Claude Opus 4.7 and GPT 5.5, and 5.5 completely cooked 4.7. It gave good reasoning why,
whereas 4.7 didn't really let me kind of ask it more questions. So all in all, I think
5.5 is my preference right now. I still use 4.7 because of the personality. It's like less
of an AI type of voice versus GPT 5.5. But again, I feel like OpenAI is on a generation
one right now, and they might just kind of fix this in the next couple of hours at this point.
Yeah, it's coming, it's coming quick. And I think now is a good time to kind of get familiar
with Codex to understand the way it works. And as they implement these features, you'll be
able to adopt them within the hour, within the day. It's pretty amazing. And it's been fun to
just experiment. It's been fun to try something new. And it's, again, competition is just better
for everyone. So the end winner of this is the user. Because for as little
as $20 a month, you get access to all this frontier intelligence, all these capabilities,
and it's really been unbelievable to watch. So that is the comparison, Codex versus Opus.
If you have not tried both of them, I encourage you to give it a try, test the prompts against
one another if you have any type of work that you need. If you're working on a computer at all,
chances are you can use AI to help you do your job even better. Or you could just use it to
help you do hobbies and side projects that you've always wanted to do. So give it a try. Let us know
your preference: Codex or Claude Code, which one is it going to be? I think that's probably it for the
episode. Thank you guys so much for watching. If you enjoyed it, please don't forget to share with your
friends, let them know which model you picked, and also don't forget to rate us five stars on your
favorite podcast listening platform. Any final thoughts, Ejaaz, before we go? No, that's it. Thank you
so much for listening, and we'll see you on the next one.
