How I AI - Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer?
Episode Date: December 3, 2025

I put three cutting-edge AI models to the test in a head-to-head design competition. Using the exact same prompt, I challenged Google’s Gemini 3, Anthropic’s Opus 4.5, and OpenAI’s Codex 5.1 to redesign my blog page, evaluating them on visual design quality, user experience improvements, and SEO optimization capabilities. One model produced a beautiful, polished, production-ready redesign. One was fine. And one completely whiffed. If you’re trying to figure out where each model fits in your workflow (design, planning, back-end, or something else), this episode will save you a lot of trial and error.

What you’ll learn:
• How each AI model approaches the same design challenge differently
• Why planning capabilities dramatically impact design quality
• The specific visual and functional improvements each model made
• Which model excels at front-end design versus back-end functionality
• How to strategically choose the right AI model for different parts of your workflow
• The importance of model-switching based on specific use cases

Blog design: https://www.chatprd.ai/blog

Brought to you by:
• Lovable: Build apps by simply chatting with AI

Where to find Claire Vo:
• ChatPRD: https://www.chatprd.ai/
• Website: https://clairevo.com/
• LinkedIn: https://www.linkedin.com/in/clairevo/
• X: https://x.com/clairevo

In this episode, we cover:
(00:00) Introduction to the AI design challenge
(01:25) The question: Which model is the better designer?
(03:08) The prompt used for all three models
(04:10) Gemini 3 Pro’s approach and results
(06:00) Opus 4.5’s approach and results
(10:54) Codex 5.1’s approach and disappointing results
(14:51) Comparing the three designs side by side
(16:03) Analyzing the change logs and SEO improvements from each model
(22:43) Final verdict
(23:00) Conclusion and next steps

Tools referenced:
• Gemini 3 Pro: https://deepmind.google/models/gemini/pro/
• Anthropic Opus 4.5: https://www.anthropic.com/news/claude-opus-4-5
• OpenAI Codex 5.1: https://platform.openai.com/docs/models/gpt-5.1-codex
• Cursor: https://cursor.com/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email jordan@penname.co.
Transcript
Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools.
Today I have a really fun mini episode where I'm going to answer the question on everyone's mind.
Which of these new models is actually the best designer?
I'm going to take a page on my site that I don't think is particularly well designed and have Gemini 3, Opus 4.5, and Codex 5.1 duke it out to see which one can redesign my page better, in one shot. Let's get to it.
This episode is brought to you by Lovable. If you've ever
had an idea for an app but didn't know where to start, Lovable is for you. Lovable lets you build
working apps and websites by simply chatting with AI. Then you can customize it, add automations,
and deploy it to a live domain. It's perfect for marketers spinning up tools, product managers
prototyping new ideas, or founders launching their next business. Unlike no-code tools,
Lovable isn't about static pages. It builds full apps with real functionality. And it's fast.
What used to take weeks, months, or even years, you can now do over the weekend. So if you've been
sitting on an idea, now's the time to bring it to life. Get started for free at lovable.dev.
That's lovable.dev.
If you've been paying attention to the last couple of weeks,
it seems like every single model provider has released a brand new coding model.
And what I heard the most from people is: sure, they're fast, sure, they're great, and sure, they're beating benchmarks, but they are all really good at design.
If you've been on X or social media, you've probably seen these beautifully designed landing pages, apps, and user experience components generated using Gemini 3 or Opus 4.5 or even Codex 5.1.
And I thought, let's put these side by side and actually see which one's better at redesigning an existing page.
I think it's easy to one shot something and make it look beautiful, especially if you're a great
prompter and know exactly what to say as a designer.
But if you have an existing site and you want to make it better, who's your trusted design engineer?
Which of these models is really going to do the trick?
And I'm going to show you in a couple of minutes which of these models I think is the better designer, or redesigner, of a page that I don't think is really great.
So this is the ChatPRD blog. It is not
very good. I don't think this is a very beautiful site. It's not my favorite. I think it could be a lot
better. And it could be a lot better from a functional perspective, but it can also be a lot better
from a design perspective. And, you know, if I had a team, which I have a little small one, but if I had a
team that was not AI, I might send this to a designer and say, hey, we just launched this early on. It's
great, can you redesign it? And so I wanted to test that flow with some of the new models that
have come out that have said that they are better designers than previous versions. And so I fired
up Cursor and I did a model-by-model comparison of redesigns. And I used the exact same prompt,
exact same input code, and we're just going to see which one we think is the better designer.
So I'm going to show you my prompt here in Cursor.
It was pretty straightforward.
It was this.
"Redesign the blog page" (I just pointed it at the directory where our blog page lives) "to improve both the visual appeal and user experience."
So both: will it look nicer, and will it be functionally a little easier to use?
And then I added a functional component to it, which was "add best practices for SEO and navigation."
And then I did that for three different models. I did it for Gemini 3 Pro, I did it for Opus 4.5 from Anthropic, and I did it for GPT-5.1 Codex. These are all
recently released models that have been said to be their best in-class models from OpenAI,
from Anthropic and from Google. And so we're going to see exactly what it did. And I started with
Gemini 3 Pro. The reason why I started with Gemini 3 Pro is I've heard over and over and over again,
what a great designer Gemini 3 Pro is. And I really wanted to see what it did.
And so you can see here, it thought quite a bit about visual design, user experience,
SEO, and navigation. It looked at the code and it started executing. So it started writing some code.
And we're going to switch over and see exactly what it generated. So it generated this. This was the
before, if you recall: very, very boring, not very good. And in the after, it generated a nice
hero image of the most recent blog post. So there's now this like highlighted blog post at the top.
And then these cards at the bottom. And a couple improvements I see here. There's some tagging here.
There are publish dates. There's this nice hover effect that zooms in on our featured images
when you hover. It hasn't done anything regarding pagination, which is current functionality, and it
doesn't really take into account whether or not we have featured images, or make that case look good.
So there's some things there that could be improved, but I think overall it's pretty good.
One thing that I noticed that it did that I did not love is that there's this tag at the
very top of the page and it's just a little too tight with the rest of the navigation.
So one of my reflections here is, you know, it doesn't have like the full visual context
of the page, but it did a pretty nice job, and it was very fast. But I have to say, despite Gemini
3's reputation for being the best designer, it was actually not my favorite. So I ran the exact
same query in Cursor with Opus 4.5. So if you look up here: redesign the blog to improve both
the visual appeal and UX, and add best practices for SEO and navigation. Now, the difference that I thought
was really interesting when using Gemini 3 versus Opus 4.5 is that Opus 4.5 actually triggered a to-do list
inside Cursor. So it did a tool call to create a to-do list, and it gave a step-by-step flow
it was going to follow. So Gemini 3 sort of did that chain-of-thought reasoning and then just
YOLO'd the code. Opus 4.5 created four to-dos. So the to-dos were: redesign the blog listing page, improve the
blog layout, enhance the post display, and add comprehensive SEO, structured data, canonical
URLs, and meta tags. And so it was very precise step-by-step on what it was going to do
in terms of implementing. And so I think the planning capabilities of Opus 4.5 are certainly better.
I think Anthropic has really differentiated themselves as experts in coding models.
You know, if I wanted to get the best outcome here, I probably should have done this in Claude Code because I think there's some optimizations they've done there recently as well.
But I thought it was really interesting that the output of a planned implementation was much better than the output of a straight shot, one shot implementation.
And so you can see it went step by step and actually checked off those changes and then provided me a summary of changes.
And I'm going to switch and show you exactly what that looked like because I was actually impressed by the design.
So this is what we got from Opus 4.5, which I think, spoiler alert, was the most beautifully designed blog page that I got from any of the models and also, honestly, the most functional from an SEO perspective. And so what you can see that Opus 4.5
did here is it pulled some images. We have a repository of beautiful background images and featured
images that we use throughout the ChatPRD website. It actually pulled and looked for assets that it
could bring in that would look nice. These rings are some design elements that we use commonly.
And so it pulled in some interesting assets. If you recall, Gemini 3 just had a gradient background.
Opus 4.5 actually added some imagery in the background.
Very similar concept in terms of the layout.
So you see, again, a featured article that is the most recent blog post.
Again, three column cards with the zoom in trick.
So I guess people like it.
But if you look at this, there are a couple of nice design tweaks that Opus 4.5 added. When you hover, not only does the image zoom in, but it gives you this nice little
call to action here, this little arrow. I think it is so cute. It just does that nice little hover
treatment on the anchor link for the blog post. Again, tags are in. And then it did a little bit more
on the SEO side, and I will wrap back around to the SEO changes that each of them made. But if you
see here, not only do you have the author, which is me, Claire Vo, you have
the date, which we also saw in the Gemini 3 option, but it also has an estimated reading time and a link.
And so I just think the quality of the design here went probably 20 or 30 percent further than the Gemini 3 model went.
And it's those nice edge touches that I feel like AI can add into any design that just makes it so much nicer to work with.
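A reading-time estimate like the one on these cards is usually just a word count divided by a reading speed. Here is a minimal TypeScript sketch of that idea; the function names and the 200 words-per-minute default are assumptions for illustration, not what Opus 4.5 actually generated.

```typescript
// Hypothetical helper: estimate how many minutes a post takes to read.
// The 200 words-per-minute default is an assumption, not a value from the episode.
function estimateReadingMinutes(text: string, wordsPerMinute: number = 200): number {
  // Split on any run of whitespace and drop empty fragments.
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  // Round up, and never show less than one minute.
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}

// Format the estimate as a short badge label for a blog card.
function readingTimeLabel(text: string): string {
  return `${estimateReadingMinutes(text)} min read`;
}
```

A card component could call `readingTimeLabel(post.body)` next to the author and date, where `post.body` stands in for whatever field holds the article text.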
And I was really impressed with Opus 4.5 in terms of the
quality and the attention to detail. Now let's go down. You know, one of the things that it did
is it handled no images a little smarter than Gemini 3 did. So if you recall, Gemini 3
kind of collapsed these cards here, did not put placeholder images in. Here with Opus 4.5,
it saw that we were missing images for some of our blog posts and put a little placeholder
with a nice little book icon here, which I think is lovely. It makes these cards just look a lot
nicer and is really well designed. So overall, I think that Opus 4.5 did an excellent job out of the
box of redesigning a page and not only redesigning the page, but really thinking about the
functional components of it. And I think a lot of that goes to its planning mode and its ability
to call tools and then do some of these implementations step by step. Now, let's get to the last
model that I tested, which was Codex 5.1. So again, same prompt here: redesign the blog to improve
the visual appeal and UX and add best practices for SEO. I used GPT-5.1 Codex, the leading coding model
from OpenAI. Again, Codex, like Opus 4.5, thought and generated to-dos. The to-dos were a little less
granular than the ones from Opus. So if you look at Opus, the to-dos were: redesign the blog listing
page, with specifics about how it was going to redesign; improve the blog layout; enhance a specific
component; and then add SEO. The plans from GPT-5.1 Codex were a little bit more general. They were:
investigate current layout, redesign, apply SEO. So I think the planning was just not as thoughtful
from a design perspective as the planning was from Opus 4.5.
And then if we actually look at the design, oh, OpenAI, you know I love you, some of my favorite models,
but it did not do well on this redesign. And so you can see a couple things that it didn't do well
right out the gate. One, it gave me AI slop purple gradient. Like we do not need any more purple
blue gradients in AI designs. We need to get them out of here. And so just the fact that we got
AI purple is an immediate disappointment. The other thing, and this may be a me problem,
but I think we have a white wordmark and a better logo to use here. And you can tell here,
just the image it selected is not nice on top of a colored background. Now, I do think that
the headline and copy from the blog is really nice, stories, playbooks, and experiments,
from the team. So it gives a little bit more context. So this was the model that did the best
copywriting, perhaps. But overall, the design was not very good. And then again, it did have featured
post here. This is the image from our most recent blog post. But there's no context. There's no
call to action. It doesn't link to anywhere. And so I'm just
really unsure what it was expecting users to experience. Now it's repeated here, the featured
blocks. So again, I think these models really like this. I guess there aren't that many fancy
things in blog design, and you always have to have a featured image and then a three-row
layout for your blog posts. So it did do the featured image here. But the problem is it added a
bunch of these links, and I don't really understand how they work. They only show the
featured image in each of these categories.
The jumping's kind of weird. And then if you look at "Browse the library,"
it doesn't even show the blog posts that exist in our overall library. And so it's both not
pretty. It's purple and it doesn't work. And so I was really surprised because I've had pretty
good experience with GPT-5 and 5.1 in functional, sort of back-end work. But
with the front-end work, it just really struggled. And I will tell you, this is not a complicated app.
This is a basic, you know, blog layout with a basic CMS in the background. It is nothing that is
technically complicated. And so what I would say from a GPT-5.1 Codex perspective is it's not going to be
the designer on your team. It has another role to play on your team. And I have found plenty of
places for this model to be really, really useful, but design is not one of them. And so I would say
just looking back: Opus 4.5, absolutely my favorite from a front-end design perspective. Gemini
3, very serviceable, could probably benefit from some planning and implementation. And then Codex
5.1 is just not your front-end girl. So we got to get something else in the front-end. And what I
like about testing these models on a specific use case like this where it's repeated is you can start
to understand which model goes at what part of your workflow. I'm a real believer in model switching. I know
everybody has their personal preferences, but I think there are great models for writing. I think
there are great models for design. I think there are great models for image gen. I think there are
great models for planning and strategic thought, and I think there are great models for back-end coding.
And not all of these models are created equal. They're all exceptional. I mean, think about the work that
they can do on behalf of teams, but they're not all the same. And I think as you test them out,
looking at similar use cases over models and making a decision about where you're going to
place a model on a team is a really important skill to have as you're developing your AI fluency
as a designer, as a product manager, and as an engineer. Now, I want to go through the functional
side of things before we wrap up this little mini ep, which is going to be a true mini ep,
hopefully under 20 minutes, and that is: "summarize the changes you made." So I asked each of the models
to summarize the changes they made in terms of design changes, SEO changes, and just what did it do?
And so, you know, I like this as a workflow as you're working with coding agents, especially
if you're running a lot of them and you're not paying attention to them. Asking it to summarize
the changes it made so you can compare them is really useful. And so if you look
at Gemini 3: it made a new hero section, which we know. It made a featured post layout, which we know.
Glassmorphism card. Thank you. Thank you, Apple, for giving us glassmorphism. I think we could live
without it, but it's at least a standard, likable design style. So it has scaling images,
deepening shadows, improved typography, related articles, and visual breadcrumbs. Now, let's look at this
because one of the things I did not check is if these models actually changed how the blog post
themselves showed up. So let's click into that and see if there were layout changes that were made
to the actual blog posts. And there were. Okay, so Gemini 3 did make some changes to the actual
blog posts and it said it added related articles. Okay, so that's a little bonus: it went beyond
just the blog homepage and it added some SEO functionality into the blog post
itself. Now let's read the rest of the changes. From an SEO perspective: good, JSON-LD, great SEO
schema that we definitely want; breadcrumbs, which we definitely want; semantic HTML, which is really
helpful, especially in a blog; and then related articles and metadata. So lots of very helpful, I think,
high-quality SEO changes to my blog post from Gemini 3 Pro. So I'm going to give it a little bit more
credit, in that it went a little further than I initially analyzed
and actually went to the blog pages themselves. But let's check that against my favorite, which was
Opus 4.5. So I'm going to look at Opus 4.5. What changes did it make? Now, see, these changes are
extensive. So again, I think that planning mode really allowed it to make very specific changes across
a variety of components. So it made a featured post and a three-column card grid, which we know,
the little arrow slide-in that I noted, reading time badges, category pills, breadcrumbs,
which we like, and graceful empty states. So these are all things that I identified when I was visually
scanning the design that I thought were really nice. The blog layout had this nice rings pattern and
improved spacing, and then the post display has more information. So let's actually see what it did
on the posts, if anything. So let's click through. So it made again very similar changes to what we
saw in Gemini 3. So again, like, don't redesign everything. If you are doing something like a blog,
you're going to get best practices. So it brought in that metadata in terms of author, date, and
reading time. Let's see if it added those anchor links. It did not add any related links. So it maybe
didn't do as great of a job on the SEO on the individual article pages, but it did do a really
nice job redesigning the call to action at the bottom of our blog post, which is something that I don't
think Gemini did. So it added, I'm sorry to say, it is, again, AI purple slop. So we got to say no more
purple, especially since ChatPRD is so pink. It should know it. It should see pink everywhere in my repo.
It should do this. But other than the purple, I think this is a really nice call to action for a newsletter
subscribe. There's a subtle gradient in here. There's a drop shadow. There's this little kind of
avatar callout next to how many product managers are subscribed.
Actually, there are like 90,000 product managers subscribed, and we got to update the content there.
I think this is a really nice little component.
And this is another thing I've noticed about these new coding models is we're all getting
wowed by these beautiful page designs and app designs.
What is really impressive is you give it like a small little component, a little widget,
and have it redesign.
It looks so much nicer.
So that's what it did from a design perspective.
Let's see from an SEO perspective. So metadata again, Open Graph, structured data. Let's see if it did
JSON-LD. It didn't specifically call out JSON-LD. So I'm going to have to check to see if it did that.
That's one important part of our SEO roadmap at ChatPRD we've been working on. So I'm surprised not to see it.
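For context, JSON-LD here means a schema.org object serialized into a script tag on the page. A minimal TypeScript sketch of what such a block could look like for a blog post follows; the `BlogPost` shape, the `/blog/` URL pattern, and the function names are hypothetical and not taken from the ChatPRD codebase or from any model's output.

```typescript
// Hypothetical post shape; field names are illustrative only.
interface BlogPost {
  title: string;
  slug: string;
  authorName: string;
  publishedAt: string; // ISO 8601 date, e.g. "2025-12-03"
  description: string;
}

// Build a schema.org BlogPosting object for one post.
function buildBlogPostingJsonLd(post: BlogPost, siteUrl: string): Record<string, unknown> {
  // Strip trailing slashes so the URL is not doubled up. The /blog/ path is an assumption.
  const url = `${siteUrl.replace(/\/+$/, "")}/blog/${post.slug}`;
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    description: post.description,
    datePublished: post.publishedAt,
    author: { "@type": "Person", name: post.authorName },
    mainEntityOfPage: { "@type": "WebPage", "@id": url },
    url,
  };
}

// Serialize the object for embedding in the page.
function toJsonLdScript(data: Record<string, unknown>): string {
  // Escape "<" so post content cannot close the script tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Checking whether a model added this is then a matter of searching the rendered page for a script tag of type `application/ld+json`.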
But again, maybe you put Opus 4.5 in the designer mode and you put some of these other models in your, like, SEO
engineer mode and then another model in your sort of like backend engineering mode.
So maybe we've just figured out where each of these models needs to live.
Now let's do our last one and look at Codex 5.1.
What were the changes it made?
Now, this is the shortest summary.
And again, this is the one that did the worst job at this.
I will say also, GPT-5 models love a bullet point.
If you see a bullet point, this is a 5 or 5.1 response here.
And so I asked it to "categorize the changes you made," again using the exact same prompt. It gave me five bullet points. Very lazy. So: hero panel,
category chips, featured article layout, and then for SEO changes it did metadata and embedded a schema.org
JSON-LD block, so that's good. So again, we weren't really impressed with the GPT-5.1 Codex
model on design, and actually not that impressed on the details in terms of user experience and
SEO. So I think maybe this guy belongs in the back end. I probably could have prompted it better.
But again, the point of this mini episode is to show what a model would do if we have a basic prompt, the same way I would
speak to a colleague that I don't have time to tell exactly how to make it better. I'm hoping they can
research and understand how to make a page better. I would just say, hey, our blog is not good. We need
to improve the SEO. We need to improve the UX, and it needs to be prettier. Can you just take care of it?
And that's how I like to think about these models: how do they respond to these
natural requests you would make in the day-to-day of your work and then compare how they do on the
outset. So to recap for everyone, we started with an existing layout. It was not pretty. It was
not functional. It was not good. We gave a three-line prompt to redesign it for UX, visual appeal,
and SEO, and then we compared three models. We compared Google's Gemini 3, Anthropic's Opus
4.5, and OpenAI's GPT-5.1 Codex. And the winner was, for sure on the design side, Anthropic's
Opus 4.5 model, both from a design perspective as well as a usability and SEO perspective. And it
went further than even my prompt requested. The hypothesis here is both it is better trained on
high quality front end design as well as its detailed planning allows it to do a much better job
on the details and implementation than these other models that do more shallow planning or no planning
at all as we saw in the Gemini 3 case. And so we just got a better outcome. I love my new blog design. I am
very excited about this. If we just take a step back, it is incredible that in less than 20 minutes,
we were able to generate not one, not two, but three alternative designs for an existing website.
We were able to get massive upgrades on the functionality of it, especially some technical SEO stuff.
And I was able to pick the one I like. Imagine asking your teammate to design you three different options,
give you three different plans for SEO, and then, you know, having to go back and forth
on which one you like better. I think this is an awesome flow. I loved it so much. I'm actually just
going to go ahead and ship this today. So we'll put it in the show notes so you can see exactly what
happened. And that is my takedown of which of the new models from November 2025 is the best
designer. And I think the winner is Opus 4.5. Thank you so much for joining this mini episode of How I AI. I cannot
wait to share more tips and tricks and hands-on experience with AI and I will see you soon.
Thanks so much for watching. If you enjoyed this show, please like and subscribe here on YouTube
or even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts,
Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will
help others find the show. You can see all our episodes and learn more about the show at
howiaipod.com. See you next time.
