Everyday AI Podcast – An AI and ChatGPT Podcast - EP 504: Has Anthropic’s Claude lost its edge? What happened & can Claude recover?

Starting point is 00:00:00 This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. I would say for the better part of two years,

Starting point is 00:00:48 the large language model race was three teams. You had Anthropic, OpenAI, and Google racing for the lead and going back and forth, jab for jab, as the best AI model maker in the land. Obviously, you know, Microsoft's in there, but they're more of a system that uses other technology. But when it came to actual AI frontier labs, it's always been a three-team race. I don't know if it's like that anymore. I think right now, OpenAI and Google are so far ahead of everyone else.

Starting point is 00:01:33 And I'm left wondering, what happened to Anthropic? What happened to Claude? Is it still a top-tier large language model, or has Claude completely lost its edge? And can they ever catch up with Google and OpenAI? All right. We're going to be talking about that and a lot more on Everyday AI. What's going on, y'all? My name's Jordan Wilson and I'm the host of Everyday AI. This thing, it's yours.

Starting point is 00:02:06 It's your daily live stream podcast and free daily newsletter helping us all not just learn AI but how we can leverage it to grow our careers. Because you can try to keep up with AI news and developments and new large language model updates. You can try to keep up, but just hearing about them, reading about them, doesn't do anything. You need to leverage it, and that is what our website is all about: youreverydayai.com. So there we recap each and every day's podcast episodes. Sometimes I have guests on, sometimes it's just myself. So we bring you exclusive insights every single day. We're actually the only AI newsletter that does that, as well as we keep you up with everything

Starting point is 00:02:43 else happening in the world of AI. So you can be the smartest person in your company or your department when it comes to generative AI. All right, let's actually do that and go over a quick recap of what's happening in AI news for April 15th. So Apple is responding to criticism over its AI performance, particularly in areas like notification summaries with a timely pivot toward synthetic data and differential privacy. So yeah, Apple kind of responding, according to reports, by focusing a little bit more on synthetic data, right? So the company generates, according to the report, the company is now generating synthetic

Starting point is 00:03:28 data to emulate user information without using real content, enabling private testing on the devices of users who opt into device analytics. So this approach ensures accuracy while safeguarding privacy. So yeah, Apple obviously has had a super, super slow rollout. And by super slow rollout, they're years behind everyone else. And their Apple Intelligence, let's just say it has not been well received. So some new reports and information are showing Apple's kind of new or updated approach by using synthetic data, kind of tying it to those who sign up

Starting point is 00:04:09 for kind of this device analytics. So by polling devices with synthetic data comparisons, Apple is hoping to enhance its Apple Intelligence with better email summaries and other functions, signaling a broader commitment to addressing user concerns and advancing its AI capabilities responsibly. Our next piece of AI news: Nvidia has committed $500 billion to U.S. AI manufacturing

Starting point is 00:04:37 amid changing tariff policies here in the U.S. So, Nvidia announced plans to invest up to $500 billion in AI infrastructure manufacturing within the U.S. over the next four years, marking a significant shift in its supply chain strategy to meet surging demand for AI chips and supercomputers. So the move coincides with U.S. President Trump's ever-changing tariff policies, which initially imposed steep levies on imports from Taiwan and China, but recently exempted chips and other tech products, easing concerns for companies like Nvidia and Apple

Starting point is 00:05:13 that rely heavily on overseas production. So, Nvidia will partner with Taiwan Semiconductor for chip production and with Foxconn and Wistron in Texas for supercomputer manufacturing, aiming to achieve mass production at these facilities within 12 to 15 months. So by using digital twins of factories and advanced robotics for automation, Nvidia hopes and plans to streamline operations and enhance efficiency in its U.S.-based facilities, demonstrating how AI technology can transform the manufacturing process.

Starting point is 00:05:53 So yeah, if you're wondering like, okay, what the heck does this matter? Well, so many big companies and all the AI systems that we use, like ChatGPT, Google, Microsoft, everyone else, they're struggling to keep up with demand, right? So essentially, everyone's looking for more compute. This is a pretty big move from Nvidia to bring more kind of AI power to the U.S. And then last, but definitely not least, OpenAI has launched a new family of models with the GPT-4.1 series. Probably the big headliner there is it now has a million-token context window, but right now at least, it is only available on the API end.

Starting point is 00:06:38 So only for developers right now. So OpenAI has launched its new family of models, GPT-4.1, as a major upgrade to its previous models, offering advancements in context processing, reliability, and cost efficiency. But like I said, you're not going to find it. If you go to chatgpt.com, it's not there. At least right now, OpenAI did not announce any plans for it to live on the front end inside ChatGPT, and it is only available for developers on the back end. But let's talk a little bit about the model because there are some pretty, pretty impressive specs here.

Starting point is 00:07:11 So GPT-4.1 introduces a 1 million token context window, far surpassing GPT-4o's previous top on the API end, which was 128,000. So that's big. So, you know, Claude and Gemini and others were really beating OpenAI historically in context window, right, but not anymore. So pretty big news there. And then unlike previous models integrated into ChatGPT, like I said,

Starting point is 00:07:44 GPT-4.1 is exclusively available through OpenAI's API, making it a tool tailored for developers rather than general use. The performance is pretty impressive across coding, instruction following, and complex reasoning tasks. What's also important is OpenAI has said some of those improvements have also been rolled out kind of under the hood to its GPT-4o model. I would assume it was the late March update that there wasn't a lot of information about. And there are now three new varieties. So there is GPT-4.1, kind of the full version.

Starting point is 00:08:22 GPT-4.1 mini, which is more affordable and compact. And then GPT-4.1 nano. Yeah, the first time, you know, OpenAI has gone with nano, and that is their smallest, fastest, and cheapest model. Yeah. As if it's not hard enough to already understand these models, now we have two varieties of small ones. Yeah, if you thought mini was small, no, now apparently mini is medium and nano is small. And then some sad news for some old-school folks, you know, if you like some of these old models: OpenAI is planning to phase out older models like the

Starting point is 00:09:00 OG GPT-4 by April 30th. And then also somewhat surprisingly, OpenAI announced they'd be phasing out GPT-4.5 preview by July 14th to focus on the more efficient 4.1 lineup. Also, this release coincides with a delay in GPT-5's launch, now expected in a few months as OpenAI navigates some integration challenges. And yeah, so FYI, obviously, OpenAI has changed course a couple of times. They essentially said, hey, we're going to stop releasing non-reasoning models. And GPT-5 is going to be more of a hierarchy or a system.

Starting point is 00:09:42 So they said, yeah, we're not going to be releasing a lot of new models before GPT-5. And here we are. So, all right, let's get into it. A lot more on those stories on our website at youreverydayai.com. What's up, livestream crew? Yeah, if you listen on the podcast, come join us sometime on a live stream. You know, when I have guests, we take questions.

Starting point is 00:10:07 Sometimes I ask you all things. So thanks to everyone for joining in. George from YouTube, Big Bogey saying GPT-4.1 is a coding powerhouse. Yeah, it is already, on early benchmarks. Trade printing here from YouTube. Thanks for joining us on LinkedIn, Kimberly and Dennis, Allison. Thank you all for tuning in. But let's just get straight into it.

Starting point is 00:10:29 Has Anthropic's Claude lost its edge? It's Tuesday, y'all. I'm going to take a sip of my coffee. And let me know. Should I crank this up? It's been a while since I really brought it on a hot take Tuesday. I'm a little tired, but live stream audience, if you could, leave me an emoji or two.

Starting point is 00:10:52 Should I be one fire emoji? Should I be kind of nice? Two fire emoji. Should I bring the heat? Or three fire emojis? Burn, baby, burn. I mean, I don't know. One thing, and let me tell you this,

Starting point is 00:11:05 I tell you all the truth. I do, period, right? As an example, if you would have asked me 18 to 20 months ago, hey, Jordan, what are your thoughts on Google Gemini? I'd say, eh, don't use it. Ask me today, Google Gemini is the king of the hill, right? I do think it is Google and OpenAI now going jab for jab.

Starting point is 00:11:32 But I tell you the truth. Right. So I'm not going to hold back if you all want a little bit, a little bit of fire. All right. Rolando here is saying to crank it up. Fred, all right, Fred, Fred, thank you. Fred's like, all right, Jordan, be nice today. He wants me to be kind of nice.

Starting point is 00:11:49 Allison here, throwing in some dynamite. That's dangerous. All right. All right. We'll see. We'll see. I don't want to offend anyone. Because let me say this.

Starting point is 00:12:05 Let me say this. Claude is still one of the most impressive pieces of AI technology ever created. All right. Period. So I don't want to overlook that. But what I've found is I've been using Claude less and less. I would say probably nine months ago, Claude probably accounted for about 25% of my usage.

Starting point is 00:12:43 It's probably down to about 5% now. I'm finding it hard to find actual use cases for Claude. And I'm talking about on the front end, y'all. So I'm not talking about on the back end. I know that Claude 3.5 has historically been, you know, one of the most used models if you look on, like, OpenRouter. I know that Claude 3.7 is still popular for developers,

Starting point is 00:13:08 although it's not the most popular anymore. It's not the most popular anymore, with Gemini 2.0 Flash and Gemini 2.5 Pro. It's really not. But this has been a long time coming. So back in September, back in September, if you want to go listen to this,

Starting point is 00:13:29 what episode was this? 351. All right. So I told you all back in September, three reasons businesses shouldn't use Anthropic's Claude yet. And this was after like a year, right? This was a year of me being hesitant. So what a lot of people don't know, people are like, okay, Jordan's just some random guy that, you know, jumps on a podcast and talks about AI. Well, yes, on the surface.

Starting point is 00:13:58 Right. On the other end, I do a lot of things that you all don't see on this show. I consult big companies, companies with tens of thousands of employees. I work with research organizations. They reach out to me, big ones, big main ones. And they're like, hey, Jordan, can you help us better understand generative AI? So it's much more than, you know, this little podcast. Although thank you all for listening, you know,

Starting point is 00:14:23 and making everyday AI a top 10 tech podcast in the U.S. But I'm talking with a lot of businesses, a lot of things that you don't hear. And it's not just me. Big enterprise companies have always been hesitant to use Claude at scale. All right. And it's been a long time coming. I even said three big reasons. This was back in September that I said,

Starting point is 00:14:52 Claude was in trouble. And enterprises shouldn't be using it yet. Number one, there is no enterprise access. So I'm talking about on the front end, right? So keep that in mind. Everyday AI, it's for largely non-technical people. Right. And I'm talking about logging on to, you know, claude.ai.

Starting point is 00:15:11 Or I'm talking about logging on to Gemini.com, chatgpt.com. Right. Using this on the front end with your team. One thing I'm a huge advocate for, if you listen to the show, is having your AI OS, right? Your AI operating system. Your team needs one, right? In addition to whatever your company may be doing on the back end, you need a front-end AI operating system where you and your team collaborate to get work done.

Starting point is 00:15:38 No internet access. Claude went the first two years without having internet access. They just rolled out internet access about a month ago. Okay. Very limited third-party integrations, all right? Google technically on the front end does not have a lot of third-party integrations, but because they're Google, right, because they have, you know,

Starting point is 00:16:04 anything that could be a third-party integration, they essentially have in-house, right? Google has like a trillion of their own products, right? Extremely limited third-party integrations. It's improved since September, since I had this show, episode 351. And then I said extremely restrictive testing tiers. So both free and paid,

Starting point is 00:16:24 I'd say the one biggest thing that's been knocking Claude off from real business adoption is you can't even go and test it. Like, if you have a paid, even a paid plan of Claude, right? And you're like, all right, you know, let's go test this. Let's see if this is right for our business. You know, you're paying the $25 a month or whatever. There's been, and I'm not exaggerating, hundreds of cases, because I use large language models,

Starting point is 00:16:53 I mean, it varies. I don't know, anywhere from four to 12 hours. Recently, it's been a lot of 12-hour days using large language models, right? It's so easy on a paid plan to hit your rate limits on Claude, I kid you not, within 10 minutes. It's happened to me hundreds of times where I will hit on a paid plan, the rate limit within 10 minutes. Yes, I'm generally working in multiple tabs.

Starting point is 00:17:18 If I'm in Claude, I'm working with long context windows, yes. I can't tell you the last time I've hit ChatGPT limits, right? It doesn't happen. Gemini, doesn't happen. Claude has been extremely restrictive. And I think that was a major misstep early on. How do you expect, aside from, you know, appealing to your core audience, which we'll talk about because I think they're losing space there, right, you know, coding,

Starting point is 00:17:51 development, software engineering, et cetera, right? How are you going to appeal to the average business owner, to the average enterprise use case, when, you know, a company pays and maybe they get a team plan? I think those rates are about double. But still, you can't even use the thing when you pay for it. It is extremely restrictive. All right. And another reason why I think that Claude has lost its edge.

Starting point is 00:18:21 It's no longer innovating. Right. In the early part of 2024, even midway through the year, I still think Claude was an innovator, right? They came out with artifacts, which when it came out extremely impressive. So if you don't know Claude artifacts, it's actually kind of hidden. You have to like enable it and then you have to make a call to it. Right. But it's it's still, and right now, let me be honest, because I still said Claude is still one of the most impressive

Starting point is 00:18:53 pieces of AI technology. There's still great use cases, right? Even though I'm trying to, you know, you all wanted the flame emojis, I'm not going to totally. poo-poo on Claude. There's still some use cases. I said maybe now it's 5 to 10% of, you know, my use. But the only thing I use Claude for right now is using 3.7 thinking on artifacts. That's it. Nothing else.

Starting point is 00:19:17 Because for everything else, Claude is not a top model anymore. In many cases, it's not even a top five or a top 10 model, which sounds crazy to say, because nine months ago, they were that, that tier one, right? If we go back to our ranking tiers, right, like S, A, B, C, right?

Starting point is 00:19:38 They were S. They've fallen. How the mighty have fallen. But that's all I really use it for. But Claude and Anthropic were innovators, you know, early on. So the artifacts,

Starting point is 00:19:55 so, you know, that's something that can render code from natural language. You know, you can have it build

Starting point is 00:20:05 And then guess what? ChatGPT and Gemini said, all right, yeah, let's do this as well. So they came in with Canvas. All right. Similarly, Claude was an innovator with projects. Anthropic innovated with projects, right? A good way to, you know, organize your chats, a good way to leave custom instructions and project knowledge, right?

Starting point is 00:20:30 ChatGPT followed suit. Computer use, right? Anthropic was innovating. Although back in October, when that came out, it was extremely clunky, extremely clunky, right? You know, one of the easier ways to run computer use was, you know, you had to download Docker.

Starting point is 00:20:49 You had to go to GitHub, you know, work with their repo, which is fine. But for non-technical people, not that good. And the rate limits: I did a live show going over Claude's computer use, and again, it was very hard to use with the rate limits. So I think Claude is no longer innovating. Now I think they're clone chasing. Whereas before others were copying their innovation, now they're copying other people's innovation.

Starting point is 00:21:18 So yeah, like now a lot of the things that you see and that are going to be rolling out, like as an example, according to some online sleuths right now, Claude is testing voice mode and all these things, right? They're just now seemingly cloning features that were popular six months ago, a year ago. And one of the reasons, I think, is Anthropic dropped the ball. Right. Back in September, when I gave those three reasons, those three different scenarios on why I thought enterprise companies shouldn't use Claude, and why I told countless enterprise companies, don't use Claude, for those three reasons.

Starting point is 00:22:01 They didn't address those. Those were not secrets. It was no secret that it was so hard to literally use Claude on the front end. They knew that, right? Their, you know, team is interacting with people online on Twitter. You know, everyone's been complaining about rate limits and, you know, Claude's team has been saying, oh, we're working on it, for years.

Starting point is 00:22:26 It's too late. It's too late. One of the reasons why, right? I don't know this, but we've heard stories that, as an example, OpenAI is losing money, right? Even CEO Sam Altman said on their new Pro $200-a-month subscription that they were losing money, even though it's been extremely popular. So I don't know. This is my hunch, but my hunch has been, Claude has been maybe more profitable,

Starting point is 00:22:53 at least by percentages, than maybe their main and closest competitor, ChatGPT. But at what cost? Because I don't think they're growing their user base. I don't think about it. You know, I think sometimes, you know, if you are an avid listener to this show, if you're, you know, an AI nerd like myself, we, I mean, we all live in an echo chamber as well, right? Outside of our little echo chamber, no one knows about Claude.

Starting point is 00:23:26 Right? But they could have. they could have a year ago if Anthropic would have listened to its customer base a little more closely and continued to innovate and improve the product

Starting point is 00:23:41 improve usability, I don't think we'd be having the same conversation today. Always have receipts, y'all. Always have receipts. All right. So on my screen, this is January 2025 web traffic.

Starting point is 00:23:59 All right.

Starting point is 00:24:46 No one uses Claude. Comparatively, no one uses Claude.

Starting point is 00:25:26 They don't. I know I'm going to catch some flak for that. And people will be like, Jordan, you're a, you know, a ChatGPT fanboy or, you know, jumping on the Gemini bandwagon. No, I'm not. I've been using Claude since the day it came out. I've enjoyed certain features. I use them all. You know, I've used dozens of LLMs.

Starting point is 00:25:49 And like I said, hours every single day. Claude's not good anymore. It's not. I got more stats. I got more receipts. Don't worry, y'all. You said you wanted some flame emojis, right? So let's look at total visits in January 2025, web visits.

Starting point is 00:26:10 ChatGPT.com. 3.9 billion with a B. Yeah, with a B. Claude, 76 million. Gemini, 267 million. DeepSeek, 277 million. So, you know, essentially Gemini and DeepSeek are right there with each other in terms of people visiting the front end.

Starting point is 00:26:42 Perplexity, 99 million. Y'all, ChatGPT. Let me do some quick, some quick napkin math here. ChatGPT has more than 10 times the users of Claude, Gemini, DeepSeek, and Perplexity combined. Is my math right there? 500. All right, almost. Sorry. My napkin math was a little wrong there.

Starting point is 00:27:09 All right, so we have, that's about 500 million, 600 million, all right. So about five times. So ChatGPT has five times more users than Claude, Gemini, DeepSeek, and Perplexity combined. And Claude is in, at least according to kind of online demographic or online website information, which is pretty accurate, right? I've been using these different SEO tools for 10 plus years. They're very accurate.
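
For anyone who wants to check the napkin math, here is a minimal Python sketch that uses only the January 2025 visit counts quoted above. The variable names are illustrative, and note that these are site visits, which is not quite the same thing as users. Added up exactly, the four smaller sites come to 719 million visits, a bit more than the rough 500 to 600 million estimate, and the ratio still lands at roughly five times.

```python
# January 2025 web visits as quoted in the episode, in millions.
visits_millions = {
    "ChatGPT": 3900,
    "Claude": 76,
    "Gemini": 267,
    "DeepSeek": 277,
    "Perplexity": 99,
}

# Sum every site except ChatGPT.
others_total = sum(v for name, v in visits_millions.items() if name != "ChatGPT")

# How many times larger ChatGPT's traffic is than the other four combined.
ratio_vs_others = visits_millions["ChatGPT"] / others_total

# How many times larger ChatGPT's traffic is than Claude's alone.
ratio_vs_claude = visits_millions["ChatGPT"] / visits_millions["Claude"]

print(f"Other four combined: {others_total} million visits")
print(f"ChatGPT vs. the other four combined: {ratio_vs_others:.1f}x")
print(f"ChatGPT vs. Claude alone: {ratio_vs_claude:.0f}x")
```

So "about five times" holds up against the combined group, and against Claude alone the gap is closer to fifty times.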

Starting point is 00:27:38 No one's using Claude. Hot take, ready? It's been less than two months since Claude released its latest. model in Claude 3.7. Claude Sonnet 3.7. And it already feels antiquated. So they announced it February 24th. Claude 3.7.

Starting point is 00:28:02 And let me just call this one out, right? They made a big deal of Claude being, you know, the world's, you know, first hybrid model, right? So, you know, when you think of old-school transformers and then you think of these, you know, quote unquote new-school models that think and reason under the hood, I don't know, to me that seems like a marketing gimmick from Anthropic, right? Why? Well, if you want to use that extra thinking, right, you have to go in and you have to click the button.

Starting point is 00:28:37 So is it actually a hybrid model? I don't know. I'd say not. So now I think you also have Anthropic falling down this trap that Google fell in in late 2023, where they're getting caught up in the marketing and not listening to their users and shipping new, capable, powerful models. But Claude 3.7 Sonnet feels antiquated because since that time, we've already had multiple updates from OpenAI.

Starting point is 00:29:06 We've had multiple updates from Google. We've even had multiple updates from models that I'd say never use, like DeepSeek, right? If you care about your privacy, don't use it unless you're, you know, downloading it and fine-tuning it locally, right? But don't use DeepSeek on the web or their API if you care about your data. If you are a business, don't do it, especially in the U.S. All right. But anyways, how are we at the point now where a model that is not even two months old

Starting point is 00:29:36 feels antiquated? That's where we're at. And I don't know if Anthropic can keep up. Like I said, they're very innovative to begin with. They're great researchers. You know, obviously, I think they are a world leader in terms of, of AI safety in terms of ethics, right? All of those things.

Starting point is 00:29:58 But in terms of like, okay, are they just going to be more of a research arm that kind of drops AI models? Or are they trying to actually dominate? Are they actually trying to be relevant? Are they actually trying to be one of the top large language model makers in the world? I don't know. I was personally very underwhelmed with Claude 3.7 Sonnet, their newest model, even the thinking variation when you have to toggle it on.

Starting point is 00:30:24 I know a lot of people, I was reading, you know, online, you know, a lot of people are using it inside, like George here on YouTube says, you know, he says, Claude seems lazy in Windsurf and Cursor, but it is not when you use it in an app. Yeah. So I know a lot of people, yes, Claude, up until, you know, a week ago when Google said, oh, wait, Claude, you are no longer relevant because we're dropping Gemini 2.5 Pro, which wipes away all the competitive advantages that Claude 3.5 or Claude 3.7 Sonnet had, right? Google just said, yeah, we're,

Starting point is 00:31:03 we're going to knock you off this pedestal. You're not going to compete. Google straight up wiped them, which is interesting, right, because Google is, you know, has invested, you know, but they're still technically competitors in some regards as well. Let me tell you what I mean. And here's my hot take. I think there's a lot of Twitter talk and hipster hype when it comes to Claude 3.7 or Claude 3.5. But I care about business utility.

Starting point is 00:31:39 Anthropic's lost its edge there. I care about benchmarks. I care about real human usage. Claude's not competing there anymore. And like I said, I think one of the biggest things that's happened, and I would not want to be working at Anthropic right now: Gemini 2.5 Pro and 2.5 Flash. I don't know, if I'm being honest,

Starting point is 00:32:01 unless Anthropic has been sitting on a world-changing model, I don't know how Anthropic is going to compete against Gemini 2.5 Pro and Gemini 2.5 Flash. Good luck. I know, you know, a lot of people have said, oh, well, there's still, you know, Claude 3.7 Opus, right? You know, Anthropic's Claude had kind of these three tiers of models.

Starting point is 00:32:25 They have their small, Haiku, their medium, Sonnet, and their big one, Opus, and they haven't updated Opus in a very long time. So everyone's like, oh, you know,

Starting point is 00:32:32 Claude 3.7 Opus or, you know, Claude 4.0 will, I don't know. I don't know, because there's also rumors, even though Gemini 2.5 Pro just went generally available like 10 days ago.

Starting point is 00:32:46 There's already rumors that Google has a much better and more capable model that they're already testing on the LM Arena Chatbot Arena. I don't know how Anthropic is going to compete against Google. All right. Got receipts as always, y'all. I have receipts.

Starting point is 00:33:03 Yes, Similarweb, Dennis. Thank you for asking. That's where that data was from. All right. Let me know, y'all. Live stream audience, am I wrong on this? But let's get quickly to the receipts.

Starting point is 00:33:15 All right. I'm not going to make you wait an hour for this one. I'm going to go through quickly. Because the proof is in the pudding, y'all. The writing's on the wall. All right. So let's look at artificial analysis. So a great third party unbiased website, right, that does benchmarks.

Starting point is 00:33:34 Because one of the things is, when companies put out their benchmarks, they cherry-pick. There's dozens of different benchmarks. So, of course, you know, when, you know, these AI labs put out their models, they choose, okay, out of these 50 benchmarks, here's the eight that we're going to put on our website because we look great on these, right? So I always look at Elo scores from LM Arena, we're going to talk about that in a minute here, and look at third-party benchmarks as well. So intelligence. This is from the Artificial Analysis Intelligence Index.

Starting point is 00:34:04 Gemini 2.5 Pro in the lead. Second, o3-mini-high from OpenAI. Then you have the two variations of DeepSeek. And then you have the new version of GPT-4.1. Y'all, by my count, Claude 3.7 is number 8 in terms of intelligence on this third-party benchmark. Let's keep going because you're like, okay, what about humans?

Starting point is 00:34:32 Humans probably prefer it. Okay. So, Elo scores. Let's talk about that. That's head to head. You put in a prompt on LM Arena, on the Chatbot Arena. You get two outputs. You don't know who they are.

Starting point is 00:34:41 You say this one's better. All right. There's been millions of votes. Guess what? Total Elo score. Claude is not a top 10 model. I know it sounds crazy to say, but you have to ask the question,
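
To make the head-to-head idea concrete, here is a minimal Python sketch of the classic Elo update applied to one blind vote between two models. This is only an illustration under stated assumptions: the function names, the starting ratings, and the K-factor of 32 are hypothetical choices, and the leaderboard's actual rating method may differ from this textbook formula.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both models' new ratings after a single head-to-head vote.

    k is an assumed K-factor that controls how far one vote moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    # Model B's actual and expected scores are the complements of model A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# One vote between two models that start level at 1000: the voter prefers model A.
new_a, new_b = update_ratings(1000.0, 1000.0, a_won=True)
print(new_a, new_b)
```

The winner gains exactly what the loser gives up, and beating a higher-rated model moves the numbers more than beating a lower-rated one. Repeat that over millions of votes and the ratings settle into a ranking.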

Starting point is 00:34:58 even if you ask it rhetorically: is Claude no longer a state-of-the-art model? I don't know, y'all. In so many benchmarks, in so many now Elo categories, overall Elo, they're not a top 10 model. Gemma 3, which is a small language model from Google, has a higher Elo score than Claude 3.7. Let me say that again. A small language model. Not a large language model. Humans prefer its outputs across millions of votes compared to Claude 3.7.

Starting point is 00:35:45 Google has, let's count it. One, two, three, four, five models. Five different models that humans prefer over Claude 3.7 Sonnet. I don't know. So, I don't know, is my hot take a very hot take when I said, hey, Anthropic has lost its place atop, right? Gemini 2.5 Pro, higher. Let's see, we have Gemini 2.0 Flash Thinking, higher.

Starting point is 00:36:12 Gemini 2.0 Pro Experimental, higher. Gemini 2.0 Flash, higher. And then their small language model, Gemma 3. My gosh. All right. But you might be saying, all right, Jordan, well, people use Claude for certain reasons, right? They use it for creative writing. Claude's great at that.

Starting point is 00:36:31 They use it for coding in software development. Claude's great at that. That's an old narrative. Literally, that's an old narrative, right? Especially the creative writing thing. I think essentially, right, you know, a bunch of stuff went viral online,

Starting point is 00:36:45 like maybe a year and a half ago, about how bad ChatGPT and Gemini were at writing content and Claude was just so much better. All right, well, let's look at those two things. Let's first look at creative writing. Okay. Oh, where's Anthropic?

Starting point is 00:37:05 Oh, the bottom of the list. Again, not top 10 Elo in creative writing. That's what I'm saying. I think right now it's a lot of Twitter talk and hipster hype, right? Oh, it's cool to like Claude, right? It's like, oh, you know, I see you wearing that name-brand ChatGPT. Oh, I see you with that mainstream Google Gemini. I'm over here, promptin', with that Claude.

Starting point is 00:37:34 Man, no. Why? Why? Not a top 10 model when it comes to creative writing, which everyone thought it was amazing at. It was, a year and a half ago. Don't lie with me, y'all. This is millions of people who have voted on this blindly. Guess what else? Not a top five in coding either. It's not. Claude 3.7 Sonnet with thinking is not a top five model for coding. Guess what is? Guess what's at the top? OpenAI's o1. Their o1-preview, their o1-mini, Gemini 2.0 Flash.

Starting point is 00:38:17 And I do, I do believe once Gemini 2.5 Pro is on here and gets enough votes, it'll be up there as well. But not a top five model in terms of coding. So what do you want? What do you want? I don't understand. Why are people still using Anthropic? Like I said, maybe you have one or two use cases

Starting point is 00:38:40 that you're happy with it, right? If I'm being honest, the only thing I use it, like I said, Claude used to be maybe 20% of my usage. I'm a heavy large language model user. Like I said, maybe it's 5% now. I'm only using it because there's certain things

Starting point is 00:38:55 in Artifacts that Claude does better than Google's Canvas and OpenAI's Canvas. But it's always like I'm doing it all at the same time anyways. I'm running the same thing in all three of them. And sometimes I'm like, okay, yeah, Anthropic's is a little bit better here. All right. So maybe you're like, oh, it's fast.

Starting point is 00:39:14 It's affordable. It's not fast. It's not affordable. It's not, you know, when you look at speed, and this is from Artificial Analysis, Gemini 2.0 Flash and Gemini 2.5 Pro are the fastest models, followed by GPT-4o and o3-mini from OpenAI. Again, Claude not in the top five when it comes to speed,

Starting point is 00:39:36 which is output tokens per second. So it's not fast. All right. And that is the non-thinking model, by the way. All right. It's terrible on price. It's terrible on price, right? Which, I still don't understand why people are so DeepSeek drunk.

Starting point is 00:39:57 Like, DeepSeek is not cheap anymore, right? It's not. When it first came out, it's like, yeah, this is cheaper. Okay. Well, Gemini 2.0 Flash is wiping the floor with everyone when it comes to price. Llama's new Llama 4 Scout, GPT-4o mini, right? There's just so many faster, better, cheaper models than Claude.

Starting point is 00:40:19 So I don't, it's definitely lost its edge, right? I think there's one more thing I wanted to pull up here. Okay, it's coming up in a slide here. Because this is also telling. So, looking at the intelligence versus price. So it's not like you're getting a good bargain either if you're using Claude on the back end, right? You're not. You're not. So on the front end, humans aren't preferring it. On the back end, you're not necessarily getting what you pay for.

Starting point is 00:40:42 Again, this is intelligence versus price. So there's a little quadrant here. So you want to be on the upper, the upper left, because that means it is cheaper and smarter. Claude is on the right side, and Claude 3.7 Sonnet is actually on the bottom right. All right. Not necessarily fast or affordable. And here we go, everyone's like, oh, it's the best coding model. Guess what? It's not. Artificial Analysis, their coding index. This one is very interesting.

Starting point is 00:41:19 Claude 3.7, the thinking model, ready? The thinking model is in fifth place. Guess what's ahead of it? The new model that was just released from OpenAI, GPT-4.1. But guess what, y'all? This is the mini version. The mini version of OpenAI's new model.

Starting point is 00:41:45 Not only is it a non-thinking model, right? Because normally if you use these thinking models, these reasoners, they code much better, right? Especially when you're working with very complex tasks in long-token, long-context windows. So not only is this GPT-4.1 model not a thinking model, and it performs better on the Artificial Analysis coding index, but it is the mini version. It is the mini version.

Starting point is 00:42:13 So I don't know, y'all. If you're still using Claude 3.7 Sonnet, let me know why. Let me know why. I'm very curious. Like I said, I know a lot of people on the software engineering side, on the development side, they love it, right? Using it with Cursor, using it with Windsurf, using it inside all these different IDEs.

Starting point is 00:42:39 I also don't understand why on that. Now, with Gemini 2.5 Pro, with Gemini 2.0 Flash, and now these new models from OpenAI that they just announced, I don't understand it. I honestly don't understand how Anthropic has gone, and Claude has gone, from that top-tier, right, state-of-the-art, world-leading model to kind of irrelevant. So a lot of people are like, oh, well, you know,

Starting point is 00:43:12 Claude just released a new plan, Jordan. You're really harping on them for, you know, these rate limits. You can just pay more and use it way more. Okay, well, why? If it's not a top 10 model, right? Yeah, Claude just came out with their Claude Max, right? So you get higher limits, you know, if you're paying $100 a month or $200 a month,

Starting point is 00:43:32 which, let me just call this out, you know, because people are like, okay, Jordan, this solves it. Well, you don't get anything more powerful for that $100 or $200 a month. You don't get more features, right? So when OpenAI, as an example, announced their $200 Pro plan, at the time, that was the only way you could access Sora. That's still the only way that you can access o1 pro. And then you get unlimited everything. Unlimited. This is not limits.

Starting point is 00:44:00 Or sorry, this is not unlimited. You can still go on the front end and pay $100 or $200 a month. You don't get new features. You don't get new models that are exclusive to that Max plan. You just get slightly better limits. But here's a concerning one, y'all. This one's kind of concerning, ready? This is from Anthropic's website, talking about their new plan, ready?

Starting point is 00:44:27 Talking about their message limit on the new Max plan: Your message limit will reset every five hours. We call these five-hour segments a session, and they start with your first message to Claude. Please note that if you exceed 50 sessions per month, we may limit your access to Claude. Each session includes any messages sent within five hours from the first initiated chat.

Starting point is 00:44:58 So we expect it to be fairly generous for our users. Like, gosh, I don't know. How tone-deaf is this, y'all? Come on. So let's just say, in theory, let's say you're a very regimented person, all right? Like I am. So this is why I can't even use Claude on the current paid plan, but even if I pay $100 or $200 a month.

Starting point is 00:45:20 So let's say I use Claude in the morning before my show to help plan it. All right. So let's say 6 a.m. And then I use it at noon, midday. All right. And then in the evening, you know, I use it again. So let's just say I just do a couple of prompts a day. I do it at, you know, 6 a.m.

Starting point is 00:45:38 I do it at, you know, noon, and then I do it at 6 p.m. 6, noon, 6, right? Couple prompts a day. Paying $100 or $200 a month. In that scenario, even if I'm only doing a couple prompts, right, paying $100, $200 a month, I might get cut off from my pricey $100, $200 a month plan. That's what they're saying, 50 sessions a month.

Starting point is 00:46:04 So if I do that, if I use Claude three times a day, at times that are more than five hours apart, I could, in theory, in three weeks, get shut off. And I might not be able to use their paid plan for the last week of the month. In theory, that's what it's saying here. How tone-deaf is that? I don't understand. If I'm being honest, when I saw that, I'm like, come on, Anthropic. You have, I don't know.

Starting point is 00:46:30 How many billions of dollars have you gotten from Amazon? I lost track, $6 billion or something. This is why people aren't using your service. Humans don't prefer it. Benchmarks don't prefer it. And for those people that are actually still finding utility and are power users, you're slapping them in the face. Get real.
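The session arithmetic in the 6 a.m., noon, 6 p.m. scenario can be checked with a short sketch. It rests on assumptions that go beyond the quoted policy: every use is taken to open a new session (the uses are more than five hours apart), the 50-session figure is treated as a hard threshold even though the quoted wording only says access "may" be limited, and the function and constant names are hypothetical.

```python
SESSION_CAP = 50       # "50 sessions per month", as quoted in the episode
SESSIONS_PER_DAY = 3   # assumed: 6 a.m., noon, 6 p.m., each a separate session


def day_cap_is_exceeded(cap: int, sessions_per_day: int) -> int:
    """Day of the month on which session number cap + 1 would begin."""
    # Session n falls on day ceil(n / sessions_per_day); with n = cap + 1
    # that ceiling equals (cap + sessions_per_day) // sessions_per_day.
    return (cap + sessions_per_day) // sessions_per_day


if __name__ == "__main__":
    print(day_cap_is_exceeded(SESSION_CAP, SESSIONS_PER_DAY))
```

Under these assumptions the 51st session would start on day 17, roughly two and a half weeks in, so the limit could bite somewhat earlier than the three-week estimate.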

Starting point is 00:46:56 All right. Hot take. Let's end it here. Can Claude recover? I honestly don't think so. I don't think so. Here's, again, this is just reading reports. You can't knock Anthropic for putting safety first.

Starting point is 00:47:26 You can't. They put out world-leading research. I do think when it comes to, you know, safe AI, they are a leader in that. But no one's paying you for your research. You're not competing to be the best frontier AI lab with the best research, with the best safety. This is a race.

Starting point is 00:47:57 This is the Wild West. That's what it is. There's no rules when it comes to AI. Anthropic is playing, I'd say, the wrong game. They've alienated their power users. They've stopped innovating. And I think that has caused them to now face an almost insurmountable challenge, right?

Starting point is 00:48:31 Let's just say, as an example, Claude had their 4.0 model ready, and they probably have had it ready for a while. When you see these new drops from OpenAI, right, their 4.1 models, the smaller versions, when it comes to price per performance, amazing.

Starting point is 00:48:51 Same thing with Google Gemini 2.5. I don't think, if I'm being honest, right, where nine to 15 months ago I'm like, yep, it's going to be a three-team race. It's not anymore. Yes, you have to pay attention to open source. You have to pay attention to Chinese models. But most enterprise companies here in the U.S. aren't going to touch many open source models, for different reasons, and they're not going to touch Chinese models for obvious reasons: data security, data privacy, and not sending all your business IP straight to China. From a U.S. perspective, Anthropic was primed

Starting point is 00:49:29 to compete in this three-team race. They were primed to be a leader. But now they're a second-tier company. They are. That might be harsh. You wanted my honest take? That's not just me. Is that my personal usage?

Starting point is 00:49:49 Sure. Is that my personal experience? Yes. But I showed you the receipts. Users aren't using it. Number one. They're not competing on benchmarks. Number two. Humans don't prefer it. Number three. So can Claude recover? I don't know. I'd probably say no. All right, y'all. I hope this was helpful. You wanted some hot takes? I try to bring it. Try to bring it a little bit.

Starting point is 00:50:14 So, you know, talking a little bit: has Anthropic's Claude lost its edge? What happened? And are Google and OpenAI too far ahead? Simple answer? Yes, Anthropic's lost its edge. And yes, OpenAI and Google, at least today, are way too far ahead for Anthropic to catch. That could be wrong, but the only way you're going to find out is by continuing to tune in. Maybe I'll be eating a big helping of, you know, humble pie, you know, in 2026, but we will see and find out. All right. Thank you for tuning in, y'all. If you haven't already, please go to youreverydayai.com.

Starting point is 00:50:50 If this was helpful, please share this with your network, tag a friend, someone that needs to hear this. If you're listening on the podcast, I appreciate your support as always. Reach out to me. I always leave my email and my LinkedIn there in the show notes. So please reach out if you have thoughts on this. Let me know in the livestream comments as well. Then go to youreverydayai.com. Sign up for the free daily newsletter.

Starting point is 00:51:12 Thanks for tuning in. We'll see you back tomorrow and every day for more Everyday AI. Thanks, y'all.

Starting point is 00:51:41 And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind.

Starting point is 00:52:15 Go break some barriers and we'll see you next time.
