Everyday AI Podcast – An AI and ChatGPT Podcast - EP 494: Gemini 2.5 Pro Unlocked: Inside the world’s most powerful AI model
Episode Date: April 1, 2025

We gotta talk about this 👇 Everyone and their mama’s out here creating Ghibli-style AI images, but no one’s talking about the most capable AI ever created. And it’s available now. And for fr...ee. Let’s talk about (Pt 1 of 2)

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Thoughts on this? Join the conversation.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
Gemini 2.5 Pro Background & Review
Google's AI Model Update Strategy
Tech Hybrid Model Explanation
1,000,000 Token Context Window
Advanced Coding Capabilities
Multimodal Features in Gemini 2.5
Google AI's Free Access Announcement
Competitive Benchmark Scores

Timestamps:
00:00 Unscripted AI Insights with Gemini
04:32 SoftBank Funding Hinges on OpenAI's Status
07:01 "Overlooked Potential of Gemini 2.5"
09:33 Gemini 2.5: Chain-of-Thought Model
14:40 "Google Gemini's Unprecedented Triumph"
16:40 Google's New Strategy: Quiet Innovation
21:53 Google's New AI Strategy
24:39 Future-Proofing with AI Models
26:29 "Gemini's AI Excellence in Logic"
32:07 "Testing and Deploying Simple Apps"
35:32 AI Development in Smaller Contexts
37:17 Hybrid AI Models: Google vs. OpenAI
43:02 "OpenAI Overshadows Gemini 2.5"
44:12 "Feedback Needed: Testing Gemini 2.5"
47:33 "Subscribe to Everyday AI"

Keywords:
Gemini 2.5, Google, AI update, Large Language Model, Runway Gen 4, AI-powered video generator, OpenAI, $40 billion funding, SoftBank, Sam Altman, NovaACT, Amazon, AI agent, Alexa Plus, Google DeepMind, Chain of Thought reasoning, 1,000,000 token context window, Coding capabilities, Multimodal, GPT 4.5, Human preference, Elo score, AI Studio, Vertex AI, NotebookLM, Deeper integration, Personalized Gemini, LM arena, 2025 AI predictions, Humanity's Last Exam

Gemini 2.5 Pro Unlocked: Inside the world’s most powerful AI model (Pt 1 of 2)

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)

Start Here ▶️
Not sure where to start when it comes to AI? Start with our Start Here Series. You can listen to the first drop -- Episode 691 -- or get free access to our Inner Circle community and all episodes: StartHereSeries.com
Also, here's a link to the entire series on a Spotify playlist.
Transcript
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
Meet Firefly AI Assistant, now live in Adobe Firefly, the all-in-one creative AI studio.
Just describe what you want to create and the assistant handles the rest,
orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface.
You direct the outcome.
The assistant accelerates execution.
In the two plus years that I've been doing the everyday AI show,
I don't know if there's ever been an instance where an AI update this big,
especially a large language model update, has been talked about so little.
I think that there's a reason for it, but we're going to talk about it today because I think
the new Gemini 2.5 Pro model from Google is probably the best single large language model I've
ever used. And I don't think I'm alone in that because it has not just broken just about every
single benchmark, but in terms of human preference, it is quite literally off the charts.
So today we're going to be going over Gemini 2.5 Pro unlocked inside the world's most
powerful AI model. All right. I'm excited for today's conversation. I hope you are too.
What's going on, y'all? My name is Jordan Wilson and welcome to Everyday AI. This is your
daily live stream podcast and free daily newsletter, helping us all not just keep up with AI,
but how we can use all these advancements to get ahead to grow our companies and our careers.
If that's what you're trying to do, welcome. This is where you learn on the podcast or the live
stream. But this is only half the battle. You need to leverage what we talk about today and where
you do that is our website. So if you haven't already, please go to youreverydayai.com.
Sign up for the free daily newsletter. Each day in our newsletter, we recap each day's
podcast or live stream as well as keeping you up to date with literally everything else
in the world of AI. So it is your one-stop shop to stay ahead just like this podcast. I always like
to remind people, this is unedited, unscripted, trying to bring you all something real in the world
of artificial intelligence. All right. So I am excited to get into today's show
and talk about Gemini 2.5 Pro, by far the most powerful AI model I've used.
But before we do, let's first start off as we do some days, well, most days with going over the
bullet points of the AI news.
All right.
So first, Runway has unveiled Gen-4, their newest AI-powered video generator capable of
creating consistent characters, locations, and scenes with realistic motion and physics.
So the new model allows users to generate videos using reference images and textual descriptions, offering superior prompt
adherence compared to previous models and style consistency without additional training.
So backed by investors like Google and Nvidia, Runway does, though, face some legal challenges
over copyright concerns while aiming for $300 million in annual recurring revenue and a $4 billion
valuation. So a study warns that AI tools like Gen-4 from Runway could disrupt more than
100,000 U.S. entertainment jobs by 2026,
raising concerns about the future of the film and TV industry.
Yeah, that just goes to show how good these new models are.
So, yeah, Runway Gen-4, I think, is probably in Sora territory,
maybe a little bit better.
I mean, we'll see; it just dropped.
So I'm sure the reviews are going to be coming out.
But I think, you know, Google Veo might have some competition.
And, hey, in terms of availability,
Runway Gen-4 is available to everyone like OpenAI's Sora is, whereas Google's Veo 2 tool is not available to everyone, at least inside of their platform.
You can access it from third party platforms, though.
All right, our next piece of AI news, another record breaker.
OpenAI has officially secured a record-breaking 40 billion, with a B, $40 billion funding round valuing the company at $300 billion.
So OpenAI has closed that historic $40 billion funding round, making it the largest private tech investment ever.
All right.
So it does value the ChatGPT creator at $300 billion.
And the round was led by Japan's SoftBank, contributing $30 billion of that amount,
with additional investments from Microsoft, interesting there, Thrive Capital, and others.
The funding does come with a condition, at least from SoftBank:
their investment could drop from $30 billion to only $20 billion if OpenAI does not fully transition
into a for-profit entity by the end of 2025.
And that would require approval from both the California Attorney General and Microsoft
and a resolution of these ongoing legal challenges from Elon Musk, which I think are pretty much
theater.
All right.
And this also comes as OpenAI did just announce that their weekly active users have jumped up
to 500 million.
And OpenAI CEO Sam Altman did just say on Twitter that they added literally a million
people in an hour, probably with all the Ghibli-style AI, you know, video or photo generations.
Also, this comes on the heels of OpenAI just announcing that they would release an open
model.
So pretty exciting news there.
So make sure to follow along.
We'll be following that news.
All right.
Last but not least, some big news from Amazon.
They've unveiled Nova Act, a new AI agent to compete in the agentic race.
So Nova Act from Amazon is an AI agent capable of independently navigating web browsers
to perform basic tasks such as filling out forms, making reservations, or ordering food.
So the Nova Act SDK is a toolkit for developers and it's available now as a research preview
on Nova.com, allowing developers to prototype agentic
applications. So Nova Act is developed by Amazon's AGI lab, co-led by former OpenAI researchers.
So according to Amazon, Nova Act outperformed OpenAI's and Anthropic's agents in internal tests.
But despite those claims, Amazon has not yet benchmarked Nova Act on some more widely recognized
evaluations like WebVoyager. Also, reportedly, Nova Act will play a critical role in
Amazon's upcoming Alexa Plus upgrade, a generative AI-enhanced version of Alexa,
potentially giving Amazon a competitive edge through its massive user base.
All right.
So we're going to have a lot more on those stories and everything else you need to get ahead
on our website at youreverydayai.com.
So make sure you go there and sign up for the free daily newsletter if you haven't already.
All right, enough chit chat.
Let's get into Gemini 2.5 Pro.
No one's talking about it.
It's wild.
Like the fact that we have a large language model that's available now with the capabilities
that Gemini 2.5 Pro has and hardly anyone's using it,
no one's talking about it, is pretty telling, right, of a couple of things.
I think this is a case of shiny AI syndrome, as I like to call it.
Right.
I think that what Google has released can change fundamentally how we all do business,
yet so few people are using it just because there's a new shiny AI object in the room,
which is the new 4o image gen from OpenAI, really a groundbreaking visual model.
And yes, we are going to be doing a show on that sometime soon.
That one's going to require a lot of research.
And speaking of that, even today's show, we're actually going to break it up into two chunks.
So today we're just going to be talking about kind of the bullet points, high level,
what's new.
and then we're going to be doing a show maybe later this week or next week,
kind of a part two.
So let me know what more you want to see what you want us to test from Gemini 2.5.
So, you know, our second part is going to be more hands on and use cases where today we're
really just going over the bullet points of what's new.
So make sure to let me know, live stream audience, what you want to hear from part two use
cases.
You want to see all that good stuff.
Hey, good to
see you, you know, our YouTube family here, our LinkedIn. Thanks for tuning in, Michelle, Samuel,
Jose, Charest, Kyle, Sandra, Gene, Big Bogey, Christopher, Brian. I can't get to you all.
Thanks for joining, but do let me know what questions you have on Gemini 2.5. But let's start here.
What the heck is new in Gemini 2.5? Well, there's a lot, and it's also a little confusing because,
you know, you might be having deja vu.
You know, you might be saying, okay, wait, there's new Google Gemini updates,
you know, out of nowhere.
Didn't this just happen?
Yes, it did.
So we're going to be doing a quick recap also of what was released like literally two weeks ago.
But first, let's talk high level of what's new in Gemini 2.5.
And then we're going to be going over all of these kind of piece by piece.
So one of the biggest things is now it is technically a
hybrid model, although Google did not choose to call it a hybrid model, but what that means
is it has built-in thinking. So Gemini 2.5 Pro is a thinking model. It uses chain-of-thought
reasoning kind of under the hood. We've been talking about this a lot over the last few months,
and we'll continue to talk about it a lot in 2025. Kind of this is the new direction that large
language models are going. So Google following suit here with Gemini 2.5. So think of it this way.
You kind of have your quote unquote old school, you know, transformer models.
And then you have your reasoners that essentially use more compute to kind of do this chain-of-thought thinking or chain-of-thought reasoning under the hood.
So Google Gemini 2.5 kind of combines both.
So if you have a simpler task, at least in my testing, it still kind of goes through that reasoning or those thinking steps, although it's pretty quick.
So it kind of depends, or sorry, Google Gemini 2.5 kind of decides
how much compute or how much thinking it needs to use.
But that's probably one of the biggest, you know, what's new.
The other is the context window, enormous.
One million token context window, which is roughly like 750,000 words or 1,500 pages.
So we are talking literally multiple books.
I mean, we're talking 30,000 lines of code as an example.
So if you are brand new and you're like, what the heck is the
context window, right? That's essentially how much a large language model can remember at any one given time.
This is different than memory, right? But essentially, you know, think if you're having a chat with a
large language model and you're giving it some information and you're going back and forth, right,
with older models, right. So, you know, let's even talk ChatGPT. They're kind of a little behind
in terms of context window, right? They had a, you know, roughly 32,000-token context window on their front end
chat product. So that means, hey, after, you know, 26,000 words, ChatGPT is going to start
forgetting. So with this, at least in AI Studio, I did not see Google clarify
anything on the front end, if you're using this inside of Google Gemini on the front end
chatbot. We will be testing that though. We'll probably share it in the newsletter.
But hey, essentially a one million context window, one million tokens, is wild. That means that
the chat is pretty much not going to forget
until you use it like incessantly, right?
Like until you are going wild and you're not leaving that chat and you're dumping thousands and thousands, or sorry,
I should say hundreds and hundreds of pages, it's still going to remember, which is huge.
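As a rough back-of-the-envelope sketch of the context window numbers above: the figures quoted (one million tokens, roughly 750,000 words, roughly 1,500 pages) imply about 0.75 words per token and about 500 words per page. Those two ratios are assumptions read off the episode's numbers, not official conversion factors, and the function name below is purely illustrative.

```python
# Assumption: ratios implied by "1,000,000 tokens ~ 750,000 words ~ 1,500 pages".
# Real tokenizers vary by language and content, so treat these as rough estimates.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500


def context_capacity(tokens: int) -> dict:
    """Estimate how many words and pages fit in a context window of `tokens` tokens."""
    words = int(tokens * WORDS_PER_TOKEN)
    pages = words // WORDS_PER_PAGE
    return {"tokens": tokens, "words": words, "pages": pages}


if __name__ == "__main__":
    for label, tokens in [("1M-token window", 1_000_000), ("32K-token window", 32_000)]:
        cap = context_capacity(tokens)
        print(f"{label}: about {cap['words']:,} words, about {cap['pages']:,} pages")
```

Under the same assumed ratio, a 32,000-token window works out to about 24,000 words, in the same ballpark as the figure mentioned for the older chat product.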
Another thing, advanced coding.
Some of the top benchmark scores for, you know, SWE-bench as an example and complex code generation.
So if you are big into software development, if you're big into coding or even vibe
coding, right? This whole concept of, hey, I'm just going to open a large language model,
have it code something for me, you know, have it code a Chrome extension for me, have it code
a little desktop application, you know, have it code a simple CRM, right? This was one of my bold,
you know, 2025 AI predictions is everyday people like you and me would just be using AI
to code our own little pieces of software. Gemini is great for that, right? And the big,
like the good thing is you don't have to know anything. You don't even have to tell it what coding language
to use. Just be like, yo, Gemini, I want a Chrome extension that does this, build it for me,
and then give me simple step-by-step instructions on how I go ahead and install and deploy it.
So fantastic for advanced coding. Already, I would say it's not the top coding model in the world.
I still think Claude 3.7 Sonnet inches it out a little bit. There's a lot of different coding
benchmarks. But, you know, essentially Claude from Anthropic was so far ahead. It's like it wasn't
even close, right? It was like they were 1A, 1B, 1C. They were probably even number two, right?
And everyone else was so far in the distance. Now Google has closed that gap and they're essentially
1B with Gemini 2.5. Benchmarks. Human preference is huge. So, you know, I've talked about this a
little bit. I think a lot of the kind of the AI labs, especially in 2024, were kind of overfitting
models. So what that means is when they were building them, going through post training, all that,
is they were doing it to get certain scores on benchmarks, right?
So Google Gemini 2.5 does that, right?
Like not saying they overfit it to get certain benchmarks,
but it cleans up on benchmarks and, you know,
essentially has top scores, you know,
either number one or number two on every important
and telling benchmark that there is.
However, the big one is the Elo, kind of the Elo score.
So in the LM Arena, this is essentially,
I talk about this a lot on the show,
think of it as a blind taste test, Pepsi versus Coke.
You put in a prompt.
You get two outputs and you choose which one is better.
Those outputs are not named, right?
And that kind of gives you an Elo score.
Generally, when a new model comes out, right?
So a Grok 3 or a GPT-4o latest or, you know, Claude 3.7, right?
Usually the new state-of-the-art model will go into first place generally on the Elo scores,
but maybe only by like two points.
Generally, it's usually like a two to four point jump
anytime a new state of the art model comes.
And it's like, oh, it's the most powerful model
in terms of what humans prefer
because that's extremely important.
Right.
In this case, Google Gemini came out by a 39-point margin,
which is literally unheard of, has not happened.
So yes, it checks the box in terms of benchmarks,
but it definitely checks the box in terms of human preferences,
which I think is
usually more important, right?
And the LM Arena has, I believe, multiple millions of votes, right?
Not millions of votes yet for Gemini 2.5, but already with enough qualifying votes,
it is the top in human preference by far.
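To put the point margins above in perspective, here is a minimal sketch using the classic Elo expected-score formula. This is an assumption for illustration: the leaderboard's actual rating method may differ in its details, and the function name is hypothetical. The idea is simply to translate a rating gap into how often the higher-rated model would be expected to win a head-to-head vote.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win probability for the higher-rated side, given its rating lead.

    Classic Elo formula: 1 / (1 + 10 ** (-gap / 400)).
    Assumption: the arena scores behave like standard Elo ratings.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # Typical small leads mentioned in the episode versus the 39-point lead.
    for gap in (2, 4, 39):
        print(f"{gap:>2}-point lead -> expected win rate {elo_expected_score(gap):.1%}")
```

Under this formula a two to four point lead is barely distinguishable from a coin flip, while a 39-point lead corresponds to winning a bit more than 55% of blind matchups.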
The other big thing, it's free, right?
Google snuck this in actually over the weekend.
So they announced Gemini 2.5 last week.
A couple days later, they're like, oh, guess what?
We're going to make it available for free.
So if you do have a Google Gemini account, so you can just go to gemini.com, you know,
you can use your Gmail or Google workspace credentials and you'll find Gemini 2.5 in there and you can
start using it for free right now.
All right.
So that is the high level.
All right.
And hey, live stream audience, let me know what your thoughts are, you know, of Gemini 2.5.
Sandra's asking, can you use it to code a widget for you?
Yeah, you can use it to code anything, Sandra, but yeah, you do have to, you know, as an example,
a Chrome extension or, you know, something that runs on your desktop, you have to still execute that,
but it will write the code for you and tell you how to, you know, install or execute it.
So let's go over because you might be thinking, didn't this just happen?
I'm confused.
Wasn't there just new Gemini two point something updates?
Yes, there were.
Okay.
So about two weeks ago, mid-March.
If you go back and listen to episode 482, if you want the full updates, we gave it to you there.
I love Google's new strategy here, right?
I think they had their original, you know, kind of December 2023 snafu, where, you know,
they put out this fancy marketing video about their AI.
And it turns out a lot of it wasn't true and it didn't work.
And they kind of got dragged through the mud.
And they spent the better part of 2023 and 2024 way behind
ever since. I love what Google's doing.
They're not coming out with flashy advertising,
flashy marketing, big announcements, big hype.
They just come and ship, right?
They just ship updates that are pretty amazing.
So they did two weeks ago announce some pretty impressive updates
that I still don't think people talked about.
So if you want to know about that,
you can go listen to that in episode 482.
Again, for free on our website.
Yeah, if you didn't know on our website,
you can go and listen to every single episode we've ever done,
interviewing some of the world's top experts on AI.
But here's essentially what was announced in the mid-March version, so we can get this out of the way.
So Gemini 2.0 multimodal, which was huge.
I think that kind of set the stage for this whole, you know, GPT-4o image gen.
Multimodal by default.
Amazing.
I went over that.
You can literally create a blog post with inline images. Wild, right?
You can also edit images with natural language,
kind of like what you can do now with GPT-4o's image gen.
So mid-March, Google announced the Gemini 2.0 multimodal.
They announced deep research was updated to the 2.0 model,
whereas previously it was running on 1.5.
They announced personalized Gemini, which I think some people like, some people don't like, right?
But it essentially takes into account your search history.
So it's a mode that you can select.
So that was new.
They also announced Gemma 3, which is wildly powerful for a super small
open source model, so you can run it locally.
They announced Gemini Robotics running on Gemini 2.0
as well as big updates to my favorite AI tool, NotebookLM.
Also, they upgraded that as well under the hood to the Gemini 2.0
model, whereas previously it was running on Gemini 1.5.
All right. So if you're scratching your head and being like, wait, is Jordan like
a month late on this? No. Gemini just did, or sorry, Google did just have
a ton of big updates a couple of weeks ago.
All right.
Let's get into it now.
Let's go over kind of point by point here.
Again, this one's not going to be a super long one because we are going to have a point two or sorry, a part two.
But here's kind of what's new in the Gemini 2.5 Pro launch.
Gemini 2.5 Pro launched in late March from Google and Google DeepMind as their most intelligent AI.
The biggest thing, like we talked about, it focuses on built-in thinking or using that chain of thought.
This is huge.
This is huge for reasoning, coding, context handling.
I mean, there's a lot of new capabilities and it does change what's possible for businesses.
All right.
So how can you access Gemini 2.5 Pro?
Well, like I said, over the weekend, Google,
just with a tweet, just said, oh, by the way, we're making this free.
So it is available for free to Gemini app users.
So if you're using this inside the kind of Gemini chat, so gemini.com, right, especially if
you're on a paid account, you do have the option to turn off model training.
So, you know, you don't have to worry about the data that you share being used to train
Google's model.
So on the front end, you can access Google Gemini that way.
You can also access it for free in Google AI Studio, which is a kind of a more experimental version and more of a sandbox.
And I'm glad that Google has shifted their strategy after, I don't know, I feel like I did so many rants in 2023 and 2024, because Google for like a year kind of quote unquote hid their most powerful and capable models inside Google AI Studio, which is more for developers,
and then they didn't even label or tell you what was powering their Gemini chat bot.
So you had no clue.
But usually it was running a model that was up to six months old.
So not anymore.
I love Google's new strategy here.
Put the newest, the latest, the greatest model inside the front end Google Gemini chatbot.
But you can still use Gemini 2.5 inside Google's AI Studio.
And that is where you're going to be able to get that full one million context.
Just keep in mind Google's AI Studio
is free, but there's no data protection on that end.
So yeah, don't go and put, you know,
confidential proprietary company data inside Google AI Studio.
It is more of a sandbox.
Also, the enterprise path, it will be coming soon to Google Cloud Vertex AI in the coming weeks, right?
So there's technically, I know it's a little confusing, right?
There's so many different ways that you can access Google and Google Gemini, you know,
as well as inside their apps, right?
So they didn't say yet, you know, as an example, if their Gmail Gemini integration has been upgraded to 2.5. I'm not sure.
But at least for right now, you can go access it, Gemini.com, even for free.
If you have a paid account, you have higher limits, as well as you can access it for free inside Google's AI studio.
And it will be coming soon kind of across Google's family of products via the Google Cloud Vertex AI.
All right. Let's talk a little bit about the reasoning.
So it does have that built in chain of thought like we talked about.
So what the heck does that mean?
Well, it kind of plans steps internally before it gives you an answer.
And the cool thing is, you can click that show thinking.
I don't know why.
I feel a lot of people don't read that, at least people that I talk to.
I highly encourage you if you want to get better outputs out of any large language model that shows its kind of chain of thought.
you should be reading that, right?
Because you'll see what happens a lot of times,
especially with these kind of hybrid models that can reason.
They take a little bit longer, which is okay, right?
Because outputs in general are exponentially better,
more accurate, more robust, more complex, much better.
However, it does take a little longer.
So what I always do while I'm quote unquote waiting, right?
It might be 10 seconds.
It might be two minutes, depending on how complex of a query you're giving
the model, read the chain of thought. Always read it, right? If you want to, you know, be future
proof in your job, right? If you want to be the smartest person in AI and in your department,
read the chain of thought and go ahead and accordingly make updates to how you use that model,
how you use the prompt, right? All the thinking models work a little bit differently, right? So you
have Claude 3.7 Sonnet. It's a hybrid model with thinking, although if I'm being
honest, I think that was more of a marketing thing because you
still have to click the extended thinking.
Anyways, you know, you also have, sorry, OpenAI's, all their models:
o1, o1 pro, o3-mini, o3-mini-high.
o3-mini pro should be coming out via the API soon, right?
So always, no matter what thinking model you're working with or reasoning model or hybrid
model, look at the chain of thought, see what's going right, see what's going wrong.
I always tell people, have a conversation, reprompt to get better results.
Also, we've got to talk about this one; maybe we'll do a dedicated show on this.
I don't know, live stream audience.
Let me know if you want to know more about this Humanity's Last Exam benchmark.
It is a newer benchmark put together by, I think they said, hundreds of subject matter experts.
Essentially, it's a benchmark that in theory shouldn't be in any training data yet.
So the new Gemini 2.5 got an 18.8% on the score, which you might think like, oh, 18% out of 100.
AI's dumb. All right, humans. I doubt any human out there listening, any single human, could get a 1% on this Humanity's Last Exam. Let's be honest, right? But the previous high score was OpenAI's GPT-4.5, which got a 14%. Anthropic's Claude 3.7 got an 8.9%. I believe DeepSeek was shortly behind there in the mid 8%. So yeah, Gemini 2.5, an 18.8%. So,
hey, in terms of it being able to solve and tackle very complex problems that the single smartest human in the world could never solve, right?
You'd have to get hundreds of people working together to be able to make a dent in this humanity's last exam.
You know, Gemini did a great job.
Also, it excels at complex logic and math without needing external tools.
So we did talk a little bit about some of these benchmarks, but like I said, right away out of the gate in the LM Arena, which is
just human preference,
you know, being 39 points above the last or the next best model.
Super impressive.
Some other top math and science scores: on the AIME 2025, it got an 86%.
And on the GPQA Diamond science benchmark, it got an 84%.
Very impressive.
And then an 81% on the MMMU, which is the multimodal equivalent of the old
standard of AI testing, which is the MMLU.
All right.
And y'all, even though Gemini 2.5 has only been out for a couple of days,
they've already made multiple updates to it since.
All right.
So since launch.
So a couple things.
Number one, they made it open and available for free users.
So that was not available at the time of launch.
It was only available to paid users inside Gemini.com.
so on the front end chat bot.
So now it's available to all free users.
Also, just hours ago, y'all, this is why sometimes I don't sleep and why I don't always,
you know, sometimes I'll do pre-recorded shows.
But literally just hours ago, CEO Sundar Pichai just did kind of announce that the new canvas mode
is available in 2.5 Pro.
So I did use it actually while planning this show going through my notes and, you know,
having it put together kind of some interactive elements to help me better learn and understand
what was new. So there are some things in the canvas mode that worked very, very well.
There were some things that were buggy, if I'm being honest, right? It is experimental.
Keep that in mind. Also, they added support for third-party tools like Cursor AI. That's huge.
So we should see because Claude, you know, Anthropic's Claude, it has been making a living,
right, essentially by being the coding LLM of choice for the top software developers, for the top
engineers.
So we'll see.
I do see that potentially changing, especially when you look at API costs.
The Claude models are rather expensive and the Gemini models aren't.
So we'll see what happens,
and if Anthropic still is kind of the de facto model chosen for software developers.
Also, Sundar Pichai hinted at future MCP support for Google Gemini.
Pretty big news.
So that's Model Context Protocol.
I know we're going to do a dedicated MCP show soon, but essentially, you know,
if you've been seeing this little acronym floating around, you're like, what the heck is it, right?
So essentially you have APIs, right?
So in the SaaS world, in the software world, right?
APIs are essentially a language that software can use to talk to each other, right?
APIs can sometimes kind of work for AI tools and large language models, but not necessarily.
So they're a little different.
So this Model Context Protocol was actually developed by Anthropic, but it's being used now and supported by just about everyone.
OpenAI last week announced support for it.
And, you know, so Google and Google Gemini may support, you know, MCP as well, which is essentially,
I like to think of it,
it's a little more complex than this,
think of it as the API for large language models.
It allows different AI systems and different large language models to talk to each other
and to talk to other APIs and to other software.
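For a slightly more concrete picture of the "API for large language models" idea, here is a minimal sketch of what a single MCP-style request might look like on the wire. It assumes, based on the published Model Context Protocol specification, that messages are JSON-RPC 2.0 and that a tool is invoked with the `tools/call` method. The tool name `get_weather`, its arguments, and the helper function are hypothetical, and no real server or transport is involved.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool.

    Assumption: the "tools/call" method and the name/arguments params shape
    follow the public MCP specification; the tool itself is made up.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical tool exposed by some MCP server.
    print(build_tool_call(1, "get_weather", {"city": "Chicago"}))
```

The point of the shared message shape is that any model or app that speaks the protocol can call any tool that speaks it, without a custom integration for each pairing.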
All right.
Next, the coding abilities.
All right.
So we already talked a little bit about this is one of the more unique or at least one of
the angles that Google is taking with Gemini, really pushing and promoting its proficiency in coding.
So very impressive.
So, you know, they put some demos out there, but also in agentic coding, you know, it scored a 63.8% on SWE-bench, which, you know, when it comes to agentic coding, I do think that is the benchmark to look at. You know, go play with it, right? And the good thing is, now you have that Canvas mode inside Google Gemini 2.5. So you can literally go code anything you can think of with natural language. Code me this, build me this, right? And you can render it or run it
in the new Canvas mode.
So it's a little different than OpenAI's canvas mode,
which I think is more of like a Google Docs-esque collaborative environment.
You can run certain coding languages inside OpenAI's version of Canvas, or
ChatGPT's version of Canvas.
But I'd say in my limited testing of Canvas so far, which came out a couple of weeks ago,
I'd say it is more like Anthropic's Artifacts feature in terms of how
it can render and run a lot more languages, right?
And go, go have fun, right?
This whole vibe coding thing, right?
It's been kind of this trending topic.
Go vibe code yourself something.
See if you can, right?
And then if you can get it to run inside canvas, then that means, okay, it's working.
And you could go, you know, deploy it somewhere else, whether you need to, you know,
have it running as a full-stack kind of app, you know, running on some service online, or
whether you would run it on your desktop, or whether you might run it,
you know, as a Chrome extension, et cetera, right?
I one-shot it, which was pretty fun.
And I shared it in the newsletter yesterday.
I don't know if anyone saw it.
I did a little, you know, simple Chicago-inspired game, right?
A little, you know, side runner, you know, a very early Nintendo-esque type of game.
But just one shot.
I said, hey, do it like this, you know, working in all these Chicago elements, you know,
hot dogs and pizza and potholes, right?
You know, make it kind of, you know, bring in elements that I like from Super Mario, right?
I mean, one shot and it worked.
Very amazing, right?
So, like I said, instantly, from a coding and software development standpoint, you know, we're going to put
it through some more testing, and maybe we'll do that in part two if that's something you
want to see, but it's very proficient in coding.
Next, we have to talk about the multi-modality and the context window.
So Gemini in their 2.0 versions, everything is multimodal by default.
So what that means is it understands not just text,
but it understands images, audio, video, code inputs,
or a mixture of all of those things together,
which is pretty amazing.
So we are getting that as well, or getting close to that, from big models
like Anthropic's Claude and OpenAI's
ChatGPT, but not quite there yet, specifically with video, right?
That's kind of a different modality, at least for OpenAI's ChatGPT.
Claude, I don't think, is really going to play too heavily in the multimodal-by-default space,
although I feel they should, right?
I think Claude was really hoping they could carve out their niche just with software,
just with coding.
But, you know, Google's like, hey, hold my, hold my MCP.
I mean, the 1 million token context window, amazing.
Like I said, we're going to be putting that to the test inside of the Google Gemini front end.
I do know, and have done some testing on the back end in AI Studio, that the context window is
super impressive.
Also, Google did announce that they're planning soon for a 2 million token context window.
I mean, that's wild, right?
So one thing I'm going to probably do is get together
transcripts, right? Like, I have almost 500 episodes of the Everyday AI show. That's thousands of
pages of transcripts. So that's probably something I'll do for a test: upload everything.
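Before uploading a pile of transcripts like that, a quick back-of-the-envelope check can tell you whether they would even fit in a 1 million or 2 million token window. The sketch below is only that, a sketch: it assumes a rough rule of thumb of about 4 characters per token rather than a real tokenizer, the helper names are hypothetical, and the transcript sizes are made-up placeholders, not measurements of the actual show.

```python
CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_window(texts, window_tokens, reserve_tokens=0):
    """Return (estimated_total_tokens, whether it fits in the window).

    reserve_tokens leaves room for the prompt and the model's answer.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total, total + reserve_tokens <= window_tokens


# Placeholder data: 500 stand-in "transcripts" of 10,000 characters each.
# These sizes are invented for illustration; load real files to get real numbers.
transcripts = ["x" * 10_000 for _ in range(500)]

total, ok_1m = fits_in_window(transcripts, 1_000_000)
_, ok_2m = fits_in_window(transcripts, 2_000_000)
print(total, ok_1m, ok_2m)
```

If the estimate comes in comfortably under the window, pasting everything in may be enough. If it is well over, that is where retrieval still earns its keep.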
But y'all, like, as we get to multiple-million-token context windows... I should have put
this in my, you know, AI 2025, you know, roadmap series. So, you know, if you haven't listened to that,
make sure you go on our website and go listen to those for free. It was a five-part
series. I don't know. I think, you know, RAG is going to become a little less important
in 2025 and 2026. I'm not saying it's not going to be needed. It's still going to be needed,
right? But I think so many, especially smaller companies with small use cases, you know, they heard
this RAG terminology, you know, really in late 2023 and 2024. And everyone's like, oh, I need to
build, you know, retrieval-augmented generation, right? But okay, what if you don't have a ton of data,
right? What if you don't actually have a ton of files and it's not a lot, right? You might just be able to work in that 2 million token context window. So, you know, the context window is actually extremely important to the future of AI development.
All right. Let's talk a little bit about some of the early feedback. So like I said, we've shared about this in our newsletter, but there's been some very impressive, you know, one-shot generations: people building video games, precise image analysis, 3D simulations. Extremely impressive. And we'll be doing
some of those in our part two of this series.
Audio skills, being able to instantly get accurate transcriptions,
very impressive and just positive, right?
Just positive, you know, people are always like vibes, right?
The vibes on 2.5, Gemini 2.5, pretty positive so far.
Let's look at the market impact.
So Google right now, they're trying to be the leader in thinking models,
right? They're kind of beating OpenAI to the hybrid punch. Like I said, technically,
Anthropic was first with Claude 3.7 Sonnet, but I don't know. I actually talked to a couple of
people about this at the Nvidia conference, at GTC, in the,
you know, two minutes of free time I had between, like, the 15 interviews I did out there.
We still do have, like, what, two more shows dropping from GTC, by the way. With a lot of people,
I'm like, hey, what do you think about this new hybrid approach from Claude?
And they're like, oh, is it really hybrid, right?
You technically have to click if you want the extended thinking or not.
But with this, I think Google is in the driver's seat, at least right now,
when it comes to this new hybrid model approach, which we also heard from OpenAI is going to be their approach moving forward as well.
So they've said when we get GPT-5, it's going to be more of a system, right?
And you're not necessarily going to be able to choose which model you use, which some people might like.
If you look at OpenAI's ChatGPT, you know, in my Pro account, I think I have nine different models to choose from.
Some people might be intimidated by that.
So, you know, at least GPT-5 is going to be more of an architecture that's going to kind of use this mixture of models or mixture of experts, using kind of traditional, you know, quote unquote, old-school
transformer models and hybrid or, you know, these reasoning and thinking models.
But Google with this, they're the leader in it right now. I don't think Anthropic did a good job
with it. If I'm being honest, I don't. I think a lot of people were not super impressed with Sonnet
3.7. I know a lot of people, you know, defaulted back to Sonnet 3.5. They weren't very impressed.
And they didn't feel they had kind of enough control, right? At least for front-end users,
that's what we're talking about, not on the back end. But
I mean, this play right here.
So aside from being a leader in terms of the thinking model, also, I mean, with the enterprise game, right?
So this isn't released for Vertex AI yet, which is probably a good idea, right?
Because it is buggy.
I should probably say this, right?
I will say when OpenAI releases a model, at least this is in my personal experience,
I'm using, you know, the main models every single day, multiple hours a day.
OpenAI's models, when they're released, yes, they're throttled,
they may go down, right? Gemini has better availability, right? You know, a lot of times,
especially if you're on a free plan or, you know, the basic $20 a month plan with
ChatGPT and a new model comes out, you know, it might be very slow or availability might be impacted,
but when it is there, it works fairly well, I will say. So I will say that with Gemini 2.5,
although, you know, there's no slowdowns, there's no real outages, the availability's
there, it has been a little buggy, right? So the Canvas mode, although it's only been
out for a couple of hours, it's been hit or miss for me, but when it hits, it hits.
The same thing with just the general, you know, Gemini 2.5. It's been a little buggy,
but it's experimental, right? You know, I always run a series of tests and sometimes we were
getting, I wouldn't say hallucinations, but some misdirections, right? One thing I always do
to test its internet capabilities: I say, hey, what's the latest episode of the
Everyday AI podcast by Jordan Wilson? You know, so I see, okay, is it
actually able to navigate to the web and find the latest episode?
And instead, it gave me the weather, right?
The weather was accurate, but that's not what I asked for.
So, you know, hit or miss so far, but I think once Google irons out some of those things,
it's going to, you know, be a very impressive and reliable model.
But I think that's honestly why they haven't really released it for Vertex AI yet, right?
So that's when you can, you know, when you'll start seeing it deployed at scale,
you know, across many large enterprise organizations.
But I do think that Google is taking more of a tiered approach
and making sure individual users,
people kind of using their sandbox in AI studio,
have a good experience.
They're going to want to squash some of those bugs
before they release it to the masses.
All right.
And then kind of last but not least,
and hey, live stream audience,
thanks for sticking with me.
We are going to have a part two.
If you have any questions, get them in now.
I'm going to scroll through, see if I can answer any.
But last but not least, we have to look at the future outlook and updates.
So we are going to see some pricing updates probably soon for the API because I do believe
there's going to be some heavy usage.
Google has said that they're working on enhancing the reasoning and coding even further.
So there will be some under the kind of under the hood updates.
That's another important thing to think about, right?
So even though we saw this jump from, you know, Gemini 2.0 to Gemini 2.5,
that doesn't mean that Gemini 2.5 won't be updated until we get something like Gemini 3, right?
Yeah, you have to kind of keep up with, you know, sources such as Everyday AI, right, to see when
some of these more under the hood model updates come out.
But I do see it, you know, I think they're going to squash some of these bugs, make some
improvements.
But the biggest thing is the ecosystem, right?
I'll be interested to see when and if Google announces whether Gemini 2.5
Pro or Gemini 2.5 is going to be rolled out with deeper integration into its ecosystem.
So that means, right, and I expect it, better and deeper integration across, you know,
Google Sheets, Google Drive, Gmail, Docs, et cetera.
I would love to see if we're going to get 2.5 in NotebookLM and in Google Gems,
which is kind of their version of, you know, GPTs, right, creating kind of this personalized version of Google Gemini.
Also, you got to get ready for the clapback now, y'all.
That's what I think I'm going to end on, aside from your questions, because here's what happens.
Anytime a model like this comes out, and it is met with fanfare.
And by fanfare, I mean a combination of traditional benchmarks.
Google Gemini's got it.
Human preference, they got it in the Elo.
And then just overall vibes, like I said, overall, people are loving the Gemini 2.5,
but no one's talking about it.
No one's talking about it because everyone's on, you know,
OpenAI's new 4o image gen creating Studio Ghibli pictures of their family, right?
And don't get me wrong.
That is actually, I'm more impressed by, you know,
if I had to compare the two, even though they're two unrelated things,
I'm more impressed with the update from OpenAI actually
because it is really driving the multimodal conversation,
even though we did get this multimodal kind of by default with Gemini 2.0 a couple of weeks ago,
being able to create and work with images in line, being able to edit them.
I think the execution was just much better with the 4o image update from OpenAI.
But from a pure large language model standpoint, Gemini 2.5 is just getting completely overlooked. Extremely powerful.
So we're going to be, we're going to be testing this in a part two.
So make sure.
And if you are listening on the podcast, thank you.
You can always reach out to me or just respond when you sign up for the newsletter.
Tell me what you want to see in our part two.
How do you want to see us put Gemini 2.5 to the test?
What use cases,
what demos do you want to see us run?
Big bogey face here asking, let's test its coding skills.
We can definitely do that.
Kabari asking on YouTube in terms of capabilities on a scale of one to 100, where is AI now?
Oh, that's a good question.
I don't know.
If we're talking about Gemini 2.5, I mean, you have to say it's in the 90s, right?
And it's ahead of everyone else.
If you're asking in terms of the AI, you know, as a whole, I don't know, right?
Because that 100 or the ceiling is constantly being raised, right?
Again, if you would have told people two years ago that we would have
models this capable, this powerful, available for free. I think you'd say no. Like, oh, that's not
possible, but it is. Here we are. So the ceiling keeps getting raised. Denny asking, what about
content creation for writing needs? Most of what was mentioned are video or tech kinds of needs.
So I will, I will say this, Denny, great question. And maybe that's something we can test as a
use case, just kind of creative writing. But I do think that Gemini has always had a nice knack for
creative writing, right? I think ultimately,
with proper prompt engineering, OpenAI has always been best.
But if you're talking about zero-shotting and trying to get some good,
kind of creative writing, I think people have always preferred Claude.
I think Gemini is right there in terms of, you know,
what you can get out of the box with just, you know,
hey, here's five examples.
Go mimic this.
I think Google Gemini is actually great for that.
And that's something I did do some testing on a little bit.
Jose's asking, when do the livestreams start?
Yeah, they start 7:30 Chicago time.
So yeah, if you're on the podcast, if you didn't know, this is unedited, unscripted,
you can come in here, hang out, network, ask questions.
We try to tackle everything as best as we can.
All right, y'all.
So I hope this was helpful.
Got a couple of questions, a couple comments in here at the end.
Again, we're going to have a part two.
We're going to be breaking all of this down.
Go over use cases.
Do some things live, really push it to its limits.
So make sure you join us for that.
and let me know what you want to see, what you want to hear.
So thank you so much for tuning in.
If you haven't already,
go sign up for that free daily newsletter at YourEverydayAI.com.
We're going to be recapping the highlights and what you need to know from today's
live stream and podcast.
If you didn't catch everything,
don't worry,
it's going to be in there as well as everything else you need to get ahead to grow
your company and career with generative AI.
So if this was helpful,
please subscribe to the podcast.
Please leave us a rating.
I'd appreciate that.
I'd also appreciate it.
It always makes me smile a little bit,
You know, if you are listening on LinkedIn, click that repost button if this was helpful.
We spend so many hours cutting through the BS, bringing you, you know, hopefully unbiased and just real information to help you make better decisions on your AI strategy and implementation.
So if you could repost this, if it was helpful, I'd appreciate that.
So thank you for tuning in.
I hope to see you back tomorrow and every day
for more Everyday AI.
Thanks, y'all.
And that's a wrap for today's edition of Everyday AI.
Thanks for joining us.
If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going.
For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind.
Go break some barriers and we'll see you next time.
