Everyday AI Podcast – An AI and ChatGPT Podcast - 2025 AI Roadmap Rewind Human vs Machine, AI Models Shrink, and AGI No One Noticed
Episode Date: December 18, 2025

Did AI end up being a political force this year? 🤔 Did AI start to kill off influencers? Did we achieve AGI in 2025? We take our AI predictions seriously. And in January of this year, we made... some kinda hot-take predictions about the above and more. How did we do? Tune in for Part 2 of our 2025 AI Roadmap Rewind as we give a State of AI and see if we were your AI BFF or if we led you astray.

2025 AI Roadmap Rewind Human vs Machine, AI Models Shrink, and AGI No One Noticed -- An Everyday AI Chat with Jordan Wilson

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Thoughts on this? Join the convo and connect with other AI leaders on LinkedIn.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
Anthropic $1.5B Copyright Lawsuit Settlement
AI Influencers Replace Human UGC Content
Non-Technical Vibe Coding Software Trend
Enterprise Reasoner Wrappers in AI Adoption
Rise of Virtual Machines for AI Agents
Political Impact of AI on US Policy
Global AI Regulations and EU AI Act
Narrow AI Agents Drive Business Value
LLM Memory and Context Advances in 2025
Shift: Large Language Models to Small Models
Mixture of Models Approach for Enterprises
AGI Benchmarks Surpassed, Yet Goes Unnoticed

Timestamps:
00:00 "AI Insights and Predictions"
04:02 $1.5B Anthropic Copyright Settlement
08:19 "Rise of AI Influencers"
11:37 "Google's Experimental App Generator"
14:05 "Reasoning Models and Enterprise Growth"
17:35 Trump's AI Policy Shift
22:32 "Narrow AI Revolution in Progress"
24:42 "Memory Advancements in Technology"
30:20 "Mixture of Experts Explained"
32:15 "Model Fusion Beats Best Score"
36:22 "AGI Wins IMO Gold"
37:40 "Testing AI IQ Progression"
43:07 "GPT-5.2: Human-Level Expertise"
44:17 "AI Outperforms in Valuable Work"

Keywords:
2025 AI roadmap, AI predictions, Artificial General Intelligence, AGI, AI copyright case, Anthropic lawsuit, $1.5 billion settlement, OpenAI and Disney partnership, AI influencers, AI avatars, user generated content, UGC, Lil Miquela, AI influencer industry, TikT

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)

Start Here ▶️ Not sure where to start when it comes to AI? Start with our Start Here Series. You can listen to the first drop -- Episode 691 -- or get free access to our Inner Circle community and all episodes: StartHereSeries.com Also, here's a link to the entire series on a Spotify playlist.
Transcript
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
Did AI end up being a political force this year?
Did AI start to kill off influencers?
And maybe more importantly, did we achieve AGI in 2025?
Well, more than 11 months ago, we laid out 25 kind of bold predictions for 2025
that covered those questions and a lot more.
So now as the year comes to a close, we're fact-checking ourselves.
Why?
Because when we give predictions, they're much more than that.
We think of them more as a roadmap for busy professionals who are trying to make sense of AI and grow their careers.
And we're lucky enough on this podcast to be heard by millions of people.
So usually around this time each year, it's when business leaders reflect on what worked that year and then plan for the year ahead.
So on today's show, that's exactly what we're going to be doing.
We're going to reflect on the advice that we gave you and fact check ourselves in part two of our AI roadmap rewind.
All right.
I hope you're excited.
I am too.
Let's get into it.
Welcome to Everyday AI.
If you're new here, what's going on?
My name is Jordan Wilson.
And Everyday AI, it's for you.
It's a daily live stream podcast and free daily newsletter helping everyday business leaders like you and me make sense of these AI updates and how we can extract
important insights to grow our companies and our careers.
So this is a little bit of a special episode.
So like I said, in January of 2025, we laid out our AI prediction and roadmap series.
And now this is part two of fact checking ourselves.
So if you missed part one, you can go click back, you know, two episodes.
Go listen to episode 674.
And you can go ahead and check on part one.
So on part one, we went in reverse order. We went predictions 25 through 13.
And then on today's show, we're going to go predictions
12 through one. And we're going to be recapping it all in today's newsletter. So if you
miss a certain fact or stat, maybe you're out walking your dog or running on the treadmill or running
your dog on a treadmill and you miss something, it's all going to be recapped in our newsletter,
as well as all the other AI news that you need to know for today. Let me set the stage here. So maybe
if you didn't listen to episode one or maybe this is your first episode.
Well, welcome.
So I've been lucky enough over the past three years to talk to some of the smartest people in the world when it comes to AI.
So everyone from, you know, people who work at the big companies, you know, like Google, you know, Microsoft, et cetera, to small business owners, you know, who are making use of this technology to Fortune 100 CEOs.
So you hear a lot of these conversations on the podcast, but there's a lot more that happens
off air that you don't hear about.
So I have been lucky enough over the past three years to talk to hundreds of the
smartest people in the world.
So when I put together these prediction shows in January, like I said, I take a lot of care
in them.
And at the time, they may seem kind of bold and crazy, but as maybe you'll find out in
part one and in part two, they actually ended up being not too crazy.
But you should go back and listen to them anyways.
I still think that they're really valuable.
And I want you to fact check me,
because I'm going to be coming out with our 2026 AI prediction and roadmap series here in a couple of weeks.
So if you want to go listen to the full episodes, go listen to episode 443 through 447.
All right.
Let's get straight into it.
Like I said, we already went through 25 through 13 in part one.
So part two, we're picking up with number 12.
So number 12 was the first big copyright case would be settled.
So here's what it was.
Well, yes, that happened.
It was a $1.5 billion settlement where Anthropic had a class action lawsuit regarding
allegedly just stealing information from books, right?
That's what was alleged against them.
And there was a $1.5 billion settlement.
So this is, I think, one of the first big pieces that eventually we're going to move to a
pay to train model, right?
And another good example of that was from, well, this week.
when OpenAI and Disney, you know, went into an agreement where essentially Disney gets some
equity in OpenAI and they pay a billion dollars and OpenAI gets to use their IP.
So this was the first big copyright case that was settled.
Obviously, the biggest domino that I've talked about has still been going on for more than
two years, the New York Times versus OpenAI case.
The Anthropic case wasn't the only one brought by the authors,
you know, of these books.
It was technically called Bartz versus Anthropic, the class action lawsuit.
But we got the first big copyright case to be settled.
And there's been a lot of other lawsuits filed since very similar to that one.
And, you know, for whatever reason, I think at the time, Anthropic was maybe a sitting duck, right?
Technically, you could probably make this a similar case against some of the other big tech giants.
But, you know, they're technically a smaller company, even though they're enormous, technically one of the biggest private companies in the world.
But they were still small enough, right?
Anthropic is smaller than OpenAI, smaller than Microsoft, smaller than Google.
So they may have been an easier target per se for this first class action lawsuit.
Okay, I also need to grade the prediction, if it came true or not.
So this one, I'll say, yes, it came true.
I didn't think that the New York Times versus OpenAI would be that big case.
So, yeah, it ended up being still a pretty big one, a $1.5 billion settlement to these hundreds of authors whose books were allegedly used for Anthropic's training data.
All right, let's go to number 11.
AI influencers are going to start
killing off human UGC content.
So my actual prediction was that AI influencers or avatars are going to start
replacing human user generated content at scale.
So I'm talking specifically UGC influencer type videos.
So although there's not a huge study on this,
I think it's happening.
And the crazy thing is, is most people don't know.
Here's why.
I'm not a big person on social media, right?
Everyday AI gets livestreamed to LinkedIn, YouTube, but I'm not like on Instagram or
Snap Hub, whatever those things are.
Right.
However, my other company, you know, I have another company, we do,
you know, marketing for small businesses.
And I know the tools that are out there.
And I know people that are using these, and big companies are now using just AI influencers
for UGC campaigns, user generated content, right?
It's those videos.
If you're scrolling your feed, it's just a random, you know,
random person pushing a beauty product or, you know, a new workout,
whatever it is, right?
You don't know who it is, but someone's saying, oh, my gosh, this thing, like,
you've got to check it out, right?
So many of those are fake and AI influencers are becoming a huge thing.
So as one example, hopefully I get this right.
There's an AI influencer, Lil Miquela, who has nearly three million followers across
different social media networks, earns up to $10 million a year and has partnerships with
real brands like Samsung, Calvin Klein, Prada, and a lot of others.
So these companies obviously know that they're working with an AI influencer, right?
A lot of the fans or the followers know as well, but some don't.
And I'm telling you, y'all, this is going to become the norm.
I actually had a friend text me earlier this week, you know, asking me about this because she
saw an article, you know, about something like this in the media, right?
Like, is AI going to kill off a certain breed of influencers?
And I said, hey, probably by the end of next year, I'd say maybe 80%, you know, of what you
might see online is going to be AI, right?
And the platforms are also pushing this and enabling it, right?
Whether you think that's a good thing or a bad thing, right?
As an example, TikTok expanded their Symphony gen AI ad tools.
So essentially, TikTok is providing a similar service.
Meta similarly has introduced and expanded their AI ad tools use.
Right.
So these big social media platforms are giving brands the tools to use AI in their ads, right?
Not full-blown avatars just yet, but it's getting there.
And those full-blown AI avatar companies, HeyGen, Synthesia, Hour One, there's a lot of them.
You know, they're improving vastly, right?
And then you have tools like Veo, like Sora, right, that are primed now for making UGC content.
I think Cameos in Sora is a great use for making UGC content, right?
Maybe of yourself, you know, because I think that's all you have the rights to do.
But, you know, that's a whole other, a whole other issue.
But I would say this has definitely been a fact.
AI influencers are starting to kill off human UGC content.
There's actually a Yahoo Finance article here.
It said, Gen Z job warning as new AI trends set to destroy 80% of influencer industry.
All right.
Our next, and sorry, I'm a little
hoarse. Yeah. Been getting randomly sick here at the end of the year. It's the Chicago weather. It's like,
I think today is going to be like 50 degrees, but like four days ago it was negative five. So yeah,
sorry. If I sound a little hoarse, it's because I am. All right. So number 10. I said non-techies
are going to build on-the-fly software. Y'all, this is how
fast time flies in AI world.
This was before vibe coding, right?
Vibe coding was literally not a thing.
And I just kind of, in January, I just kind of predicted there's going to be this thing
called vibe coding, right?
Except I said it's going to be non-technical people building on the fly software.
I think vibe coding is a much better term, right?
So the term vibe coding was actually coined by Andrej Karpathy in February, right,
a month after this AI prediction, right?
But at the time, I'm like,
these tools are coming out, people are just going to start building their own software.
It's not going to be hard.
Obviously, this one is extremely true.
You know, and it's so true.
It's now going to be baked into browsers, right?
A recent announcement, Google's Disco experiment: you can generate mini web apps just from
the tabs that you have open in your browser, right?
And you don't even have to prompt anything.
You just click a button, right?
It's an experimental feature that's coming out.
But I mean, talk about things like Gemini Canvas, ChatGPT Canvas, Claude Artifacts,
let alone all the, you know, higher level, you know, IDEs, vibe IDEs, you know, Cursor,
Windsurf, you know, the new Antigravity from Google, Codex, right?
There's great kind of vibe coding platforms.
But then there's things that are, I'd say, more like disposable apps, right?
that you can just create in a front end large language model.
And that's what so many people are doing.
I actually had a great conversation a couple of months ago with Paige Bailey,
one of the heads there at Google.
And she said, you know, the people that are winning hackathons are non-technical people.
And they're just going into, you know, in her case that she was talking about,
they're going into AI Studio.
AI Studio has a great build feature.
You just, you don't need to know anything about coding.
You say, make me an app that does A, B, and C.
And it just happens.
And you can just use it.
And it just works.
Right.
So the reality is, well, a recent study said 70% of new enterprise apps in 2025 were built
using low code AI tools.
All right.
So yeah, crushed that one.
Number nine, reasoner wrappers will hit the scene.
So my prediction back in January was that tools are going to emerge that wrap reasoning
models and enterprise data
to drive decisions.
So essentially how the transformer old school models work on structured data,
I said, well, there's going to be a kind of movement for reasoning data or a company's
kind of how they make decisions.
Right.
So I'll say this one didn't hit, right?
A little bit, maybe, but it didn't hit as hard as the other predictions.
In this case, though, I'll say that I'm definitely not wrong.
It is starting.
Maybe I was just a little early on this prediction.
But there is probably this new agency layer, right?
Kind of what I and other people call it.
But this is a business agency.
This is a business decision making, their expertise, you know, their subject matter experts.
It's what happens in their head, right?
So my thought was back in January that there's going to be a big push for collecting and
curating essentially companies' IP, their decision
process, right? And there is a little bit. You know, there's tools like LETA that have become
pretty essential to parse and structure the hidden thought process that pairs models,
these reasoning models with human or company reasoning, right? I still think it's going to be a
multi-billion dollar industry, you know, whether it's next year or the year after. But there is
this kind of enterprise layer growth. And I think the strongest public evidence is kind of the
explosion of agent frameworks and observability or tracing,
which is exactly what these wrappers need to function reliably in business settings.
So maybe didn't hit on this one.
Kind of close,
but I don't think it's going anywhere.
I think as reasoning models become the default,
I think that companies are going to eventually,
smart companies are going to eventually realize,
wait,
we need to feed these reasoning models with our company reasoning.
Number eight, virtual machines will become all the rage again.
All right.
So as a reminder, this is before there was even a ChatGPT Operator.
Poor ChatGPT Operator.
It was announced and it's already dead, right?
But my prediction was that because agents need a computer,
that virtual machines or virtual desktops become trendy or at least they become a thing again.
And that's definitely happened.
Right.
So even like ChatGPT's agent, it uses
a virtual machine. So maybe companies aren't out there, you know, putting out virtual machines,
but it's actually going to start happening. And it has, there's a recent announcement,
right? Like some of these predictions, there's like big news that came in November and December
and I was like, yeah, right? Like, as I've been planning the follow-up show, I'm like, yeah,
I knew, I knew something was coming, right? So obviously, Google's Project Mariner, that's
positioned as a browser-based agent. But the big one actually came in November,
when Microsoft announced Windows 365 for Agents at Ignite.
And essentially, they have now cloud PCs that are specifically designed for AI agents.
Right.
That's it.
So, yeah, it happened.
It took a while.
But yeah, this was true.
Kind of seemed like a niche prediction because at the time, there was no general use agents
from the big companies, right?
You had Copilot, you had Copilot Studio.
And that was it.
That was it.
And Copilot Studio was kind of brand new.
But no one was talking about virtual machines.
And now they're definitely a thing.
Number seven.
So AI becomes overly political.
I said AI is going to become deeply entangled in politics and policy conflict.
Yeah, that's happened.
So, you know, and I'm looking at this.
I'm based in the U.S.
Our biggest listenership is here in the U.S.
So looking at this mainly through the U.S.'s point of view.
But in December, so this week, President Trump signed an executive order blocking states from regulating AI, framing deregulation of AI as a critical weapon in the AI race against China.
So there is literally no bigger way for AI to become political than at the federal level saying, hey, states, you can't make laws on this, which is very much not in the
same vein as mostly everything else that President Trump and the Republican Party, right,
they like to hand as much power over to the states.
You know, that's kind of been one of their big talking points.
And this, AI is completely the opposite, right?
So if nothing else, you know, this went against the grain of what President Trump has
traditionally done in his second term in office so far.
So yeah, definitely AI has become, I'd say way too political because even outside of that,
even outside of the geopolitical, even just
domestically, it's become political.
Right.
So in July of 2025, the Trump administration signed America's AI action plan.
And in there, there was an executive order preventing the federal government from procuring
AI models that include quote unquote ideological biases or social agendas or woke AI.
Right.
So essentially the federal government said, yeah, we're not going to do business with anything
that we don't like, right?
Anything that we say is woke.
So y'all, I know like I already know just from what I said there, I'm going to get, you know,
people who are Democrats who are angry at what I said, people who are Republicans, angry at what
I said.
I know I'm going to get hate mail just from both sides on this, right?
I'm just speaking facts, right?
This is literally what happened.
All right.
So just FYI.
Also, it doesn't get more political in AI than the cozying up of big tech leaders with the federal
government or President Trump.
So as an example, for the president's inauguration campaign, everyone donated like a million
dollars, right?
So Meta, Mark Zuckerberg, personally, this is according to reports.
NVIDIA, Google, Amazon, Apple, Tim Cook, Microsoft, OpenAI with Sam Altman, right?
Broadcom, Adobe, Perplexity, right?
Essentially, everyone, just about any big-time
company donated to President Trump's inauguration.
Not really normal.
So yeah, another sign that, yeah, AI became extremely political in 2025.
All right.
Prediction number six was global AI regulations tighten, just not in the U.S.
Right.
And yeah, that happened.
And it honestly impacted how users worldwide were able to interact or not interact
with their favorite large language model of choice, right?
Obviously in the EU, very restrictive with the EU AI Act.
And a lot of features like OpenAI's long-term memory upgrade got really delayed in the
EU and the UK because of these strict AI regulations.
And there's rules for general purpose AI models that became applicable in the EU starting
in August with penalties that can reach up to
35 million euros or 7% of global turnover.
Also, Singapore launched its global AI assurance pilot in February.
And Italy also had their national AI law that took effect in October.
So you have a lot of big nations or groups of nations that have different laws, right?
I'm not going to spend time talking because it would take hours.
But a lot of just very strict regulations on how AI can be used.
And in many cases, certain features or certain AI tools just can't be
used in certain countries, right, which obviously, you know, stifles their innovation.
And so, yeah, the EU Act went into full enforcement in 2025.
Meanwhile, the U.S. did absolutely nothing for the most part to block innovation.
There's no real regulation in the U.S.
All right, number five, narrow AI agents or narrow AGI also achieved.
But anyways, my prediction when it came to agents that there wasn't going to be a runaway
general purpose agent.
The general purpose agents were not going to be good.
But narrow agents would absolutely dominate, right?
And that prediction came true.
Because you can, there's no one great agent out there.
Look at the big players, right?
Microsoft Copilot, I'd say is probably the closest to having a general agent,
but even those are built on a company's certain use cases, right?
So they're narrow.
They're not general for the most part, right?
OpenAI's agent mode, not really good, right?
I'll be honest.
Sorry, friends at OpenAI who are listening.
It's not very good.
It's slow.
It's clunky.
For certain tasks, it's okay.
It's manageable.
But general agents aren't there yet.
They will be.
I think especially with some of the recent model updates that just came out, like,
looks at watch, in the last 10 days or the last couple of weeks,
specifically Gemini 3, Gemini 3 Flash, and GPT-5.2,
and then Claude Opus 4.5, right?
On the agent side, that really changes what's possible.
But we haven't really realized that yet.
But the agents that have been making the biggest splashes, I'd say,
are just narrow agents, agents built around a certain vertical, a certain task.
So as an example, Salesforce's Agentforce 2.0, you know, handles end-to-end CRM workflows,
Oracle deployed 50 role-based AI agents in Fusion Cloud apps, GitHub Copilot coding agents,
obviously Anthropic's Claude Code, right, a lot of coding agents, right, with narrow use cases.
And a Menlo Ventures report shows that there's a $7.3 billion enterprise
investment in departmental AI, which is essentially narrow AI use cases.
All right.
Number four prediction.
LLM memory becomes a major focus.
This prediction obviously came true.
So at the time, right?
And I, I didn't want to harp on this on every single one.
I thought it's funny to talk about the, the vibe coding one.
Because it feels like vibe coding's been around for like five years, but it wasn't
even talked about until February, which is after this.
Think back to January.
ChatGPT had a very early version of memory for some users.
But aside from that, LLM memory wasn't really a thing.
It wasn't.
People talked about it.
Some people talked about it, right?
I wasn't the only one.
But it obviously came true.
So OpenAI had a handful of big updates to their memory and chat history.
Google Gemini rolled out a memory upgrade or, you know,
memory to its users.
Claude, right, they were the last of kind of the big three that finally rolled
out chat memory.
So yeah, now the three big front end LLMs in ChatGPT,
Gemini, and Claude finally have memory, right?
Like I said, it was going to be a major focus and it obviously happened.
Also, context caching on the developer side has become a standard API feature,
whereas before it wasn't really a thing, right?
Context caching is a form of memory.
It's a form of, you know, not having to pay over and over for things that have already been processed.
It's more of a technical memory mechanism.
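To make the "not paying over and over" idea concrete, here is a minimal sketch in Python. It is a toy, not any vendor's API: the class name ToyContextCache, the whitespace-based token count, and the rule that a cached context costs nothing on reuse are all assumptions made for illustration (real providers typically bill cached tokens at a reduced rate rather than zero).

```python
import hashlib


class ToyContextCache:
    """Toy model of context caching: a repeated context is billed only once."""

    def __init__(self):
        self._seen = set()        # fingerprints of contexts already processed
        self.billed_tokens = 0    # running total of tokens we "paid" for

    def process(self, shared_context: str, question: str) -> int:
        # Fingerprint the shared context so repeats can be recognized.
        key = hashlib.sha256(shared_context.encode("utf-8")).hexdigest()
        # Assumption: one whitespace-separated word counts as one token.
        context_tokens = len(shared_context.split())
        question_tokens = len(question.split())
        if key in self._seen:
            # Cache hit: only the new question is billed in this toy model.
            cost = question_tokens
        else:
            # Cache miss: pay for the full context once, then remember it.
            self._seen.add(key)
            cost = context_tokens + question_tokens
        self.billed_tokens += cost
        return cost


cache = ToyContextCache()
document = "word " * 1000  # stands in for a long document sent with every request
first_cost = cache.process(document, "What is the summary?")
second_cost = cache.process(document, "Who wrote it?")
print(first_cost, second_cost, cache.billed_tokens)
```

The first request pays for the whole document plus the question. The second request reuses the same document, so only the new question is counted.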
And then also, with Gemini 3, they introduced an architecture that allows for agents to recall user preferences across months of sessions, right?
So memory is taking shape and form in many different ways.
But I mean, if you just look at the progress on the front end, one of the biggest
things has been memory.
It's been using more of the context window and making more of it, if that makes sense, right?
In that context window, right, because outside of the context window, it doesn't really matter
much.
But in that context window and then being able to layer in memory, layer in, right, also Gemini
has a level of personalization that can work with your,
you know, your search history, right?
And even just as we get connectors, right?
And being able to work with that data, it's huge, right?
So I think that the context plus connectors plus memory was definitely the biggest
leap forward when it came to front end large language models in 2025.
All right, our top three last three predictions.
Here we go.
Number three, large language
models become small language models. Well, what I meant was, my prediction there was that smaller,
efficient models would become more prominent and capable than January 2025's big models, right?
That's what I was alluding to. Essentially, I said, hey, in a year, there's going to be small
models that are way better than big models and people are going to be using small models a lot more
than they're using them now because, you know, back in January, people weren't really talking about or
using small models. And in some cases,
there weren't a lot of them available, right? OpenAI had their mini series, but now, I mean, Google... so let's get to the reality. The reality is, yes, it obviously happened. There's been a new category of models, right? Like as an example, Gemini now has their Flash, but then they have their Flash-Lite, which is an even smaller version of their kind of behemoth model, you know, the Gemini 3 Pro. But it's not just that. It's not just the, you know,
the GPT-5 mini, the, you know, Gemini 3 Flash.
It's not even those.
Some of the models like gpt-oss, so OpenAI's open source model, a 20 billion parameter
model, more powerful than GPT-4o, right?
So think about a year ago, give or take, GPT-4o was the world's best model, right?
Now you have an open source, 20 billion parameter model that, when you look at third party analytics, right, like Artificial Analysis, it's better.
Okay, I'm going to do the math.
I'm going to do the math for those of us that aren't great at math.
Parameters, right?
Models are judged in their size by parameters.
For the most part, with closed proprietary models,
you don't know how many parameters they are.
With open source models, you do.
That's why with gpt-oss-20b, we know it's a 20 billion parameter model.
You can think of it like kind of like a hard drive, right?
According to most reports, the earlier versions of GPT-4 were two trillion parameters.
So gpt-oss-20b, OpenAI's open source model, was about 1% of the size of the model that was the most powerful model in the world about one year prior,
and it was better.
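The "about 1%" figure is just the ratio of the two parameter counts quoted here. A quick check in Python, assuming the reported (not officially confirmed) two trillion figure for early GPT-4:

```python
# Parameter counts as quoted in the episode.
GPT_OSS_20B_PARAMS = 20_000_000_000          # 20 billion, known because the weights are public
REPORTED_GPT4_PARAMS = 2_000_000_000_000     # 2 trillion, a reported figure, not an official one

# Size of the small model relative to the large one.
size_ratio = GPT_OSS_20B_PARAMS / REPORTED_GPT4_PARAMS
print(f"{size_ratio:.0%}")
```

Twenty billion divided by two trillion is 0.01, so the smaller model is roughly one hundredth the size.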
So, yeah, the models got smaller.
And then, I mean, there's a whole, I mean, the Phi series from Microsoft really good.
The Gemma series, Gemma 3 from Google, which we might see more of here in the next couple of days.
So keep your eye out, right?
Small language models, I've always been bullish on them.
I'm even more bullish today than I have ever been because that's the direction.
We've seen even domain-specific models from OpenAI with their science models,
Anthropic, with their financial models, right?
We don't talk about them a lot unless you're one of the few companies that use them.
But I've said this all along.
The future of large language models is using many small domain-specific models.
All right.
Number two, speaking of models, mixture of models becomes a thing.
A year ago, no one's talking about mixture of models.
We talk about mixture of experts, which is a little different.
But my prediction was that there would be systems that would orchestrate multiple models in parallel as a bundle.
So let me kind of explain the difference between mixture of models and mixture of experts.
And also, I even think at the time last year when I was building my prediction show, I don't even know if mixture of models was talked about.
I might have made up the mechanism, but now people are talking about it.
There's other words for it, right?
But mixture of experts, or MoE, which a lot of people know.
And now there are people talking about mixture of models.
You know, there's different names for it.
But mixture of experts is essentially a single sparse model that uses a gating mechanism
to activate only a specific subset of parameters, or experts, for each input.
Right.
So with these big parameter counts, when you have a mixture of experts model, it only activates certain
parameters, right?
And that's how it can, you know, use a mixture of them faster and better, right?
Whereas a mixture of models is more of an ensemble of independent, fully active models that all process the same input, aggregating their distinct outputs to improve overall accuracy.
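To make the contrast concrete, here is a minimal Python sketch, not any vendor's implementation. Everything in it is an illustrative assumption: the toy "experts" and "models" are plain functions, the gate scores are passed in by hand (in a real mixture-of-experts network a learned gate computes them per token inside the model), and the mixture of models aggregates by simple majority vote, which is only one of several possible aggregation schemes.

```python
import math
from collections import Counter


def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_scores, top_k=1):
    """Sparse mixture of experts: only the top_k experts are actually run.

    gate_scores is a hypothetical stand-in for the output of a learned gate.
    """
    weights = softmax(gate_scores)
    ranked = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(weights[i] for i in chosen)
    # Blend the outputs of the chosen experts; the rest stay inactive.
    output = sum(weights[i] / norm * experts[i](x) for i in chosen)
    return output, chosen


def mixture_of_models(prompt, models):
    """Ensemble: every model is fully active on the same prompt, then a majority vote."""
    answers = [model(prompt) for model in models]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, answers


if __name__ == "__main__":
    toy_experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]
    print(moe_forward(3.0, toy_experts, gate_scores=[0.1, 2.0, 0.3], top_k=1))

    toy_models = [lambda p: "42", lambda p: "42", lambda p: "41"]
    print(mixture_of_models("What is 6 times 7?", toy_models))
```

The design difference shows up in cost: the mixture of experts call runs one expert out of three, while the mixture of models call pays for all three models every time and relies on aggregation to earn that cost back in accuracy.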
So this is another one that, man, I was planning the show, you know, last week,
and I'm like, there hasn't really been one big story, you know, where someone
went with this mixture of models. And then Zoom did. Of all companies,
Zoom, right?
So yeah, the video calling company, right?
So Zoom put together essentially a mixture of models
approach and got the highest score on Humanity's Last Exam,
but critics were not really happy, and a lot of people are like,
I don't know if this counts because all they did, right?
So Humanity's Last Exam, I'm not going to go into it because it'll take a long time to explain, but it's essentially a very
hard AI test, right? We'll just say that, with questions that make models really think and reason.
The answers aren't in their training data, right? But usually you give it to one model. So what Zoom did is they
created essentially a mixture of models. So they didn't build a model. They didn't fine-tune an open-source
model. You know, they just used off-the-shelf models, kind of duct-taped them
together, and then they beat the best score. So yeah, they essentially used a mixture of models.
But there's actually a lot of movement in this space.
You know, it's similar to model routing, but a little different. Google's Interactions API
from this month, actually, now dynamically routes queries and shares the information
between Flash, Pro, and Thinking in real time.
So it's not just model routing, right?
Because usually, you know, a router sends your prompt or your query to one model.
This is a little different.
It shares it between all three.
So, you know, even Google's Interactions API for developers is using kind of this mixture of models.
Also some other, you know, signals: IDC reported in November that 70% of top AI-driven enterprises will use multi-model
architectures by 2028.
And another good example, Aria, or sorry, area, they're a recent sponsor on this show, they're a key player.
Essentially, they have a mixture of models architecture.
And, I mean, they recently raised $100 million. They're a newer company.
You know, they had a $100 million round in September, I believe.
So it is a space that is growing, you know, essentially not having to use
one model. I think in the future, it is going to become much more commonplace than you
would think, right? Because I think the smartest companies are having a modular approach when it
comes to their AI strategy, right? Because sometimes, and I think we saw this with GPT-5, right,
going from GPT-4o to GPT-5, for companies that maybe didn't have a modular approach, they had a
singular approach. If they were just using, you know, GPT-4o, whether on the front end or the back end,
for maybe all of their, you know, processes, with a big switch like that,
maybe, I mean, maybe some things work better.
You hope things work better, but a lot of things are going to work worse.
So that's why I think having a mixture of models approach is really smart.
That's what the smartest companies are doing.
All right.
And then last but not least, the big prediction.
I said in January, AGI is going to be achieved, but no one notices.
All right.
So my prediction was that AGI, artificial general intelligence, would arrive in 2025,
but daily life doesn't feel any different.
And there's no bold proclamation of, you know, hey, AGI has been achieved.
You know, we're going to throw up the AGI flag on the federal buildings.
And, you know, now the humans taking that.
I think most people have said that AGI is just going to happen and no one's really going to notice.
So here's the reality of what happened. And I'm not going to fact check myself on this one too hard. I'm going to let
you. All right. So live stream audience, let me know, say AGI yes, AGI no, AGI maybe, right? Spotify
people, in the comments, let me know. I'm going to read them. But here's some of the reality.
So in 2025, large language models started surpassing the smartest humans in the world on almost
all elite thinking tests, right? So on almost any test that humans take and
a single AI model takes,
AI models beat the smartest humans in the world.
Okay.
But I mean,
AGI still doesn't have one agreed meaning.
And even the optimistic voices note that today's AI systems can look brilliant on
some tasks and be like, wow, this is definitely AGI level.
But on other tasks, they can seem very dumb, right?
And let me be, let me be very honest.
A lot of that's a skill issue, FYI.
But one of the biggest things, a little dorkier, unless you're a math geek, right?
It was the IMO gold medal performance that was, I think, maybe the standout AGI moment example that a lot of people pointed to in 2025.
So what is the IMO?
That is the International Mathematical Olympiad.
Essentially, it is one of the most difficult math competitions in the world,
and OpenAI and Google DeepMind for the first time ever won gold.
They solved all the problems.
And then you had this week, you had Boris Power, the head of applied research at OpenAI.
So a lot of people would conclude that Boris is probably one of the smartest people in the world
when it comes to AI and research. He said that winning the IMO gold was seen as an AGI-level difficulty problem, right?
It happened, it passed.
And no one was like, all right, life changes, right?
I don't think that there ever will be a standard or agreed upon term of AGI.
Right.
But there's a lot of signs.
And I want to talk about two of them.
One would be the AI IQ test.
All right.
So this is from trackingai.org, a good resource.
So essentially, I have it on my screen here.
And if you ever want to see the video version, it's always on our website at
youreverydayai.com, but I'll describe it, right?
Essentially, Tracking AI, they give offline IQ tests to large language models, right?
So the answers aren't in their training data.
So they do an offline test and they do a Mensa Norway test.
Right.
So now it's at the point, right?
A year ago, the models had about an average IQ. You know, the average person has an IQ of about 100. You know, I think bright,
you know, there's different classifications. I guess a bright person is around 115. Gifted is in the
120s. And I think you get to a genius-level IQ, I think, between like 135 and 140. So now you have,
I mean, Gemini 3 Pro, and not too far behind, GPT-5 Pro,
GPT-5.2 Thinking, and GPT-5.2 Pro. They're at about the 130 mark for these IQ tests.
So they are near genius level.
They're in the top 1.5% of humans.
Right.
Okay.
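As a rough way to sanity-check how an IQ score maps to a share of the population, here is a small Python sketch. It assumes IQ scores follow a normal distribution with mean 100 and standard deviation 15, a common convention that the episode does not state; tests normed differently will give different shares. Under these assumptions a score of 130 sits two standard deviations above the mean. The function name and the list of scores are illustrative only.

```python
from statistics import NormalDist


def share_at_or_above(iq, mean=100.0, sd=15.0):
    """Fraction of a normally distributed population scoring at or above `iq`.

    mean=100 and sd=15 are assumed norming values, not figures from the episode.
    """
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(iq)


if __name__ == "__main__":
    # Scores mentioned in the episode: bright, the models' mark, and the genius range.
    for score in (115, 130, 135, 140):
        print(f"IQ {score}: top {share_at_or_above(score):.2%} under the assumed norming")
```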
So again, I mean, AGI is not an IQ test, but if you know what you're doing, an AI model is
smarter than almost any human in the world.
And that's just, you know, normally those humans
are smart in their one field.
All right.
And then I will throw out, right?
And I did a whole show on this.
I don't know, maybe a year ago.
I looked at older definitions.
I looked at definitions of artificial general intelligence from, you know, 2000,
2005, 2010, 2015.
And when you look at all those old definitions and where we're at today,
if you were just to look at those old definitions, we've achieved every single old
definition, right?
But as AI gets smarter, the goalposts keep moving.
So I don't know if
AGI has been achieved, right?
But at what point will it be achieved?
I don't know.
I guess when we have a definitive definition, for me, I would say, yeah, probably, right?
And one reason why I'm going to show you here in a second.
But Sam Altman in 2018 in the OpenAI charter defined AGI this way: "AGI, by which we mean
highly autonomous systems that outperform humans at most economically
valuable work."
Let me repeat that last part again.
Outperform humans at most economically valuable work.
So the benchmarks have never really measured this.
Right.
And I think that AI companies may have gotten kind of smart at overfitting their models
to score really well on the benchmarks, right?
That doesn't always help them when it comes to human preferences like LM Arena.
Right.
But now there's a new benchmark that's
judged by humans.
And I'll probably do a dedicated show on this one because I think it's really interesting.
I did talk about this a couple of days ago.
So this is an OpenAI benchmark called GDPval.
Right.
So right now, their model, GPT-5.2 Pro, has the best score on this.
But when they released it in September, it didn't.
Right.
So a lot of people are like, oh, yeah, of course, OpenAI is going to have, right, the best
score on their own benchmark.
Well, when they released GDPval, they didn't.
It was Claude 4.5.
But GDPval tests models on real jobs with economic impact, right?
Exactly Sam Altman's definition.
It's when autonomous systems can outperform humans at most economically valuable work.
So GDPval tests models on real job tasks from 44 jobs in nine big U.S. economic sectors.
So tasks use real work files like spreadsheets, slides, PDFs, and images, and then they produce those actual products.
Right.
So previous versions, even of GPT-5, weren't any good at producing spreadsheets or slides.
Well, now GPT-5.2 Thinking and Pro, they really are.
So how this works is, essentially, a human who is an expert in a field judges this,
and they compare the model's work to that of a human without knowing who did what, right?
So yeah, it's like, a smart human, an expert human, hands in a spreadsheet, and a model hands in a spreadsheet.
And then an expert, who I think they said on average had about 14 years of experience,
so a domain expert, judges the two.
And they don't know who did what, right?
And then the scores are averaged to get the model's overall win rate.
And then grading, the human grader, bases it on usefulness, format, and quality.
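The blind comparison described above can be sketched in a few lines of Python. This is only an illustration of how a win-or-tie rate could be computed from grader verdicts; the verdict labels, the function name, and the flat average over all comparisons are assumptions, and the actual GDPval aggregation may differ (for example, averaging per task first).

```python
from collections import Counter

# Hypothetical verdict labels: which deliverable the blinded expert preferred.
VALID_VERDICTS = {"model", "human", "tie"}


def win_or_tie_rate(verdicts):
    """Share of blind comparisons where the model's work beat or tied the human's."""
    unknown = set(verdicts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    if not verdicts:
        raise ValueError("need at least one comparison")
    counts = Counter(verdicts)
    return (counts["model"] + counts["tie"]) / len(verdicts)


if __name__ == "__main__":
    # Made-up verdicts for illustration only, not real GDPval data.
    sample = ["model", "tie", "human", "model"]
    print(f"win-or-tie rate: {win_or_tie_rate(sample):.1%}")
```

Counting ties alongside wins matters when reading headline numbers: a "beats or ties" rate is not the same as the share of comparisons the model won outright.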
And so here's an example, right?
This is an actual example from GDPval.
So in a retail task, so this is in retail, they're given a past sales and promo calendar
in Excel, and they have to produce a forecast and a one-slide summary.
So then the graders compare the model's spreadsheet and PowerPoint slide to the human's for
math, readability, assumptions, and presentation.
So what happened?
Well, let me first read OpenAI's kind of definition here.
So they said GPT-5.2 Thinking is the best model yet for real-world professional use. On GDPval,
an eval measuring well-specified knowledge work tasks across 44 occupations,
GPT-5.2 Thinking sets a new state-of-the-art score and is our first model that performs at
or above human expert level.
Specifically, GPT-5.2 Thinking beats or ties industry professionals on 70.9% of comparisons.
All right.
And then if you look at GPT-5.2 Pro, the win-or-tie rate is actually 74.1%.
Okay.
Let me repeat that.
On economically valuable work, judged by experts.
One model versus a smart human.
They have to produce spreadsheets, PowerPoints, writing, etc.
The AI did better three-fourths of the time on economically valuable work.
Wait, what was that definition that Sam Altman, the CEO of that same company, said?
AGI, by which we mean highly autonomous systems, yes, that outperform
humans at most economically valuable work.
44 real jobs across nine major U.S. sectors and the AI model wins or ties three-fourths of the
time.
Hmm.
I don't know.
To me, my prediction that AGI is achieved but no one really notices
kind of seems like it came true.
But guess what?
As the models get better, we're just going to keep kicking the goalposts.
But don't worry.
As the models get better,
we're still going to be here every single day keeping you up to date.
All right, that is a wrap on our 2025 AI roadmap rewind.
So that's volume two. Like I said, if you missed volume one, make sure to go check it out.
That's episode 674.
I hope this was helpful.
And yes, we are going to be announcing dates soon for our prediction series for 2026.
And we've got something else special
kicking off here real soon.
So if one of your personal goals or if one of your company's goals is to double down
and go all in on generative AI and large language models, trust me, we have something special
coming.
So if you haven't already, please make sure, number one, you're going to want to go subscribe
to the podcast if you haven't already.
So if you're listening on Apple or Spotify, thank you for that.
Number two, you're going to want to go to our website.
Trust me, youreverydayai.com.
Go sign up for the free daily newsletter.
We're going to be recapping today's show and a whole lot more.
Thank you for tuning in to our recap, our rewind series, 2025, even though it's not over.
It's been a great one.
Thank you for your support.
We'll see you back tomorrow and every day for more Everyday AI.
Thanks all.
And that's a wrap for today's edition of Everyday AI.
Thanks for joining us.
If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going.
For a little more AI magic, visit YourEverydayAI.com
and sign up to our daily newsletter so you don't get left behind.
Go break some barriers and we'll see you next time.
