The AI Daily Brief: Artificial Intelligence News and Analysis - 25 Agent Predictions for 2025 - Part 2
Episode Date: December 27, 2024
PART 2: Agents are the most important trend in AI heading into the new year. NLW is joined by Nufar Gaspar to count down 25 predictions for AI agents in 2025. Nufar Gaspar is a seasoned AI expert and leader with vast experience in incubating and growing AI products, verticals and communities. She is the Director of AI Everywhere and Gen AI for Intel Design, and consults and trains organizations and teams on the usage of AI and building AI products and companies. Brought to you by: Vanta - Simplify compliance - https://vanta.com/nlw The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, part two of 25 agent predictions for 2025.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, follow the Discord link in our show notes.
Hello, friends, here we are back with part two of our 25 agent predictions for 2025.
It's not strictly required that you listen to part one first.
However, I would recommend it.
Once again, we are joined by Nufar Gaspar, the Director of AI Everywhere and Gen AI
for Intel Design. Nufar brings the perspective of someone who has built AI products inside Intel,
helped with broader AI transformation, and thinks about these issues professionally and personally
all the time. In the second part, we talk about technology, as well as financial trends,
and close out with a big vision for where this is all headed. All right, and we are back once again
for part two of this conversation around 25 predictions for AI agents in 2025. We've talked about
all sorts of things. A lot of ground setting in part one. And now we're digging into some of the
more kind of discrete and specific technology predictions. Kicking off with number 14,
new custom cognitive architectures will enable better and safer agents. So what do you mean by
this? Yeah, so let's start by defining what a cognitive architecture is. It's basically a fancy
term for a blueprint for building intelligent and
autonomous systems.
And you can think about it as designing the minds of the agents.
So maybe some of you heard about agents over a year ago, when AutoGPT and BabyAGI
were the tools that everyone discussed, and they never took flight.
And the reason for that is that they were too general and unconstrained.
As a result, their performance was unreliable.
With the newest generation of agents, many individuals and companies also introduced new
custom cognitive architectures, and those provided a lot of guardrails,
sometimes referred to as scaffolding, and frameworks for controlling these agents. Together
with improved memory and improved capabilities, those kept the agents much more focused on what
they are trying to do and prevented them from flying off the rails. And because they were so
successful in 2024 at bridging the gap between being too loose and getting to actual results,
there are so many labs and companies working to improve them even further.
And this will probably continue in 2025, and we will get even better results with that.
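To make the idea of scaffolding a little more concrete, here is a minimal Python sketch, assuming a toy setup rather than any particular framework. A hypothetical `Scaffold` class wraps whatever model proposes the next action, only executes tools from an allow-list (`ALLOWED_TOOLS`, also hypothetical), caps the number of steps, and keeps a memory of what happened. The `scripted_policy` function stands in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool registry: the only actions the scaffold will execute.
ALLOWED_TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "summarize": lambda text: text[:40],
}

Policy = Callable[[list[str]], tuple[str, str]]  # memory -> (tool, argument)


@dataclass
class Scaffold:
    max_steps: int = 5  # hard budget so the agent cannot loop forever
    memory: list[str] = field(default_factory=list)  # record of every step taken

    def run(self, policy: Policy) -> list[str]:
        for _ in range(self.max_steps):
            tool, arg = policy(self.memory)  # the model proposes the next action
            if tool == "finish":
                self.memory.append(f"finish: {arg}")
                break
            if tool not in ALLOWED_TOOLS:
                # Guardrail: off-rails actions are logged and skipped, never executed.
                self.memory.append(f"blocked: {tool}")
                continue
            self.memory.append(f"{tool}: {ALLOWED_TOOLS[tool](arg)}")
        return self.memory


def scripted_policy(memory: list[str]) -> tuple[str, str]:
    # Stand-in for an LLM call: returns the next action from a fixed script.
    script = [("search", "agent frameworks"), ("delete_files", "/"), ("finish", "done")]
    return script[len(memory)]


print(Scaffold().run(scripted_policy))
```

When run, the search goes through, the made-up `delete_files` action is blocked instead of executed, and the run ends at `finish`. Real cognitive architectures add planning, reflection, and much richer memory; the point of the sketch is only that the guardrails live outside the model.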
I actually want to bring in your 15th prediction because I think it's related and we can
discuss them together.
The development of new tools, frameworks, and conventions for agent development and management.
Right. So up until now, we've often been using existing tools for a new technology. With the rise of agents, we need more dedicated tools for agent development, explicitly designed for building agents in order to streamline and speed up the process. One focus area will be application development, so we will certainly see more and more frameworks.
We already have LangGraph and other open source capabilities,
but more and more libraries and frameworks will probably emerge to help developers build the backbone
of these agents in order to make them more reliable, and also to orchestrate either between
agents within the same system or to orchestrate the relationship between agents talking to one
another, and we'll talk more about it in the next predictions.
The other area where there will be a lot of focus, in my opinion, is on the observability and the ability to test these agents.
These tools will help developers be much more confident about whatever they're building, debug their agents,
anticipate what needs to be improved, and get a handle on costs, which are currently not that predictable.
And whenever we want to really understand or govern what the agent actually did, or provide visibility into it to our customers,
observability will become even more critical as part of the development
building blocks that we will have. So my guess is that a lot of our listeners who come from
the enterprise or business world, a lot of those words that you just said would sound like
total Greek to them. How much understanding of all of this do companies that are thinking
about exploring and piloting agents for their enterprises
actually need? Okay, so of course I'm a bit biased because I've been
one of these Greek-speaking people building AI capabilities for literally all of my career. So I am constantly
thinking about these things. And I do think that organizations that want to have a tailored
set of capabilities, because improving the outcome by even a fraction of a percent has
bottom-line implications, will for sure have teams that are experts in building agents or
utilizing AI, and they need to understand this because they're in a position where they're not
fighting for the 80 percent, they're fighting for the additional 20 percent. So if you are from
a company that utilizes AI to create a competitive advantage that is very unique, you will
probably have to have people who understand that. For the early stages of agents, you will probably
be able to utilize out-of-the-box capabilities and you don't have to go down that specific rabbit
hole. I perceive that some of the listeners, even if they're currently not at this point, they might
want to get there eventually maybe later in 2025 or the years to follow. Yeah, so that's where
I land on this. I think that there's going to be plenty to experiment with next year that is very
point-and-click. You know, there will be some amount of engagement. In fact, you're
seeing a lot of agent companies do the forward-deployed engineer thing where they're actually
embedding a developer inside companies to help customize agents for their particular data set and
their particular environment. Sierra is doing this and others are as well. And so there will be a lot
of support, I think, for those initial pilots and deployments. And so I don't think that the
lack of understanding of this should be an a priori barrier for digging in. However, I also think
that the more that there is some amount of institutional understanding around these topics,
and in particular an ability to assess, or at least have the right support to figure out and assess,
where the agents currently being tested or deployed sit relative to new capabilities
that are coming online and what's likely to happen in the future, the better organizations
will be able to make good strategic decisions. I think that the challenge is that this is going to be
such a fast evolving landscape of solutions that it's not really going to be as clean as,
you know, we piloted an agent, we liked it, and so we deployed it, and then cool, we've got our
agent figured out. It's very likely that it's going to be a process of, you know,
continuously reinterpreting and retrying things as capabilities improve and as competition,
you know, expands the boundaries of what's possible. So building a learning organization that can
actually understand this on a deeper level is going to be, I think,
pretty essential.
Yeah, and even if you just buy, defining the right requirements for the vendor
will probably require you to at least talk the talk to some extent.
Okay, number 16, growth in the number and practicality of multi-agent systems.
Okay.
So this is an exciting one.
Again, you don't have to be scared about the technological aspect, but just a brief explanation
of what a multi-agent system is.
These are systems where we have several AI agents working together to accomplish a goal.
And typically each agent has a specific role.
They will often act just like a cross-functional project team.
So that's the best analogy that there is.
And in many cases, people who are building these agents will really give each agent like a title that really seems like a job title.
If you want a concrete example, for a coding task, you might have one agent that writes the code,
another agent that tests the code, another that debugs it, and so on.
And eventually, the overall code functionality can be even better
by having a well-defined set of AI agents working together
if they're built properly.
But it's not easy to build a multi-agent system,
because this is where you really need a good understanding of the agents.
As frameworks and other capabilities make it easier
to build multi-agent systems,
they will become much more prevalent during 2025 and beyond.
And because of the analogy to real teams working together,
and because we're already seeing some very promising results for multi-agent systems,
there will be more industry confidence,
and we will see more and more of them in 2025.
This is a really interesting area.
I wouldn't be surprised if when the dust settles,
it's only really when multi-agent systems become the norm
that enterprises really start to see value,
or at least big scalable value.
The reason for that being that, you know,
right now we have
sort of a correlation between how specialized an agent is and how likely it is to perform well.
But that makes it a very discrete set of tasks that tend to be very narrow where you can
kind of deploy these things right away. The multi-agent systems are going to be where you can
get more customizable and you can sort of ask for more complex things. And so I think when people
are really imagining in their mind's eye all that agents could do, they're probably in many
cases actually imagining multi-agent systems, even though that won't necessarily be where we begin
the year. Yes. And also, they're like humans, right? If you try to get an agent
to do too many things at once, it will get confused. So multi-agent systems,
even for smaller use cases, if we are able to nail them, will probably
get us to better, more accurate results. Okay, number 17, more focus on multimodal abilities of agents.
Okay, also very exciting one in my opinion.
Because when we're talking about AI agents,
we're talking about things that will have to perform tasks
and have good sensing and understanding almost like humans.
And in order to do that,
we will need these agents
to have multimodal perception of the environment,
whether they are processing video, audio, or images,
or controlling the computer and so on,
and all of this amounts to something that is very exciting.
The most exciting thing that I've seen recently is Google's Project Astra.
I've seen some demos and some testimonials of people who use that.
And it's a great example of where you have a model that is able to perceive the environment
using video and interact with you and literally be like your eyes and ears in a real environment.
And I think even more exciting is the possibility for people with some kind of disability to have these agents work for them.
I know that we're very focused on the enterprise, but this is a consumer use case that I'm very excited about.
And even for the enterprise, you can think of having a much more robust assistant that has all of these senses working simultaneously to help you.
Yeah, I think that this is one of the areas that's been really notable to me.
even in the last couple of weeks, we got an update on Project Astra, and we also got, as part of
OpenAI's 12 Days of Shipmas, Advanced Voice Mode with vision. And I think that we are still
underestimating how different things will be when the normal way that we interact with
AI involves it having the same visual and auditory context for the world around us that we have.
It's very hard, I think, for most people, myself included, to
break out of thinking about it as a thing that exists in a computer that you write to,
you know, or maybe you speak to. But I think that we're going to just see a gradual shift over
time that opens up not just totally new use cases, but I think a fundamentally
different understanding of what these tools actually do for us. All right. Number 18, more
academic and open source brain power will be devoted to agentic research, which should further
accelerate development.
Right.
So I mentioned this in the previous conversation, but I've been working in AI for many years now,
and I'm still amazed by what happened over the last two years.
And I think what created all of these capabilities, beyond some specific technological improvements,
is the fact that so many smart people all over the world are literally focusing on one domain, on one problem.
And I believe that agents will enjoy the same thing. With so much hype and attention, we will just be able to get so much more,
and with so much brain power coming from all directions, whether open source, academia, or industry,
the exponential curve will continue, and we will all probably be excited, scared,
and utilizing all of these technologies much more because of all of that.
You know, it's kind of ironic, but interesting. I actually think
the fact that pre-training as a scaling methodology seems to be plateauing or at least running
into some limits will only increase how much of that energy and brain power goes to agents
and applications and expressions instead of just thinking about raw capabilities enhancement
of the underlying LLMs. It was interesting. So on the Dwarkesh Podcast, I don't know, a while
ago now, maybe three months, six months, something like that, François Chollet basically said
that he thought that OpenAI had actually set back AGI, which is fascinating. And his argument was
that once ChatGPT hit, everyone just switched to thinking about and focusing on LLM architectures and not doing
anything else. And now that we're running into some limits in terms of getting kind of the
next level of capabilities, although who knows if that's actually true given o3, I think that
there's going to be just even more fertile realms of experimentation on different
ways to pull capabilities out of the tools that we have.
Yeah, but I'm not sure whether it's the slowdown or the natural progression towards
inference-time reasoning.
You know, the cynics will say that because the labs can't deliver good enough results from scaling,
the hype has all shifted to agents.
But I'm not sure.
Maybe it's because, like you and me, they are seeing the potential of agents.
That's why they're so excited and are working on it
so much. And maybe they have some good stuff in store for us in the, let's call them,
regular LLMs, because they are all claiming, maybe aside from Ilya Sutskever,
that scaling hasn't hit a real slowdown yet.
So it seems like a marketing discussion as well, and not just a technological discussion.
Yeah, I agree. So speaking of this, number 19:
new interfaces, standards, and protocols will emerge, an agent-computer interface.
Right.
So, you know, we were all very excited when Anthropic first introduced computer use.
Everyone rushed to experiment with it, and it really sounded like the true beginning
of something major.
And then everyone quickly realized that it's much more cumbersome, expensive, and not very
accurate.
And I'm not sure whether this is the right approach.
Do we want agents to control the computer like humans do?
Or, in fact, because agents will be doing so much work on the computer,
will there be a need for a new interface for these agents to control a computer?
And moreover, because there will be so many agents working together,
there will be a need for new APIs and new protocols for how agents communicate
with other agents, as well as perhaps being much more literal about how we write stuff,
because agents can't read between the lines like humans often do.
Maybe your error messages have to be machine readable rather than human readable, and so many other things.
So that will also, I believe, be a huge focus.
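As a hedged illustration of what "machine readable versus human readable" error messages might mean in practice, here is a Python sketch. The field names (`code`, `retryable`, `retry_after_s`) and the `decide` helper are hypothetical, not part of any existing agent protocol; the idea is simply that a calling agent can branch on structured fields instead of trying to interpret a sentence.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AgentError:
    code: str                            # stable identifier a program can branch on
    message: str                         # human-readable text, kept for logs
    retryable: bool                      # can trying again help at all?
    retry_after_s: Optional[int] = None  # how long to wait before retrying, if known


def to_wire(err: AgentError) -> str:
    # Serialize the error as JSON instead of a free-form sentence.
    return json.dumps(asdict(err))


def decide(payload: str) -> str:
    # The calling agent chooses its next move from fields, not by parsing prose.
    err = json.loads(payload)
    if err["retryable"]:
        return f"retry in {err.get('retry_after_s') or 0}s"
    if err["code"] == "AUTH_REQUIRED":
        return "escalate to human"
    return "abort"


print(decide(to_wire(AgentError("RATE_LIMITED", "Too many requests, slow down.", True, 30))))
```

A rate-limit error tells the agent to wait 30 seconds, while an authentication error routes to a person. Any real standard would need agreement between vendors on the field names, which is exactly the open question raised here.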
And an interesting part will be whether all of these different players will be able to reach
an agreement, or we will get to a point where everyone blocks one another
with different protocols rather than being open-ended and letting other companies'
agents operate on your data.
And I'm not sure whether all of these websites will let agents call and do stuff on them,
or will we see an economy of blocking each other,
where essentially they're telling you, the end user,
that if you want to do this action, you have to use our agent,
because we will block your agent from doing that on our data or our tools.
My guess is it proceeds similarly to how most versions of this have, which is initial
balkanization and attempts to capture value that ultimately lose out to open protocols and
standards that underlie things, because there's just too much efficiency to be had.
If it's anything like the way that the Internet has developed in other areas, I definitely think
that those sorts of battles are going to be a big part of the next few years.
Let's hope that openness will win for the sake of all of us, because it will be a
better economy in my opinion. Number 20, a lot of investment in creating agent-oriented benchmarks.
Okay. So how do you measure an agent's performance? Is it only when it arrives at its final
destination? Sometimes we don't even know the final destination, so it's very hard to measure that.
And we have seen some recently emerging benchmarks that try to be more open-ended, like agents are,
and pose a set of evaluation questions that require the multi-step reasoning
and open-ended thinking that an agent will have.
Two concrete examples: SWE-bench, the software engineering benchmark,
which gives an agent or an AI multiple human-like software engineering tasks
and measures how well it performs on them.
And there is also an interesting research engineering benchmark,
where the agent needs to basically do the AI research
that a human expert would do.
So these are two interesting benchmarks that are emerging.
And I believe that we will see more and more
because the existing methods for evaluating the LLMs
are not suitable for agents.
They often look at the bottom line
and are not really indicative of how well the agent performed,
especially if you want to open the black box and see the multiple reasoning steps
that the agent has done in order to get to the result.
So we will see more of those, and rightfully so,
because, as we talked about before, there will be so many competing offerings.
And aside from maybe experimenting ourselves,
it's going to be very difficult to assess how well they're doing
if we will only use the existing benchmarks.
Yeah, I completely agree with this.
I think there's going to be a need for a highly functional set of benchmarks. Again, just thinking about
it strictly from the standpoint of the enterprise. So as we are thinking about how to recommend
Agent X versus Agent Y for some specific purpose that we've determined, with an enterprise, is a
great place to start experimentation, the types of things that would be valuable for us to know
are exactly the types of things that you were just mentioning that there currently aren't benchmarks
for. So, for example, how many times in the process
of completing the task at hand is the agent likely to need guidance from humans?
You know, a one on that is very different than a five on that, right?
The value proposition is totally different based on that.
Whatever that score ends up being called, it's a score that I would like to see, you know,
as it relates to making decisions around agents.
So I think that you're right that there's going to be a lot of exploration here.
And it won't just be pure technical benchmarks.
I think those will be highly functional and related to actual usage as well.
Yes, for sure.
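Here is a minimal Python sketch of the kind of trajectory-level scoring discussed above, assuming each benchmark run is logged as a list of steps with a flag for whether a human had to step in. The `interventions_per_task` metric and all class names are hypothetical; no existing benchmark is implied. It reproduces the "one versus five" comparison: two agents with the same success rate that differ in how much human guidance they need.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Step:
    action: str
    needed_human: bool = False  # True if a person had to step in at this point


@dataclass
class Episode:
    steps: list[Step]  # the full trajectory, not just the final answer
    succeeded: bool


def score(episodes: list[Episode]) -> dict[str, float]:
    # Report trajectory-level numbers alongside the bottom-line success rate.
    return {
        "success_rate": float(mean(1.0 if e.succeeded else 0.0 for e in episodes)),
        "interventions_per_task": float(mean(sum(s.needed_human for s in e.steps) for e in episodes)),
        "steps_per_task": float(mean(len(e.steps) for e in episodes)),
    }


agent_x = [Episode([Step("plan"), Step("act", needed_human=True)], succeeded=True)]
agent_y = [Episode([Step("plan")] + [Step("act", needed_human=True)] * 5, succeeded=True)]
print(score(agent_x), score(agent_y))
```

Both agents score 1.0 on success, so a bottom-line benchmark would rank them equally, while the intervention count (1 versus 5) shows a very different value proposition.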
Number 21, the emergence of agent-oriented LLMs to serve as underlying models.
Right.
Again, maybe something a little bit more controversial, but this is my opinion and feel
free to weigh in yours.
But I believe that unlike the traditional LLMs that are very much designed for broad
natural language tasks, or sometimes images, videos, and so on, the LLMs that are more oriented
towards agents will be more
purpose-built for powering the autonomous activities that agents will need to do.
And, you know, OpenAI's o1 and now o3 and the like
are a good step in this direction of having LLMs that are more geared for agent reasoning.
We will probably also see more of these models created and used. To be more concrete,
they might not be better on the general benchmarks, because they don't have to be
smart in everything the way we benchmark our existing OpenAI and other models,
but we want them to be more suitable for agents. So maybe they will prioritize the multi-step reasoning.
Maybe they will prioritize the long-term memory or maybe they will be very smart about retaining
very good context or enabling the agents to be more thoughtful in the way they plan and the
way they make decisions.
And as we see these LLMs specializing, we might even see a mix and match, where
even one single agent will use different models for the different steps of performing its
task.
So maybe it will use the o3 model for the initial planning, and then it will use a smaller model
for doing its ongoing tasks as part of the overall flow.
And I believe that eventually we will see a very hybrid approach
where some of the models that are used are smaller, cheaper, faster,
some of them are smarter, and the best engineering practices
will be around finding the right models
and using the ones that are tailored the most,
not only to the overall agent concept
but even to your specific vertical; we might even see those emerging.
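The mix-and-match idea can be sketched as a simple per-step router in Python. Everything here is an assumption for illustration: the tier names are placeholders rather than real model IDs, and the costs are made-up relative units, not actual pricing. Planning steps go to a large reasoning model and routine steps to a small, fast one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str             # placeholder label, not a real model ID
    relative_cost: float  # illustrative cost units per call, not real pricing


LARGE_REASONER = ModelTier("large-reasoning-model", 20.0)
SMALL_FAST = ModelTier("small-fast-model", 1.0)

# Hypothetical routing policy: pay for heavy reasoning only where planning quality matters.
ROUTES: dict[str, ModelTier] = {"plan": LARGE_REASONER, "replan": LARGE_REASONER}


def route(step_kind: str) -> ModelTier:
    return ROUTES.get(step_kind, SMALL_FAST)  # everything else goes to the cheap model


def run_cost(steps: list[str]) -> float:
    return sum(route(s).relative_cost for s in steps)


flow = ["plan", "act", "act", "act", "act"]
print(run_cost(flow), len(flow) * LARGE_REASONER.relative_cost)  # prints 24.0 100.0
```

With these made-up numbers, the hybrid flow costs 24 units versus 100 if every step used the large model. In practice the routing rule might itself be tuned per vertical, in line with the point about tailored models.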
Yep. I don't have much to add. I think this is absolutely going to happen. I think that the more
sophisticated we get around what gets better performance, the more of this we'll see, and I think that
there are going to be cost incentives to do this experimentation, if nothing else, right? The fact that
the highest, state-of-the-art intelligence is still very expensive means that there are a lot
of reasons to try to wring more value out of other models and other approaches. And so I think
we're just going to see tons and tons of this sort of customization. Today's episode is brought to you
by Vanta. Whether you're starting or scaling your company's security program, demonstrating
top-notch security practices, and establishing trust is more important than ever. Vanta automates
compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001 and NIST AI risk
management framework, saving you time and money while helping you build customer trust.
Plus, you can streamline security reviews by automating questionnaires and demonstrating your
security posture with a customer-facing trust center, all powered by Vanta AI.
Over 8,000 global companies like LangChain, Lila AI, and Factory AI use Vanta to demonstrate
AI trust and prove security in real time.
Learn more at vanta.com/nlw.
That's vanta.com/nlw.
If there is one thing that's clear about AI in 2025, it's that agents
are coming. Vertical agents by industry, horizontal agent platforms, agents per function.
If you are running a large enterprise, you will be experimenting with agents next year. And given how
new this is, all of us are going to be back in pilot mode. That's why Superintelligent is offering
a new product for the beginning of this year. It's an agent readiness and opportunity audit.
Over the course of a couple quick weeks, we dig in with your team to understand what type of agents
make sense for you to test, what type of infrastructure support you need to be ready, and to
ultimately come away with a set of actionable recommendations that get you prepared to figure out
how agents can transform your business. If you are interested in the agent readiness and opportunity
audit, reach out directly to me, nlw@besuper.ai. Put the word agent in the subject line so I know
what you're talking about. And let's have you be a leader in the most dynamic part of the AI market.
All right. And now we get to our last section: investment and media hype. Number 22, this is
probably your safest prediction: significant VC dollars will be invested in agentic companies.
Yes. So probably everyone who wants some funding or wants to take good care of their stock will have
to say agent. And this can also be a fun drinking game: on each earnings call, count how many times each
CEO says agent. And you mentioned in the previous conversation the Y Combinator team
saying that vertical agents will be 10x bigger than SaaS.
That created a lot of headlines,
and we know that other VCs are also already on board
with the technology trend,
and this for sure will continue in 2025,
and thereby there will be many newly founded startups and companies,
but also many companies that will add an agentic offering
or pivot towards an agentic offering,
some of them rightfully so,
while for others, the natural evolution and progression of things will probably not put them in
the history books as the companies that yielded a lot of value from it.
I think that this is absolutely true. It's already happening. Certainly, you know,
this has been a major theme with venture recently. A couple of things that I think are interesting
to watch for that should tell us how this is evolving. One is exactly what you just called
out: the extent to which AI gets supplanted or
supplemented with agent mentions in earnings calls and things like that. That'll be very telling.
But two, one of the things that I think will happen is that lots and lots of companies and
startups will, not for the sake of funding but just because they realize that an agent can do
something discrete and unique for them, accidentally start building agents on top of, as part
of, or as replacements for their existing offerings. We've been through this process. Superintelligent
delivers AI support and enablement as a team, as a self-serve platform,
and now increasingly as an agentic offering.
And that wasn't a money-chasing thing.
It was because we realized there are things that we could do with agents to scale ourselves
that we couldn't do any other way.
And I think lots and lots of companies are going to, you know, stumble into experiments
next year where building with agents actually unlocks totally new possibilities
that they haven't seen before.
So this might be one of those rare VC themes
where there's enough there there to justify all of the excitement and the capital that flows in.
Yeah, I believe there is, but you have to be smart about what you're building and for what reasons.
Yeah.
I think, speaking to builders much more so than to investors, the reminder for me is that
it's often, if not always, a bad choice to just chase the trends in what VCs are looking for
rather than making the right decision for whatever company you're trying to build and whatever
problem you're trying to solve. However, I would caution against deliberately not
looking in the direction of agents because you believe they're overhyped and just sort of a VC thing.
I think there's going to be lots of opportunities to build there that are going to be really
fun and meaningful until the summer, of course, because number 23, come summer there will be a
media debate about whether agents were overhyped and whether development is slowing down.
So, you know, the existing challenges might not be resolved, and we mentioned many of them
in these two episodes, but new challenges will probably emerge, and reality
will meet the current, probably overhyped and overinflated, media expectations.
And fortunately, the media will also probably be the ones, during the summer when the news
cycle subsides, to take it upon themselves to deflate the bubble and tell us all how agents
were mostly hype and are not delivering on their promise.
And what we predict is that come fall, we will meet reality.
And the reality, at least in my opinion, is that agents will continue to yield a lot of value.
And bottom line here, from my perspective, is that while
the news cycles will come and go, and we will see many headlines saying that agents are
not what they promised to be,
they will be. The only caveat is that it might take us slightly longer than
anticipated, and it might be a little bit harder than anticipated, but the value is there
and will continue to be there, at least in my opinion.
Yeah.
So in the summer of 2023, the version of this was that ChatGPT had its first down month
in June of 2023.
And that was the context for all of these pieces.
And then this year, of course, it was the Goldman Sachs "too much spend, too little benefit" report
and the Sequoia "$600 billion question" post that created the whole discussion.
And so there does seem to be a trend where summers generate, you know, kind of a FUD cycle around AI.
Interestingly, part of why this one's going to be extra funny when it happens is that agents have actually been the most
hyped thing since ChatGPT launched. If you go back to April 2023, when the AI Daily Brief
was just starting, the thing that everyone was talking about was AutoGPT and BabyAGI.
And it's been agents ever since. And so it'll be very funny if we actually
have, you know, a discrete set of perhaps very specific, you know, kind of single-purpose agents
deployed, and yet, you know, the narrative might be that it's disappointing. But I agree
with both the likelihood that there will be that hype cycle or that anti-hype cycle and also the
reality that it is incorrect ultimately.
Yeah, let's play this tape once we're there to prove that we predicted that.
Number 24, agents will be intertwined and accelerate AGI discussions.
Okay.
So when I created this prediction, it was before last week's final day of OpenAI's 12 Days of
Shipmas.
And for those of you who might have already gone into vacation hibernation, OpenAI literally shocked
us yet again last week when they announced the o3 model, because they said that it surpassed
human level on the ARC benchmark.
And the ARC benchmark is a benchmark that was created specifically to evaluate an AI
system's ability to generalize and solve problems in a way that would show it's worthy of being called AGI.
And up until last week, the best-performing model was, I think, in the low 20s or low 30s.
I don't remember the number, but OpenAI with their o3 model has surpassed human ability.
And I think the discussions around "are we there yet?" will reignite even more in early 2025.
And, you know, with all of these agentic discussions, we need to ask ourselves
what the relationship is,
because if agents demonstrate increasingly autonomous behavior
and they're utilizing o3 in the background, which has already
surpassed some human benchmarks related to AGI, the lines will become really, really blurred,
and the debate about whether we're there yet with AGI will probably go further.
And I think during the year,
as we see more and more impressive use cases of agents come to fruition,
some of these discussions might even be relevant.
However, I have to say, first of all, that bottom line, I don't think that 2025
will be the year of AGI, even with agents and all this intertwined relationship.
And I also don't think that it really matters.
I think, like I said before, what matters is the outcome or the results,
and agents will yield good results in 2025 and will show a lot of potential for human-like
abilities in many, many different tasks, but I'm not sure whether it will move the needle as much,
or whether it will matter beyond some financial considerations and some specific companies
that have the incentive to say AGI is here.
Absolutely.
I think AGI ultimately matters insofar as it's deployable to change
the way things actually happen, right? And so I think that that's why it will get caught up or
connected to the agent conversation is that agents are going to be a lot of where the next frontier
of the state of the art goes to get deployed when it comes to AI. For what it's worth,
François Chollet, the creator or progenitor of the ARC Prize, tweeted
about whether this meant that o3 was AGI. And what he said was, "While the new model is very
impressive and represents a big milestone on the way towards AGI, I don't believe this is AGI.
There's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve, and we have early
indications that ARC-AGI-2 will remain extremely challenging for o3. This shows that it's still
feasible to create unsaturated, interesting benchmarks that are easy for humans, yet impossible
for AI without involving specialist knowledge. We will have AGI when creating such evals
becomes outright impossible." So even though there's a huge discussion right now,
at least the guy behind that particular benchmark doesn't think we're there yet.
But I do think that you're right to call this out.
Certainly, this has been the big discussion over the last few days.
You know, we're recording this on Monday, December 23rd.
And it's been pretty much all anyone's been talking about for the weekend.
But actually, when push comes to shove, you see this happen over and over again on Twitter slash X.
You know, someone will start with a debate around whether this is AGI.
And then it'll quickly get to, well, it doesn't really so much matter.
What matters more is, you know, does this mean software developers are cooked?
Does this mean, you know, different job roles are totally going to change? And so I think that
it's going to be all about the practical application; that's what really matters. And again, that's why, you know,
agents are going to be such a big part of the story. However, according to number 25,
they will be an even bigger part of the story a little bit down the line: 2026 will be
even bigger for agents than 2025. Right. So I think it came across multiple times during
this conversation that this is where we will see the beginning of the exponential curve.
And of course, if everything that we just discussed will happen, 2025 will be an amazing year
with a massive leap forward in agents and in humanity's progression overall.
But it's just the beginning.
And I believe that 2026, and probably a few years after, will be the years where, building on
all of these learnings and developments from what you called the pilot year,
the year where more and more people get their hands on agents,
we will realize the big promise of Gen AI.
And that's why I'm so excited.
You asked me at the beginning why I'm so excited about agents.
It's what will happen in 2025, 2026 and beyond that will get us all to be amazed
at how work and life were before this era.
Yeah.
So I agree with this.
And I would go a step farther.
So I think that 2026 will be the first year that enterprises
meaningfully and regularly have agents deployed just in the normal course of their
workforce, right?
A hybrid human-agent workforce will be increasingly common.
Not the norm, but it'll be normal to see that as part of certain
functions.
I think that it'll be highly focused on particular functions to begin with, but I think that
it'll be fairly normal in 2026 to have agents deployed at scale across certain functions.
And so the implications of that are that you have to use 2025 to figure out what those functions are,
how you integrate them with your systems, how you build the new systems around them that you need.
And that's going to take a ton of work and experimentation.
Obviously, this is what Superintelligent is positioned to help people with.
This is why we're doing these readiness audits.
It's why we're supporting, you know, agent deployment.
It's why we're helping companies build systems for ongoing AI transformation.
2025 is going to be an incredibly important inflection year that is really going to push enterprises
to build the systems that allow them to actually take advantage of this in 2026 and beyond.
And I think that the implications of that are that you really will start to see, especially in
2026 and beyond, a clear breakout of companies that have built these systems and have the capability.
Companies that have gone through AI transformation and have the systems set up to continue it
will start to break out from the pack in very meaningful ways, in a way that hasn't even
happened yet. So I think it's going to be very, very exciting. And I think that this year
will be very fun because the stakes will be high, but there's still lots and lots of room
to do things that don't work and to, you know,
wander down paths that don't lead anywhere.
That won't be the case for very much longer.
It's going to be a fun year for sure.
All right.
Well, Nufar, thank you so much for hanging out.
This was a super fun conversation.
We don't have anything quite yet to announce,
but for anyone who did like this,
keep an eye closely tuned, or an ear, I guess, closely tuned to this,
as we might have some interesting announcements coming up.
But I hope that you have a very fun and non-agentic holiday, everyone.
And we'll see you in 2025.
