The AI Daily Brief: Artificial Intelligence News and Analysis - 25 Agent Predictions for 2025 - Part 2

Starting point is 00:00:00 Today on the AI Daily Brief, part two of 25 agent predictions for 2025. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Hello, friends, here we are back with part two of our 25 agent predictions for 2025. It's not strictly required that you listen to part one first. However, I would recommend it. Once again, we are joined by Nufar Gaspar, the director of AI Everywhere and GenAI for Intel Design. Nufar brings the perspective of someone who has built AI products inside Intel,

Starting point is 00:00:35 helped with broader AI transformation, and thinks about these issues professionally and personally all the time. In the second part, we talk about technology, as well as financial trends, and close out with a big vision for where this is all headed. All right, and we are back once again for part two of this conversation around 25 predictions for AI agents in 2025. We've talked about all sorts of things. A lot of ground setting in part one. And now we're digging into some of the more kind of discrete and specific technology predictions. Kicking off with number 14, new custom cognitive architectures will enable better and safer agents. So what do you mean by this? Yeah, so let's start by defining what a cognitive architecture is. It's basically a fancy

Starting point is 00:01:22 term for a blueprint for building intelligent and autonomous systems. And you can think about it as designing the minds of the agents. So maybe some of you heard about agents over a year ago, when AutoGPT and BabyAGI were the tools that everyone discussed, and they never took flight. And the reason for that is that they were too general and unconstrained, and thereby their performance was unreliable. And with the newest generation of agents, there was also an introduction of new

Starting point is 00:01:58 custom cognitive architectures by many individuals and companies, and those provided a lot of guardrails, sometimes referred to as scaffolding, and frameworks for controlling these agents. And thereby, with improved memory and improved capabilities, those kept the agents much more focused on what they are trying to do and prevented them from going off the rails. And because they were so successful in 2024 at bridging the gap between being too loose and getting to actual results, there are so many labs and companies working to improve them even further. And this will probably continue in 2025, and we will get even better results with that. I actually want to bring in your 15th prediction because I think it's related and we can

Starting point is 00:02:47 discuss a little bit. The development of new tools, frameworks, and conventions for agent development and management. Right. So up until now, we have often been using the same tools for a new technology. And with the rise of agents, we do need to have more dedicated tools for agent development. They should be explicitly designed for building agents in order to streamline and speed up the process of building these agents. Some of the focus areas will be on application development. So we will for sure see more and more frameworks. We already have LangGraph and other open source capabilities, but more and more libraries and frameworks will probably emerge to help developers build the backbone of these agents in order to make them more reliable, and also to orchestrate either between agents within the same system or between agents talking to one another, and we'll talk more about it in the next predictions.

Starting point is 00:03:49 The other area where there will be a lot of focus, in my opinion, is on observability and the ability to test these agents. These tools will help developers be much more confident about whatever they're building, debug their agents, and anticipate whatever needs to be improved, as well as the costs, which are currently not that predictable. And whenever we want to really understand or govern or provide visibility to our customers about what the agent actually did, that observability will become even more critical as part of the development building blocks that we will have. So my guess is that for a lot of our listeners who come from the enterprise or business world, a lot of those words that you just said would sound like total Greek to them. How much do you think, how much understanding do companies that are thinking

Starting point is 00:04:44 about exploring agents and piloting agents, you know, for their companies, for their enterprises, need to have about all of this? Okay, so of course I'm a bit biased because I've been among these Greek people who build AI capabilities for literally all of my career. So I am constantly thinking about these things. And I do think that organizations that want to have a tailored set of capabilities, because improving the outcome by even a fraction of a percent has bottom-line implications, will for sure have teams that are experts in building agents or utilizing AI, and they need to understand, because they're in a position where they're not fighting for the 80 percent, they're fighting for the additional 20 percent. So if you are from

Starting point is 00:05:33 a company that utilizes AI to create a competitive advantage that is very unique, you will probably have to have people who understand that. For the early stages of agents, you will probably be able to utilize out-of-the-box capabilities and you don't have to go down that specific rabbit hole. I perceive that some of the listeners, even if they're currently not at this point, might want to get there eventually, maybe later in 2025 or the years to follow. Yeah, so that's where I land on this. I think that there's going to be plenty to experiment with next year that is very point-and-click. You know, there will be some amount of engagement. In fact, you're seeing a lot of agent companies do the forward-deployed engineer thing, where they're actually

Starting point is 00:06:21 embedding a developer inside companies to help customize agents for their particular data set and their particular environment. Sierra is doing this and others are as well. And so there will be a lot of support, I think, for those initial pilots and deployments. And so I don't think that the lack of understanding of this should be an a priori barrier for digging in. However, I also think that the more that there is some amount of institutional understanding around these topics, and in particular an ability to assess, or at least have the right support to figure out and assess, where the current agents that are being tested or deployed sit relative to new capabilities that are coming online and what's likely to happen in the future, the better organizations

Starting point is 00:07:08 will be able to make good strategic decisions. I think that the challenge is that this is going to be such a fast evolving landscape of solutions that it's not really going to be as clean as, you know, we piloted an agent, we liked it, and so we deployed it, and then cool, we've got our agent figured out. It's very likely that that's a process that's going to be, you know, continuously reinterpreting and retrying things as capabilities improve and as competition, you know, expands the boundaries of what's possible. So building a learning organization that can actually understand this on a deeper level is going to be, I think, pretty essential.

Starting point is 00:07:46 Yeah, and even if you just buy, the ability to define the right requirements for the vendor will probably require you to at least talk the talk to some extent. Okay, number 16, growth in the number and practicality of multi-agent systems. Okay. So this is an exciting one. Again, you don't have to be scared about the technological aspect, but just a brief explanation of what a multi-agent system is. These are systems where we have several AI agents working together to accomplish a goal.
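A minimal sketch of that definition, assuming stub agents (plain Python functions standing in for model calls) and hypothetical role names. The `Agent` class, `run_team`, and the two roles are illustrative only and do not come from any framework mentioned in the conversation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Agent:
    title: str                     # job-title-like role name (hypothetical)
    act: Callable[[State], State]  # takes the shared state, returns an updated copy


def write_code(state: State) -> State:
    # Stand-in for a model call that writes code for the task.
    return {**state, "code": "def add(a, b):\n    return a + b\n"}


def check_code(state: State) -> State:
    # Stand-in for a tester agent: run the produced code against one check.
    namespace: Dict[str, object] = {}
    exec(state["code"], namespace)
    return {**state, "passed": namespace["add"](2, 3) == 5}


def run_team(agents: List[Agent], task: str) -> State:
    # Pass the shared state through each agent in order and log who acted.
    state: State = {"task": task, "log": []}
    for agent in agents:
        state = agent.act(state)
        state = {**state, "log": state["log"] + [agent.title]}
    return state


team = [Agent("Code Writer", write_code), Agent("Code Tester", check_code)]
result = run_team(team, "write an add function")
print(result["log"], result["passed"])
```

Real systems add routing, retries, and a debugging role; the point here is only that each agent owns one narrow step and hands its output to the next.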

Starting point is 00:08:19 And typically each agent has a specific role. They will often act just like a cross-functional project team. So that's the best analogy that there is. And in many cases, people who are building these agents will really give each agent a title that seems like a job title. If you want a concrete example, for a coding task, you might have one agent that writes the code, another agent that tests the code, another that debugs it, and so on. And eventually, the overall code functionality can be even better by having a well-defined set of AI agents working together

Starting point is 00:08:57 if they're built properly. But it's not easy to build a multi-agent system, because this is where you really have to have a good understanding of the agents, or you will be using frameworks or other capabilities that give you the ability to build multi-agent systems. It will become much more prevalent, probably during 2025 and beyond. And because of the analogy to real teams working, and because we're already seeing some very promising results for multi-agent systems,

Starting point is 00:09:34 there will be more industry confidence, and we will see more and more of them in 2025. This is a really interesting area. I wouldn't be surprised if when the dust settles, it's only really when multi-agent systems become the norm that enterprises really start to see value, or at least big scalable value. The reason for that being that, you know,

Starting point is 00:10:02 if you're asking, you know, right now we have sort of a correlation between how specialized an agent is and how likely it is to perform. But that makes it a very discrete set of tasks that tend to be very narrow where you can kind of deploy these things right away. The multi-agent systems are going to be where you can get more customizable and you can sort of ask for more complex things. And so I think when people are really imagining in their mind's eye all that agents could do, they're probably in many cases actually imagining multi-agent systems, even though that won't necessarily be where we begin the year. Yes. And also, they're like humans, right? If you try to get an agent

Starting point is 00:10:45 to do too many things at once, it will get confused. And thereby the multi-agent systems, sometimes even for the smaller use cases, if we are able to nail them, will probably get us to better, more accurate results. Okay, number 17, more focus on multimodal abilities of agents. Okay, also a very exciting one in my opinion. Because when we're talking about AI agents, we're talking about things that will have to perform tasks and have good sensing and understanding almost like humans.

Starting point is 00:11:20 And in order to do that, we will have to have more ability of these agents to have like multimodal perception of the environment, whether they will be processing video, audio, images, whether they will be controlling the computer and so on, all of these amount to something that is very exciting. The most exciting thing that I've seen recently is Google's Project Astra. I've seen some demos and some testimonials of people who use that.

Starting point is 00:11:49 And it's a great example of where you have a model that is able to perceive the environment using video and interact with you and literally be like your eyes and ears in a real environment. And I think more exciting even is the possibility for people with some kind of disability to have these agents work for them. I know that we're very focused on the enterprise, but this is a consumer use case that I'm very excited about. And even for the enterprise, you can think of having a much more robust assistant that has all of these senses working simultaneously to help you. Yeah, I think that this is one of the areas that's been really notable to me. Even in the last couple of weeks, we got an update to Project Astra, and we also got, as part of OpenAI's 12 Days of Shipmas, advanced voice mode with vision. And I think that we are still

Starting point is 00:12:39 underestimating how different the modality will be when the normal way that we interact with AI is it having the same visual and auditory context for the world around us that we have. It's very hard, I think, for most people, myself included, to break out of thinking about it as a thing that exists in a computer that you write to, you know, or maybe you speak to. But I think that we're going to just see a gradual shift over time that opens up not just totally new use cases, but I think a fairly fundamentally different understanding of what these tools actually do for us. All right. Number 18, more academic and open source brain power will be devoted to agentic research, which should further

Starting point is 00:13:26 accelerate development. Right. So I mentioned that in the previous conversation, but I've been working in AI for many years now, and I'm still amazed by what happened over the last two years.

Starting point is 00:13:38 And I think, thinking about what created all of these capabilities beyond some specific technological improvements, it is the fact that so many smart people all over the world are literally focusing on one domain, on one problem.

Starting point is 00:13:54 And I believe that agents will enjoy the same thing. With so much hype and attention, we will just be able to get so much more, and with so much brain power coming from all directions, whether these are open source or academia

Starting point is 00:14:10 or industry the exponential curve will continue and we will all be probably both excited, scared and utilizing all of these technologies much more because of all of that. You know it's kind of ironic but interesting. I actually think

Starting point is 00:14:26 the fact that pre-training as a scaling methodology seems to be plateauing or at least running into some limits will only increase how much of that energy and brain power goes to agents and applications and expressions instead of just thinking about raw capabilities enhancement of the underlying LLMs. It was interesting. So on the Dwarkesh podcast, I don't know, a while ago now, maybe three months, six months, something like that, François Chollet basically said that he thought that OpenAI had actually set back AGI, which is fascinating. And his argument was that once ChatGPT hit, everyone just switched to thinking about and focusing on LLM architectures and not doing anything else. And now that we're running into some limits in terms of getting kind of the

Starting point is 00:15:13 next level of capabilities, although who knows if that's actually true given o3, I think that there's going to be just even more fertile realms of experimentation on different ways to pull capabilities out of the tools that we have. Yeah, but I'm not sure whether it's the slowdown or the natural progression towards inference-time reasoning. You know, the cynics will say that because they can't give us good enough results in scaling, then the hypers have all shifted to agents. But I'm not sure.

Starting point is 00:15:47 Maybe it's because, like you and I, they are seeing the potential of agents. That's why they're so excited and are working on that as much. And maybe they have some good stuff in store for us in the, let's call them, regular LLMs, because they are all claiming, maybe aside from Ilya Sutskever, that we're not there yet in terms of a complete scaling slowdown. So it seems like also a marketing discussion and not just a technological discussion. Yeah, I agree. So speaking of this, number 19, new interfaces, standards, and protocols will emerge: an agent-computer interface.

Starting point is 00:16:29 Right. So, you know, we all were very excited when Anthropic first introduced computer use. Everyone rushed in to experiment with that, and it really sounded like the true beginning of something major. And then everyone quickly realized that it's much more cumbersome, expensive, and not very accurate. And I'm not sure whether this is the right approach. Like, do we want agents to control the computer like humans do?

Starting point is 00:16:58 Or in fact, because agents will be doing so much work on the computer, there will be a new need for an interface for these agents to control a computer. And moreover, because there will be so many agents working together, there will be a need for new APIs and new protocols for how to communicate between agents, as well as perhaps being much more literal about how we write stuff, because agents can't read between the lines like humans often do. Maybe your error messages have to be machine readable versus human readable, and so many other things. So that will also, I believe, be a huge focus.
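To make the point about machine-readable error messages concrete, here is a small sketch. The field names (`code`, `retryable`, `retry_after_seconds`) are assumptions chosen for illustration, not part of any existing agent protocol or standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AgentError:
    # Hypothetical error shape an agent could act on without parsing prose.
    code: str                     # stable identifier, e.g. "RATE_LIMITED"
    message: str                  # human-readable text, kept for people
    retryable: bool               # whether trying again can succeed
    retry_after_seconds: int = 0  # how long to wait before retrying


def to_wire(err: AgentError) -> str:
    # Serialize the error as JSON so another agent can read it.
    return json.dumps(asdict(err), sort_keys=True)


def should_retry(payload: str) -> bool:
    # The calling agent decides from explicit fields, not from wording.
    data = json.loads(payload)
    return bool(data.get("retryable", False))


payload = to_wire(AgentError("RATE_LIMITED", "Too many requests", True, 30))
print(payload)
print(should_retry(payload))
```

A human-facing page that just says "Something went wrong, please try later" leaves the agent guessing; explicit fields like these spell out what a person would otherwise infer.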

Starting point is 00:17:43 And an interesting part will be whether all of these different players will be able to get to an agreement between them or we will get to a point where everyone blocks one another with different protocols rather than being open-ended and letting other companies' agents operate on your data. And I'm not sure whether all of these websites

Starting point is 00:18:04 will let agents call and do stuff on them or will we see an economy of blocking each other where essentially they're telling you the end user that if you want to do this action, you have to use our agent because we will block your agent from doing that on our data or our tools. My guess is it proceeds sort of similarly to how most versions of this have, which is initial

Starting point is 00:18:28 balkanization and attempt to capture value that ultimately loses out to open protocols and standards that underlie things because there are just too many efficiencies to be had. If it's anything like the way that the Internet has developed in other areas, I definitely think that a big part of the next few years is going to be those sort of subaltern kind of battles happening. Let's hope that the open end will win for the sake of all of us because it will be a better economy in my opinion. Number 20, a lot of investment in creating agent-oriented benchmarks. Okay. So how do you measure an agent's performance? Is it only when it arrives at its final destination? Sometimes we don't even know the final destination, so it's very hard to measure that.

Starting point is 00:19:16 And we have seen some recently emerging benchmarks that are trying to be more open-ended like agents are and try to pose a set of evaluation questions that will require the multi-step reasoning and open-ended thinking that an agent will have. Two concrete examples: SWE-bench, the software engineering benchmark, which gives an agent or an AI multiple human-like software engineering tasks and measures how well it performs on them. And there is also an interesting benchmark of research engineering where the agent needs to basically do the AI research

Starting point is 00:19:56 that a human expert would do. So these are two interesting benchmarks that are emerging. And I believe that we will see more and more because the existing methods for evaluating the LLMs are not suitable for agents. They often look at the bottom line and are not really indicative of how well the agent performed, especially if you want to open the black box and see the multiple reasoning steps

Starting point is 00:20:20 that the agent has done in order to get to the result. So we will see more of those, and rightfully so, because as we talked about before, there will be so many competing offerings. And aside from maybe experimenting ourselves, it's going to be very difficult to assess how well they're doing if we only use the existing benchmarks. Yeah, I completely agree with this. I think there's going to be a highly functional set of benchmarks necessary. Again, just thinking about

Starting point is 00:20:46 it strictly from the standpoint of the enterprise. So as we are thinking about how to recommend Agent X versus Agent Y for some specific purpose that we've, with an enterprise, determined is a great place to start experimentation, the types of things that would be valuable for us to know are exactly the types of things that you were just mentioning that there currently aren't benchmarks for. So, for example, how many times in the process of completing the task at hand is the agent likely to need guidance from humans? You know, a one on that is very different than a five on that, right? The value proposition is totally different based on that.
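A rough sketch of how such a score could be computed from run logs. The log format, the agent names, and every number below are hypothetical placeholders, not results from any real benchmark.

```python
from typing import Dict, List

Run = Dict[str, object]


def interventions_per_task(runs: List[Run]) -> float:
    # Average number of times a human had to step in per task.
    return sum(r["human_interventions"] for r in runs) / len(runs)


def completion_rate(runs: List[Run]) -> float:
    # Share of tasks the agent finished, with or without help.
    return sum(1 for r in runs if r["completed"]) / len(runs)


# Hypothetical logs for two hypothetical agents on the same three tasks.
agent_x = [
    {"task": "t1", "human_interventions": 1, "completed": True},
    {"task": "t2", "human_interventions": 0, "completed": True},
    {"task": "t3", "human_interventions": 2, "completed": True},
]
agent_y = [
    {"task": "t1", "human_interventions": 5, "completed": True},
    {"task": "t2", "human_interventions": 4, "completed": False},
    {"task": "t3", "human_interventions": 6, "completed": True},
]

for name, runs in (("Agent X", agent_x), ("Agent Y", agent_y)):
    print(name, interventions_per_task(runs), completion_rate(runs))
```

Two agents with similar completion rates can still differ sharply on the intervention score, which is the gap being described here.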

Starting point is 00:21:25 Whatever that score is called, that's a score that I would like to see, you know, as relates to making decisions around agents. So I think that you're right that there's going to be a lot of exploration here. And it won't just be pure technical benchmarks. I think those will be highly functional and related to actual usage as well. Yes, for sure. Number 21, the emergence of agent-oriented LLMs to serve as underlying models.

Starting point is 00:21:50 Right. Again, maybe something a little bit more controversial, but this is my opinion and feel free to weigh in with yours. But I believe that unlike the traditional LLMs that are very much designed for broad natural language tasks or sometimes images, videos, and so on, the LLMs that are more oriented towards agents will be more purpose-built for powering those autonomous activities that the agents will need to do. And, you know, OpenAI's o1 and now o3 and the like,

Starting point is 00:22:23 they are a good step in this direction of having LLMs that are more geared for agent reasoning. We can and we will probably also see more of these models created and used. Some concrete explanation: they might not be better in the general benchmarks, because they don't have to be smart in everything, like we are benchmarking our existing OpenAI and other models, but we want them to be more suitable for agents. So maybe they will prioritize multi-step reasoning. Maybe they will prioritize long-term memory, or maybe they will be very smart about retaining very good context or enabling the agents to be more thoughtful in the way they plan and the way they make decisions.

Starting point is 00:23:13 And while we see these LLMs becoming specialized, we might even see a mix and match where even one single agent will use different models for the different steps of performing its task. So maybe it will use the o3 model for the initial planning and then it will use a smaller model for doing its ongoing tasks as part of the overall flow. And I believe that eventually we will see a very hybrid approach where some of the models that are used are smaller, cheaper, faster. Some of them are smarter, and the best engineering practices

Starting point is 00:23:51 will be around finding the right models and using the ones that were probably tailored the most, not only for the overall agent concept, but even for your specific vertical; we might even see those emerging. Yep. I don't have much to add. I think this is absolutely going to happen. I think that the more that we get sophisticated around what gets better performance, the more of this we'll see, and I think that there's going to be cost incentives to do this experimentation, if nothing else, right? The fact that the sort of highest state-of-the-art intelligence is still very expensive means that there's a lot

Starting point is 00:24:28 of reasons to try to wring more value out of other models and other approaches. And so I think we're just going to see tons and tons of this sort of customization. Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever. Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001 and the NIST AI Risk Management Framework, saving you time and money while helping you build customer trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing trust center, all powered by Vanta AI.

Starting point is 00:25:10 Over 8,000 global companies like LangChain, Lila AI, and Factory AI use Vanta to demonstrate AI trust and prove security in real time. Learn more at vanta.com slash NLW. That's vanta.com slash NLW. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode. That's why Superintelligent is offering

Starting point is 00:25:47 a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what types of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business. If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw@besuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. All right. And now we get to our last section: investment and media hype. Number 22, this is

Starting point is 00:26:26 probably your safest prediction. Significant VC dollars will be invested in agentic companies. Yes. So probably everyone who wants some funding or to take good care of their stock will have to say agent. And this can also be a fun drinking game: on each earnings call, how many times will each CEO say agent? And you mentioned in the previous conversation the Y Combinator team saying that vertical agents will be 10x bigger than SaaS. That created a lot of headlines, and we know that other VCs are also already on board with the technology trend,

Starting point is 00:27:04 and this for sure will continue in 2025, and thereby there will be many newly founded startups and companies, but also many companies that will add an agentic offering or pivot towards an agentic offering. Some of it rightfully so, and for some of them, the natural evolution and progression of things will probably not have them in the history books as the companies that yielded a lot of value from that. I think that this is absolutely true. It's already happening. Certainly, you know,

Starting point is 00:27:38 this has been a major theme with venture recently. A couple of things that I think are interesting to watch for that should tell us how this is evolving. One is exactly what you just called out: the extent that AI gets supplanted or supplemented with agent mentions in earnings calls and things like that. That'll be very telling. But two, one of the things that I think will happen is that lots and lots of companies and startups will, not for the sake of funding, but just because they realize that an agent can do something discrete and unique for them, accidentally start building agents on top of or as part of or as replacing their existing offering. We've had this process. So Superintelligent

Starting point is 00:28:22 delivers AI, you know, support, you know, enablement as a team, as a self-serve platform, and now increasingly as an agentic offering. And that wasn't a money-chasing thing. It was because we realized there are things that we could do with agents to scale ourselves that we couldn't do any other way. And I think lots and lots of companies are going to, you know, stumble into experiments next year where building with agents actually unlocks totally new possibilities that they haven't seen before.

Starting point is 00:28:53 So this might be one of those rare VC themes where there's enough there there to justify all of the excitement and the capital that flows in. Yeah, I believe there is, but you have to be smart about what you're building and for what reasons. Yeah. Much more so than speaking to the investors, I think the reminder for me when it comes to builders is that it's often, if not always, a bad choice to just chase the trends in what VCs are looking for rather than making the right decision for whatever company you're trying to build and whatever problem you're trying to solve. However, I would caution against explicitly not

Starting point is 00:29:35 looking in the way of agents because you believe it's overhyped and just sort of a VC thing. I think there's going to be lots of opportunities to build there that are going to be really fun and meaningful until the summer, of course, because number 23, come summer there will be a media debate about whether agents were overhyped and whether development is slowing down. So, you know, the existing challenges might not be resolved, and we mentioned many of them, as we discussed in these two episodes, but new challenges will probably emerge and reality will meet the currently probably overhyped and overinflated media expectations. And fortunately, the media will also be the ones probably during the summer where the news

Starting point is 00:30:15 cycle subsides that will take it upon themselves to deflate the bubble and tell us all how agents were mostly hype and are not delivering on their promise. And what we predict is that come fall, we will meet reality. And the reality, at least in my opinion, is that agents will continue to yield a lot of value. And the bottom line here, from my perspective, is that while the news cycles will come and go, and we will see many headlines saying that agents are not what they promised to be, they will be. The only caveat is that it might take us slightly longer than

Starting point is 00:30:58 anticipated, and it might be a little bit harder than anticipated, but the value is there and will continue to be there, at least in my opinion. Yeah. So in the summer of 2023, the version of this was that ChatGPT had its first down month in June of 2023. And that was the context for all of these pieces. And then this year, of course, it was the Goldman Sachs "too much money, too little value" and the Sequoia "$600 billion question" posts that created the whole discussion.

Starting point is 00:31:30 And so there does seem to be a trend where summers generate, you know, kind of a FUD cycle around AI. Interestingly, part of why this one's going to be extra funny when it happens is that agents have actually been the most hyped thing since ChatGPT launched. If you go back to April 2023 when the AI Daily Brief was just starting, the thing that everyone was talking about was AutoGPT and BabyAGI. And it was agents from then. And so it'll be very funny to see that we actually have, you know, a discrete set of perhaps very specific, you know, kind of single-purpose agents deployed. And yet, you know, the narrative might be that it's disappointing. But I agree with both the likelihood that there will be that hype cycle or that anti-hype cycle and also the

Starting point is 00:32:20 reality that it is incorrect ultimately. Yeah, let's play this tape once we're there to prove that we predicted that. Number 24, agents will be intertwined with and accelerate AGI discussions. Okay. So when I created this prediction, it was before last week, the last day of OpenAI's 12 Days of Shipmas. And for those of you who might have already gone into vacation hibernation, OpenAI literally shocked us yet again last week when they announced the o3 model, because they said that it surpassed the

Starting point is 00:32:54 human level in the ARC benchmark. And the ARC benchmark is a benchmark that was created specifically to evaluate an AI system's ability to generalize and solve problems that prove that it's AGI-worthy. And up until last week, the best performing model was very, like, I think, low 20s or low 30s. I don't remember the number, but OpenAI with their o3 model has surpassed human ability. And I think even more so the discussions around "are we there yet" will reignite in early 2025. And, you know, with all of these agentic discussions, we need to ask ourselves what's the relationship

Starting point is 00:33:40 because if agents demonstrate increasingly autonomous behavior and they're utilizing o3 in the background that already surpassed some human benchmarks in AGI, the lines will become really, really blurred and the debate will probably go further

Starting point is 00:34:01 about are we there yet with AGI? And I think during the year, as we see more and more impressive use cases of agents come to fruition, some of these discussions might even be relevant. However, I have to say, first of all, that bottom line, I don't think that 2025 will be the year of AGI, even with agents and all this intertwined relationship. And I also don't think that it really matters.

Starting point is 00:34:32 I think, like I said before, what matters is the outcome or the results. And agents will yield good results in 2025 and will have a lot of potential of having human-like abilities in many, many different tasks, but I'm not sure whether it will move the needle as much or whether it will matter beyond some financial and some specific companies that have the incentive to say AGI is here.

Starting point is 00:34:59 Absolutely. I think AGI ultimately matters insofar as it's deployable to change the way things actually happen, right? And so I think that that's why it will get caught up or connected to the agent conversation is that agents are going to be a lot of where the next frontier of the state of the art goes to get deployed when it comes to AI. For what it's worth, François Chollet, who was the creator or the progenitor of the ARC Prize, tweeted about whether this meant that o3 was AGI. And what he said was, while the new model is very

Starting point is 00:35:36 impressive and represents a big milestone on the way towards AGI, I don't believe this is AGI. There's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve, and we have early indications that ARC-AGI-2 will remain extremely challenging for o3. This shows that it's still feasible to create unsaturated, interesting benchmarks that are easy for humans, yet impossible for AI without involving specialist knowledge. We will have AGI when creating such evals becomes outright impossible. So even though there's a huge discussion right now, at least the guy behind that particular benchmark doesn't think we're there yet. But I do think that you're right to call out.

Starting point is 00:36:12 Certainly, this has been the big discussion over the last few days. You know, we're recording this on Monday, December 23rd. And it's been pretty much all anyone's been talking about for the weekend. But actually, when push comes to shove, you see this happen over and over again on Twitter slash X. You know, someone will start with a debate around whether this is AGI. And then it'll quickly get to, well, it doesn't really so much matter. What matters more is, you know, does this mean software developers are cooked? Does this mean, you know, different job roles are totally going to change? And so I think that

Starting point is 00:36:39 it's going to be all about that practice that really matters. And again, that's why, you know, agents are going to be such a big part of the story. However, according to number 25, there will be an even bigger part of the story a little bit down the line. So 2026 will be even bigger for agents than 2025. Right. So I think it came across multiple times during this conversation that this is where we will see the beginning of the exponent. And of course, if everything that we just discussed will happen, 2025 will be an amazing year with a massive leap forward in agents and humanity's progression overall. But it's just the beginning.

Starting point is 00:37:19 And I believe that 2026 and probably a few years after will be the years where many of these learnings and developments and whatever we learn from this (you called it the pilot year, or the year where more and more people put their hands on agents), this is where we will yield the big promise of Gen AI. And that's why I'm so excited. You asked me at the beginning why I'm so excited about agents. It's what will happen in 2025, 2026 and beyond that will get us all to be amazed about how work and life were before this era.

Starting point is 00:37:57 Yeah. So I agree with this. And I would go a step farther. So I think that 2026 will be the first year that enterprises meaningfully and regularly have agents deployed just in the normal course of their workforce, right? A hybrid human-agent workforce will be increasingly the norm. Not the norm, but more and more it'll be normal to see that as part of certain

Starting point is 00:38:24 functions. I think that it'll be highly focused on particular functions to begin, but I think that it'll be fairly normal in 2026 to be having agents deployed at scale across certain functions. And so the implications of that are that you have to use 2025 to figure out which those functions are, how you integrate them with your systems, how you build the new systems around them that you need. And that's going to take a ton of work and experimentation. Obviously, this is what Superintelligent is positioned to help people with. This is why we're doing these readiness audits.

Starting point is 00:38:56 It's why we're supporting, you know, agent deployment. It's why we're helping companies build systems for ongoing AI transformation. 2025 is going to be an incredibly important inflection year that is really going to push enterprises to build the systems that allow them to actually take advantage of this in 2026 and beyond. And I think that the implications of that are that you really will start to see, especially in 2026 and beyond, a clear breakout of companies that have built these systems and have the capability. Those who have gone through AI transformation and who have the systems set up to continue AI transformation will start to break out from the pack in very meaningful ways in a way that hasn't even

Starting point is 00:39:41 happened yet. So I think it's going to be very, very exciting. And I think that this year, this year will be very fun because the stakes will be high, but still there's lots and lots of room to do things that don't work and to, you know, wander down paths that don't lead anywhere. That won't be the case for very much longer. It's going to be a fun year for sure. All right. Well, Nufar, thank you so much for hanging out.

Starting point is 00:40:05 This is a super fun conversation. We don't have anything quite yet to announce, but for anyone who did like this, keep an eye closely tuned, or an ear, I guess, closely tuned to this, as we might have some interesting announcements coming up. But hope that you have a very fun and non-agentic holiday, everyone. And we'll see you in 2025.
