The AI Daily Brief: Artificial Intelligence News and Analysis - NLW on the Future of AI Agents

Starting point is 00:00:00 Today on the AI Daily Brief, a special interview with me on the future of AI agents. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Hello, friends, welcome back to another AI Daily Brief. I am traveling this week, so we're doing a couple episodes that'll be different. I do have my podcast gear, so I will be recording some normal episodes. But for today, I'm sharing the first part of an interview that I did with another podcast, a great one called Tool Use, a couple of weeks ago, about AI

Starting point is 00:00:35 agents. Obviously, this is the topic du jour. And because I was in the interviewee chair for this one, I got to riff a little bit more broadly around what I think the future of agents actually looks like than I otherwise normally would. So what I'm going to do is I'm going to share a little more than half of this episode. And then I'll send you out a link to where you can find the rest of it on their feed. The guys over at Tool Use interview builders and entrepreneurs and other folks who are actually using AI day in and day out about how they're using AI. So if that's interesting to you, I highly encourage you to go check out their show. So again, today's episode is an interview with me about the future of AI agents.

Starting point is 00:01:09 Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001.

Starting point is 00:01:35 Centralize security workflows, complete questionnaires up to 5X faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back, so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time. For a limited time, this audience gets $1,000 off Vanta at Vanta.com slash NLW. That's V-A-N-T-A.com slash NLW for $1,000 off. This week, we're joined by Nathaniel Whittemore, also known as NLW, the founder and CEO of Superintelligent,

Starting point is 00:02:21 as well as the host of my favorite daily AI podcast, the AI Daily Brief. NLW, welcome to Tool Use. Hey, it's great to be here. Thanks for having me. We're super glad to have you on. I guess we can kind of kick things off. I think everyone kind of has their own definition of what an agent is, it seems like there's not really a very good definition. I'm kind of curious how you define an agent and kind of what that means to you. I actually have a super strong point of view on this. You'll find this is

Starting point is 00:02:45 a common thread for me. So you see a lot of kind of hand-wringing, I think, among people who have been in AI for a long time or who are sort of, you know, more technical experts on how mangled the definition of agent is as it's, you know, found its way into enterprise and stuff. And I actually think that we should not care about that. I think that when people on average are talking about agents or referring to agents, they're bucketing AI into two categories. AI that I have to use and AI that does stuff for me, right? Like without me having to really, you know, tell it other than maybe that one first time. And obviously, that's, you know, not super precise.

Starting point is 00:03:22 But I think broadly it gets people kind of in the way to think about it. Like particularly if you're, you know, an enterprise leader and you're thinking about whether you're going to deploy, you know, kind of an assistant style AI or agents, like they really kind of broadly bucket into those two categories. I also think that we've so rarely had as much like narrative consolidation around a single term that's like kind of in the ballpark that the fact that everyone kind of knows this term and is there like trying to kind of like, you know, get into the nitty gritty between agent and automation, I just think is ultimately sort of a not particularly relevant pursuit. I think what people are looking for when they're

Starting point is 00:03:58 talking about agents is stuff that actually takes big chunks of work off the table for me, not just makes me do that work better. Yeah, I've found something similar, where people say, oh, the newest agent from OpenAI, Deep Research, which I've used and is great. And other people say, like, well, what about code interpreter? Is that an agent? And ultimately it doesn't matter, whether it's a tool or a workflow,

Starting point is 00:04:17 as long as it solves a certain task for you. Through your use of them, what type of use cases are you excited for? What have you found to be actually helpful in the current state? So we think a lot. So the main product right now that Superintelligent is being hit up for is this thing we call the agent readiness audit, which is basically an agentified process of looking across an organization's workflows and its procedures, its policies, to help them understand what they need to do to be ready

Starting point is 00:04:44 to use agents and which agent use cases might be a good fit for them based on current capabilities. And I think that what we often end up kind of, you know, what ends up getting shared with them is, you know, we have these grand ideas of these sort of multi-agent workflows that are orchestrated perfectly and take, you know, giant chunks of tasks off. And that's really just not where things are. Where things are right now is still in this sort of discrete task, you know, repetitive discrete task that you can do, that you have to do over and over and over again. And I think that the more that people and companies experiment, you know, with that in mind, the better suited they're going to be to actually, you know, taking advantage of where agents are now. I think it's going to change

Starting point is 00:05:25 dramatically over the course of this year, right? So really single purpose, very specific agents. I think the way that I think about it, you know, sort of from a personal perspective is, you know, we haven't really agentified a ton of the like podcast processes yet. You know, we use AI for a bunch of them, but they're sort of like not fully automated. I think that when it comes to building Superintelligent, we're in the midst of going through and sort of totally reevaluating how everything gets done and actually trying to embed agentic workflows in how we work, right? So the way that we build the products is changing, you know, based on Cursor and sort of, you know, different approaches there. We have, you know, the knowledge base that powers this agent

Starting point is 00:06:06 readiness audit is a workflow that sort of automates a set of different agents or automations or however you want to do it. You know, there's a Zapier piece and, you know, a couple other pieces that all add up to automatically extracting information from the web around current agent capabilities that happens every day. So we're kind of going one by one through all the things that we're doing and just asking which parts of this could be sort of supported by, augmented by, or replaced by an agent and trying to redesign on that basis. That's really smart. Yeah, we've played a little bit with like trying to use agents for kind of optimizing some

Starting point is 00:06:39 of the podcast tasks. And I think we have so much experience with AI that we kind of think in workflows. We think of agents. We kind of know what they're capable of. Some people we talk to like have almost no experience in AI other than like the one ChatGPT conversation they've had. And so like even just understanding where AI like fits into their equation, into their business, is kind of a difficult thing. Like where do you start with someone that doesn't have a lot of experience in AI? Like how do you kind of explain the benefit to them and how they

Starting point is 00:07:10 can get started? Like what's one thing that they could start like this week with AI? One of the things that often comes up, and this has been the case for some time and is sort of not agent related, is people underestimate the value of some very basic use cases. So we did a survey of Superintelligent users a while back. And the number one use case for AI across the set of enterprise users was brainstorming, right? Basically making their work better by having ChatGPT act as like a consultant or a thought partner as they were thinking through things. Right. And this is going to evolve over time. I think, you know, an interesting analogy is sort of like imagine the marketing or social media that you guys do for the podcast.

Starting point is 00:07:52 You've probably shifted, I would imagine, from doing it totally raw yourself to now like partnering with ChatGPT on some of the copy and using Midjourney for some of the images. And so it's like now that's sort of this AI enabled process. You know, it's an AI-assisted process where, you know, maybe the time has gone down,

Starting point is 00:08:08 but I bet the benefit is more just sort of like quality increase and cognitive load decrease for you guys. However, I would imagine that over the course of the next year or so, we'll all be able to actually, I think that, you know, a social media agent seems like one of the easiest to actually execute against. You know, how many tweets per day do you want? What do you want them to relate to? How much are they replies versus this? You know, like, you can kind of see how it comes together pretty quickly. What's the database that I have to pull from of previous messages? And so you're kind of going to see that process. And so if people are just getting started, just using the assisted level AI to, you know, see how it makes their work

Starting point is 00:08:48 better before they worry too much about even time saving necessarily, I think is often a really good starting point, you know? Yeah, absolutely. And I've even seen the progression where you have those chats with Claude or ChatGPT to get some input, help with the brainstorming, coming up with titles, to creating a Claude project where you can upload a bunch of documents, a bunch of standards and best practices so you can get more consistent results over time. We've also experimented with the AI editors, and we've yet to find success there.

Starting point is 00:09:13 But it's interesting how the chasm between what works today and what is, you know, not quite working, what's a little ways off, is just shrinking by the day. Yeah. Have you noticed any tools in your workflow that have really allowed you to completely offset a process or are you still a human in the loop a lot of the time for these types of things? So the thing that's closest right now in the Superintelligent world is the automation of this knowledge base around current agent capability. So when we're trying to match, this agent readiness audit, basically the way that it works is a company will come to us and we'll talk to them. And then we deploy a voice agent that does this interview where, you know, we've customized the set of questions. And we can do this, you know, a very small handful of times and just get a very kind of high level overview.

Starting point is 00:10:04 Or we can deploy it all the way down to the employee level across hundreds or even thousands of people, right? You get all of this information, and then based on the interviews with all those people, we're running that through this knowledge base of agentic opportunities that includes what the agent is, what it does, what use cases are related to it, what industries are related to it, what compliance regimes, you know, it fits with, what tech stacks it matters to. So it's sort of like it's not just like the two or three kind of vectors that you might imagine,

Starting point is 00:10:34 you know, it's like a database of like, I don't know, 20 or 50 rows or something like that, around all this information that we're trying to gather. And we've really like hyperautomated the process of ingesting that information. Now, we still have an additional layer of human interaction. So, for example, like a lot of the subjective information that I get as relates to agents comes from Twitter slash X, right? Like people saying, oh, this sucks. Oh, this is great. And that's actually quite useful to try to benchmark like where a thing is when you're trying to give a company an expectation of whether they should be using it or not, you know, like, if subjectively, like, half the sentiment on Twitter is that it's great and half the sentiment is bad, you can go in with the appropriate

Starting point is 00:11:17 expectations, you know, we're not sure how ready for prime time it is. It might be, like, slightly overpromising or whatever. So it's not fully automated in the sense that there's things that are still much more valuable for kind of like the human to do. But it's getting there, right? And I think that that's an important piece. When it comes to the podcast, there's nothing that's fully automated, although I did an experiment last week, when I got sick, of a much more automated process. So basically, I had pretty much lost my voice. And so I took a topic, used Deep Research to write a paper on it. I think the one that I ended up using, I tried a couple. And the one I ended up using was economic predictions in the era of AGI, basically how AGI is going to impact

Starting point is 00:12:02 the economic landscape, wrote a research paper on it with Deep Research, and then fed that into Google's NotebookLM and let them turn it into a podcast. And that's what I published that day as an experiment. It went over reasonably. It was sort of a cute idea. So I don't think I'm going to be turning back to that too frequently. When I think about where automations might come in the future for the AI Daily Brief,

Starting point is 00:12:26 I think that there's, you know, there's probably not for my show because so much of it is like the context that I add implicitly around things. But, you know, news podcasts are going to be very easy to go from, you know, the automated feed that curates them and just turns it, you know, end-to-end pipeline into a podcast that gets pushed out, you know, is a very, very sort of simple set of steps that, you know, each requires their own automation, but you could do it really effectively. Yeah, absolutely. And as a long-time listener, I can tell you that the added personality, the added perspective always helps besides just, you know, an information dump.

Starting point is 00:13:03 I actually wouldn't mind double-clicking on Deep Research because I've also used it, had positive results. But as you mentioned, the Twitter vibe test, a lot of people didn't seem to like it. A lot of people did, but it was one of those right down the middle ones. What's your experience been like with it? Do you think it's a step in the right direction? And even just like long running AI processes in general. Do you think that's the future? Yeah.

Starting point is 00:13:24 Well, I think it's part of the future for sure. I think we're going to have to do a lot of experimenting and iterating to figure out exactly how these things work. My sense is that most of the people who have had positive experiences with Deep Research have used it for particular types of knowledge, you know, summarization that it was well suited to do. And the people who have had bad experiences have started to figure out the jagged edges of where it's not so good, right? So it's very clear that like not having access to contemporary journals is a huge problem, right? It really like limits its ability to be super deep and contemporary when it comes to science or anything that requires access to, you know, journals that are behind paywalls. The other thing that I found is that when it comes to really like fast moving spaces, there are,

Starting point is 00:14:15 it could be a challenge. So for example, this AGI thing that I did, it was mostly great. However, it was definitely like overreliant on Nick Bostrom's Superintelligence, you know, as a resource. And I think at one point it said that most scientists still think that AGI is a decade or two away, which is obviously like so not the, you know, it's not reading Twitter. Let's put it that way. So I think that there's, you know, we're just going to figure out like basically research or, you know, grabbing a bunch of sources and turning them into a consolidated bucket of knowledge is actually a very diverse use case. It's not one use case. It's about a thousand use cases embedded in one category of use case that it's going to take some

Starting point is 00:14:57 time for us to figure out what pieces of it this particular tool is actually good at. I've enjoyed Deep Research so far. I think there's limitations. I think what's also weird is some of the limitations are like when it hallucinates, it's hard to know that it actually hallucinated. It's like in these areas that I'm not an expert at. Like it could just say something and then cite the reference. And I'm like, oh yeah, that's true because it read the thing, right? So it's like it's so much harder to spot these hallucinations. I feel like hallucinations are still an issue in the world of AI, and I feel like that's something that we're still trying to solve. How big of an issue are hallucinations, do you think? And is that like a primary complaint

Starting point is 00:15:35 that you see with businesses? Yeah, it's actually a much bigger deal for businesses than it is for consumers, I think. Consumers have a higher kind of threshold for what they can deal with, especially if it's, you know, so much of the deep research use case isn't, it isn't trying to get something that's production ready. It's trying to get a kind of a thing that gets to 80 or 90 percent, right? So one of the use cases that I've seen a number of people have success with is basically background market descriptions and sizing for their startups. So they're trying to communicate and understand how big the total addressable market for the thing that they're building is. And, you know, it's really good at pulling a bunch of different resources in, blah, blah, blah, blah,

Starting point is 00:16:15 but they're never going to just turn that over to an investor, at least if they're actually a good entrepreneur, they're not going to just turn it over to an investor. But it saves them a huge amount of time. like I said, it gets them kind of 80% of the way there. And so for them, they're maybe more in a position to actually spot those hallucinations. Where hallucinations become a real problem is when people are actually, you know, basically replacing a human information source with an AI or an agent information source that really relies on having the right information. So an insurance company that we work with found that the sort of,

Starting point is 00:16:53 like the threshold of tolerance that people had for a human agent being wrong when they were giving them information was like 5% of the time, 7% of the time, something like that. Whereas with a robot basically giving them that information, it was like less than 1%, right? People expect it to be absolutely perfect. And in certain cases, if you're in kind of like a, you know, highly regulated industry, the barrier is extra high because if you give bad advice, you know, if you think medicine, insurance, I mean, anything like that, the threshold is just extremely high. So hallucination is one of those weird things where it's like, it's often just kind of funny and silly when it comes to consumers, but is a major detriment

Starting point is 00:17:32 to how far into deployment and production certain use cases can be for big enterprises. Absolutely. I've kind of tried to teach people to view it like Wikipedia. Use it to get started, but it's not something you can put as a reference in your paper. In regards to the hallucinations, a lot of people try to solve this with evals or just build a robust enough eval set that they're able to kind of mitigate against some of the risk of hallucinations. Do you find businesses are implementing any other types of strategies or are they even following through with the evals or just kind of YOLOing it? What's the vibe in the business community? I think that evals are still actually underutilized. I think that a lot of companies like they're still, they're just now coming to

Starting point is 00:18:11 understand what the full stack of things they have to do to actually implement these tools. And one of the things that's frustrating or can be frustrating is when you realize you have to like you can't just build the thing. You have to build this other set of infrastructure to support the thing to make the thing work. You know, and there's often a resistance to that. I mean, all of the custom build shops that are, you know, that are either building custom agents or helping deploy things, they always complain about how, you know, the budgets end when you get to evals and they don't want to, you know, kind of put those into practice.

Starting point is 00:18:39 So I think that even evals are there. To your point, I think that using them is a whole different issue. I think that the way that this will be most addressed in the short term, which is sort of, it's kind of good for a number of reasons. I think that you're going to see humans in the loop for much longer than they theoretically need to be to help solve and spot this. And I think that human in the loop is, in addition to it just being a solve for technical problems of AI, I also think it's a transitional tool to slow down the sort of like

Starting point is 00:19:19 rate of full task and work replacement that AI could possibly do, right? It creates a mechanism for, you know, continued human presence, even on areas that are being, you know, highly automated, which doesn't solve all the issues for, you know, job replacement and things like that. But I think that we will overindex on things like that even more than is necessary, because society's going to have to find ways to slow down what AI could do from a replacement perspective. Yeah, it makes a lot of sense. I do share the sentiment that evals are very underutilized. It's kind of remarkable to me how many, even like startups aren't using very many evals.

Starting point is 00:19:54 I feel like probably less than 10% of startups are using like true eval suites. Kind of brings me to my next question of like, what mistakes do you see most often when businesses try to implement AI? I think there's obviously, you know, there's hallucination, maybe overengineering. But I'm curious what like are the primary mistakes you see when businesses are trying to implement agents. There's a lot of things. So if you look at what companies view as their biggest challenges with AI right now, kind of broadly speaking. The three that tend to come up are, one, like, data readiness and whatever complex set of things that that means, you know, like, is their

Starting point is 00:20:30 data all, you know, in the same place? Is it ready to be used? I mean, this is a huge industry dealing with just that problem. The second is, you know, everything surrounding privacy, cybersecurity, all that sort of set of issues. And then the third is employee adoption and utilization. And usually it's sort of like in that order, roughly speaking. One of the things that you see constantly is, you know, company X will have 10,000 Microsoft Copilot subscriptions, but, you know, only 33% of them are being used or something like that. And there's just no real support infrastructure. There's no enablement infrastructure surrounding that sort of utilization. And I think that that's just a market gap that is very, you know,

Starting point is 00:21:13 that is starting to be filled. I mean, this is sort of the space that Super plays in, obviously, but it just needs, you know, needs more people building more things in that space that can support, you know, adoption and implementation. So that's a big challenge. I think when it comes to agents, what we're going to see is sort of a lot of, we're going to have a misalignment of expectations. I think people are going to try to or they're going to imagine that they can do more than they can to start. And I think that, you know, they'll try to, you know, create these very complex systems right out of the gate that won't quite work. Yeah, there's probably a bunch more that I can think of as well. I wouldn't mind diving into the security aspect a little bit.

Starting point is 00:21:51 We're familiar with this one project, open source project, CodeGate, which kind of acts as a local proxy that your LLM requests route through so it can redact PII and stuff like that. But it just seems to be just getting started. Do you have any either tools or advice for companies that are concerned about security and bringing LLMs into their workflow? I think that I would, at any given time when anyone is listening to this, I think it's really worthwhile to go try to do an audit of what tools are available in that space

Starting point is 00:22:18 because this is so clearly such a massively juicy problem that, you know, there are startups now. There are going to be more startups in three months. There's going to be many more startups in six months. They're all going to be taking different approaches to this or slightly different approaches to this. There's a full, I mean, a full sort of spectrum of options of available support to the extent that people want to solve that now. Like, there are companies that will come in and build things completely on premise. I mean, there's just a ton of options. I think that one thing that's interesting that's happening with this sort of technology shift that maybe didn't in quite the same way in previous

Starting point is 00:22:55 eras is companies are building a lot more than just buying off the shelf than they have in the past. So Menlo did a study, their enterprise adoption study. And between 2023 and 2024, the rate of build versus buy, there was a huge, huge shift. So in 2023, it was something like 80 buy, 20 build. And then last year, it was 53 buy, 47 build. So, I mean, huge, huge shift. Now, I think that this boomerangs back. I think that what that reflects is verticalized solutions and verticalized agents not quite being ready for prime time yet. And so companies that are in those verticals or that are thinking about those functions, seeing the opportunity, racing to build it, you know, because there's all these frameworks that are available to them. But what I think

Starting point is 00:23:41 will happen naturally is that winners will emerge in the category that they had started to build in and then they'll naturally shift back over to whatever sort of the market leader is. But it does create this whole interesting dynamic. And because of that, I think that one of the ways that people are more emboldened to solve some of these issues is if they can't get there with the available kind of security profiles of third-party vendors, there are options more than perhaps in the past that are accessible for kind of rolling your own solution that just has the kind of highest level of security that you can have.
