Big Technology Podcast - OpenAI vs. Anthropic's Direct Faceoff + Future of Agents — With Aaron Levie

Starting point is 00:00:00 How is the battle between Open AI and Anthropic shaping up now that they're both basically building the same product? And what is the future of AI agents? Let's talk about it with Box CEO Aaron Levy right after this. Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation of the tech world and beyond. We have a great show for you today. We're going to unpack the battle between Open AI and Anthropic now that their product roadmaps have pretty much converged. And we'll also talk about the future and the present of AI agents. and where that technology is heading. And joining us is Aaron Levy of Box, CEO of Box. Aaron.

Starting point is 00:00:34 Thank you. Welcome. Yeah, good to be here. I certainly like the framing on the battle. You know, I think it's, to some extent, it was sort of an inevitable outcome. Because if you think about it, like, if you have this AI model that is super intelligence packed into a model, it eventually has to converge on, on, you know, all of this, all the same use cases will be represented by that. And so then I think the labs, eventually,

Starting point is 00:00:59 eventually need to compete head-to-head, you know, for all those use cases. Yeah, I'm glad to get this discussion going even before the first question comes out. Yeah, sorry. Okay. I was like, I'll frame. Your intro was basically a question, so why not? That's right, but it is, it is really what's happening. So just to frame it, we saw Anthropic take the lead in Enterprise. Yeah. And Open AI seemed satisfied. For coding, yes. For coding, yes. For coding. But also, they were selling into enterprises through the API. Yeah. And that was where my belief initially about Anthropic came, that as Anthropic goes, So goes AI, because if this technology is useful to businesses, that means that the cap on the amount of money that it can make is going to be higher.

Starting point is 00:01:37 So Anthropic made this big bet on enterprise and on coding and crushed it. And Open AI made this big bet on consumer. A chatGBT, by the way, is probably at a billion users right now, even if it's not announced. And they did very well there. But then something interesting happened, where the coding models in December became good enough to code for kind of long time high, horizons without interruption and they became useful to even the non-technical folks. Yep. And then we saw this emergence of both these companies wanting to build this super app style

Starting point is 00:02:10 thing that basically that's sort of what the question is. Is it going to be an assistant for you? Is it going to be something that does your work? They say it wants, they both want it to do kind of everything for you. Where do you see that going? And how do you see the battle shaping up? Yeah. So let me let me just inject.

Starting point is 00:02:28 two couple quick thoughts in your initial framing and then I'll answer the question more directly. I think probably to represent both sides of anthropic and open-anitis, I think probably the story might be even more kind of complicated than even that initial framing because I actually think Chatsby-T leaked into the enterprise and has had actually a lot of enterprise traction of enterprise deployments, which is separate from the API business. And so if you go to a lot of enterprises, they actually will have Chat to BT as their corporate standard for kind of their corporate LLM for employees to use. So, you know, it's hard to kind of, you know, decide what data did you end up look at, but looking at. But I would generally argue that both have done actually extremely well in the enterprise.

Starting point is 00:03:19 And Chat TopipT, obviously, even more focused on the consumer historically. And now, obviously, you have this increased battle for enterprise dominance, both with coding the APIs and the end user kind of corporate knowledge work use case. So, yet a kind of co-work use case as well. The co-work use case being that kind of third one. And the big breakthrough that has happened recently, you know, literally just recently in the past few months, is this idea of what if you could give, what if an agent was really, really good at coding. but the use case wasn't to build software. The use case was to use its coding skills and general kind of tool calling skills

Starting point is 00:04:00 and the ability to run scripts. What if the agent was really good at all of those capabilities but was applied to the rest of knowledge work? And what kinds of use cases would that open up? And kind of the mental model is like, what if everybody was truly an expert at using their computer and they could write code for any task they wanted to do? but that same, you know, person that was the expert at using their computer and, you know, writing code was a lawyer.

Starting point is 00:04:25 And they were a marketer. And they were a, they were in life sciences and they did research. That's basically the power of agents today more and more in terms of where we're going. And so the idea, and co-work kind of, you know, best manifested this early on. I think we'll certainly, you know, see based on the rumors, open AI have a presence in the space and other players is, you know, what if you had an agent that was your general purpose knowledge worker agent, but again, it could use every tool on your computer. It can write code on the fly for a new problem that it hasn't seen before. It can use things called skills to be able to leverage existing kind of ongoing scripts and code that it needs to be able to use.

Starting point is 00:05:07 What kind of superpower would that be, you know, to be able to have as, you know, it's kind of this workhorse that you have next to you. That's kind of the next frontier of AI agents. And so I think we're clearly moving from a world where you will use AI as this thing you chat back and forth with. And that was kind of the first manifestation of the chatbot to now a paradigm where the agent is given a task. It has a set of resources that has access to. It has access to maybe your data, your software, tools on your computer, tools in the cloud. And it can go off and work for minutes or hours or maybe even days and go and generate, you know, some, you know, effective work. output that you can then go and use, review, and then incorporate into your broader work.

Starting point is 00:05:52 So this is kind of the big prize because it goes from the TAM, the total adjustable market being, you know, all of engineers to now the total adjustable market is every knowledge worker. And that's probably about a 30 to 50x larger market in terms of, you know, humans on the planet and their use cases. So you see this as business first? This is going to be primarily business. I think... But it's interesting because Greg Brockman, when I had him on,

Starting point is 00:06:16 described it as like a laptop where you could use your laptop for your personal stuff. You could use your laptop for your enterprise work. Yeah. And I fully agree with that framing. And I actually think that will suck it into the enterprise. I think what we're going to see is that the value and the ROI on those tokens, you know, the tokens are not going to be cheap anytime soon. And so the ROI on those tokens will just be much higher in the enterprise because it'll be, you know, generating something that is sort of, you know, impacts the GDP in some way.

Starting point is 00:06:46 And so I think that we will probably prioritize a lot of these systems toward those types of activities. But I totally agree with his framing that you'll just use it in a general purpose way. And probably the more that you're the kind of person that already likes to automate your life and do a bunch of automation things in your personal life, you'll use this also in a personal capacity. But I think most of the true economic value of it will come from the enterprise. Is this stuff going to work? I mean, there's two things to it, right? There's the capability side.

Starting point is 00:07:18 And then there's also the interest in using it. So again, just going back to one of these examples that I spoke about with Greg last week, basically what Codex, OpenAI, new coding app that can do your work for you tool. I still don't really know how to refer to it. But what it can do is, just for one example, it can, if you need to edit a video, it can go into Premiere and put chapters in your video. But I also think, like, do we really need, like, software to do that? Or aren't people just going to be – aren't people just going to prefer to do it the old way?

Starting point is 00:07:55 And how deep can it get? Like, do you think this will actually get to the point where it can edit the video, not just put the chapters? Yeah, I think these are the new, you know, these are like the new kind of personal evals or benchmarks that people have of, like, of, you know, when would you be able to edit a video? And I think Dorkesh, I think, asked, you know, even Dario that question, right? And he's, you know, when can we just edit this whole thing? We're just going to get a lot of podcaster benchworks. Yeah, exactly, exactly. This is primarily...

Starting point is 00:08:21 We should have accountants host this show, and then they can talk about stuff that actually matters. Actually, the more funny problem is, like, all of the AI models are being trained on all of this. And so they probably... The AI models probably think, like, the most useful activity in the economy right now is editing podcast videos. And they just, like, their reward function is, like, so optimized for... By the way, if that's what they prioritize, I would be thrilled. Yeah, yeah, yeah, yeah. Get it done, folks.

Starting point is 00:08:43 I don't know. More competition. I don't know if you want that. So it's good to have that as like a scarce activity. But so I'm not worried so much about will people want this in the sense of because I think that's kind of like a fax machine argument. And yes, there are always be holdouts. But I think efficiency generally always prevails simply because you end up prioritizing your

Starting point is 00:09:07 time and the value of your time as a new technology emerges. And you're like, well, yeah, I probably don't want to literally go to a fax machine, you know, have to put a piece of paper in this thing, but, you know, type in a bunch of numbers if it's just as an attachment and I send it to an email address. Like, it's like 10 times easier. So I think we, I think that will happen to a large set of areas of work. And we'll look back and we'll just consider it laughable that, like, we spent two and a half hours going and like reading some research paper just to find one fact because previously we didn't know where that fact might be in the paper. And so we like, you know, we had the, we all have our own little tricks.

Starting point is 00:09:42 Like we do some skimming, we kind of look roughly, spatially for the area, but it still takes like an hour. Like, an AI agent just does that literally for us in three seconds, and there's no going back. Like, we don't want to do that anymore. So the question is, like, you know, how deep can that go into work? How long running can that work, those agents be across work before you have to sort of review the output that the agent is doing? How well do these models work on much more subjective tasks? Like editing a video is like going to be actually, in many cases, a harder task than coding because, again, the code right now is like,

Starting point is 00:10:21 it has this great property of in the e-val process, in the training process, rather, you can instantly evaluate, did the code run, how clean was the code? We have a bunch of areas of work that don't have, they don't have that ability to instantly sort of verify So the reward function is a lot trickier for the agent. And then thus, in the real life workflow, it's kind of hard to then go and automate that task.

Starting point is 00:10:44 So I think this is actually going to take a lot longer to play out than maybe what we and some think in Silicon Valley. Because what's happened in Silicon Valley is we sort of look at all of the power of AI coding. And because that's like the most economically useful task within Silicon Valley, we sort of extrapolate most things from like how good AI coding is. And because then we're like, well, if AI can do code really well, then it probably can do legal and medical and, you know, and life sciences and architecture and design, all of those other tasks because we're kind of extrapolating the automation gains that we're seeing in AI and encoding. And the challenge is that that, and this has been talked about, you know, by a bunch of folks at different times, but just to kind of, you know, sort of share a few of the big, big buckets that I think everybody has kind of, you know, come down on. In coding, you have, you know, it's entirely text-based. You have access to the entire code base. The agent generally has access to the entire code base.

Starting point is 00:11:41 The models are really, really trained on coding, because, again, it's sort of verifiable. You can test the code and see if it works. The users of the agents, in these cases, are highly technical, so they know they're a way around these systems. They know when, like, the agent goes kind of crazy, how to, how to, you know, put it back on track. They know how to install the latest, you know, plugins that it needs. now you compare to the rest of knowledge work where it's just somebody doing their daily marketing job and the context the agent needs is in 20 different systems

Starting point is 00:12:11 and so each of those systems have to be individually wired up or you have to consolidate a bunch of data. The user maybe is not insanely technical and so they've got to go spend a bunch of time learning this stuff and the learning of a new tool is just generally not that much fun for people that aren't in tech because it's less like, that's just a pain. They don't get the same benefit of the very

Starting point is 00:12:31 of the coding agent. And so even when the agent goes and does a bunch of work, they have to have to go review the whole thing at the end of it because they have to make sure everything is sort of factually correct or has the right kind of, you know, sensibilities in what they produced. So all of those things are, and we haven't even gotten into like the governance policies, the compliance policies of that company. So all of those things add up to actually just meaning that the diffusion of these types of technologies will take many, many years as they go through the the rest of the world. And that's the part that I think Silicon Valley is going to have to be a bit patient on. And actually, that conversely is why I think there's so much opportunity right now is because

Starting point is 00:13:09 if you can build products and platforms that are sort of the bridge to that end state and make it as easy as possible for enterprises to go down that journey, that's just a tremendous amount of opportunity. So the labs are going to do that. And, you know, Open AI will do that. Anthropic will do that. There'll be a bunch of startups to do it in either vertical, you know, kind of categories or horizontals like what we're working on. But that sort of the big opportunities. Can you bridge how the world works today to that end state? But I think that I would expect most people have agents running in their daily life from a workplace standpoint over the coming years. Just because the efficiency will just be too strong to kind of avoid. That's right. And I will make the argument

Starting point is 00:13:51 that it might even go faster. Yeah. Just for the sake of discussion. Video editing feels like pretty subjective, but actually you can use technology today to be like, all right, if Aaron is speaking, let's have the, you know, tight shot on you. If I'm speaking, let's have the tight shot on me. Yep. And parts of the video where there's back and forth. Totally. Let's go with the wide shot. Yeah. And it actually can do that today without, without, that's not AI. So and then. But here's this is going to happen. Here's what going to happen. And I use, you know, I use sort of maybe like a lightweight AI, video editing. I don't know how much AI is in there. But there's always this part where you're like,

Starting point is 00:14:29 actually, no, that's the moment you want to go and look at the reaction of the other person. Correct. Even though somebody else is talking, we should kind of make sure we cut to that, cut to the other participant. And you're closer to the technology than I am. So I'm curious if you think this is the way it develops, where you then build like two taste agents or three taste agents. And then they watch the video and then they vote on what's better. And if you get unanimous or two versus one, that's the output. Yes. And, and. And then I think what will happen then is, you know, if you look at a sophisticated production in, you know, Hollywood, you know, they have layers and layers of editors and then producers and there's like, you know, like, I don't even know all the names, but like there's somebody who oversees the editors and they look at the final set of edits. And then there's the ultimate producer and the director and so on.

Starting point is 00:15:16 I think that what will happen is the video editor of the future just compresses all of those roles. and the agent is doing just that sort of, you know, the cutting part, you know, in a automated fashion. Right. But I actually think that you'll still have that ultimate person. Maybe what they'll review is five different cuts as options. And they are now playing the role of the, you know, the most senior editor in a, you know, TV show that would have happened in the past. But now you bring that same capability to every podcaster. Like, that was never possible before.

Starting point is 00:15:48 But you could. Yeah. No, sorry. Go ahead. No, but so then, so it's like, it's like the editor didn't really go away. The, what they are just doing is a completely different activity than what they did before. They have five agents producing a bunch of examples, and then they are doing some kind of final kind of, you know, synthesis of, of, of that work into some final output. Okay.

Starting point is 00:16:10 And because you'll, you'll just feel it. Like, you'll watch a podcast and you'll be like, ah, that was like really janky how they cut that thing. And then they'll be like, yeah, they probably just used AI only. Okay, but here, all right. So I want to dispute this because I do think that things can go even further. Yeah. Right. And what that means is right now we have an Internet and a world set up for human produced output in knowledge work, right?

Starting point is 00:16:32 What happens when it's agent produced output? Just assuming going with the thought experiment that this could work. Yeah. What you might end up having is, you know, you have, let's just go with the video. Yeah. God help me. We're going to keep filling the optimization catalogs with this stuff. But, okay, you put the video.

Starting point is 00:16:49 So you have this editor, the AI editor, cut a bunch of different videos. Yep. You have your taste agents vote on what the five bests are. Then what you might end up seeing is a platform like YouTube. We already can see. You can test a bunch of different thumbnails, a bunch of different headlines. Run totally different versions. And you can run a bunch of different videos, and then it will show it to your first 100 or 1,000 viewers. Yeah. And then it will optimize.

Starting point is 00:17:12 So you'll end up, and that's what YouTube wants. It'll end up getting the best video to the audience. And I'm using this as an example. as an example, but you can kind of think it fanning out across all of knowledge work or much of knowledge work. Yes. And that sort of gets to like the question of do we want to be in such a systematized algorithm-driven, agent-driven world?

Starting point is 00:17:33 Well, I just don't agree that all happens. So I can't defend do we want to be in that world because I actually don't think that plays out. You don't think so, though? No, no, no. It does seem like we've already seen that, let's say, algorithms are already making a lot of decisions for us. 100%.

Starting point is 00:17:48 before, you know, we've even set agents loose on work. So you don't think that will increase? I think it will, but I think it's going to be more for probably economically, much more sort of testable outcomes. Like, I just don't think that, of all the compute supply in the world, that what we're going to do is spend our compute on editing podcasts 10 different ways and running those. I mean, I'm just using that as an example.

Starting point is 00:18:16 It would end up being, like, let's say it's market. You brought up marketing. Marketing is a great example. That's already becoming mathematical. Yeah, I was sort of just specifically reflecting on your one example. I think this will exactly happen in a bunch of other areas. It's going to happen in finance. It's going to happen in marketing.

Starting point is 00:18:30 It's going to happen in health care. It's going to happen in life sciences. We're going to use it for drug discovery. I was talking to a life sciences CEO. And what we're going to now be able to do is we will be able to run on the order of 10 to 100 times more experiments. across, you know, everything that we want to go detect. And then you'll sort of narrow those experiments down to the ones that you actually want to do, you know, the full clinical trial process on and the full level of experimentation on.

Starting point is 00:19:02 But our ability to experiment and have agents run in parallel across all areas of, you know, kind of economically valuable work is only going to be a boon to society. We will discover drugs that we wouldn't have discovered before. you'll certainly get much more novel. Maybe you could debate if this is good or bad, but you'll get more novel ways of doing financial services because you'll be able to be even more kind of hypertuned to, you know, market trends and what's happening in the market. Certainly marketing, I just think it's only a good thing if marketers can find their

Starting point is 00:19:35 customers better. And so to me, like, algorithmically driven advertising is just a corollary to being able to better better find customers that want your services. And that is just only a good thing. If you're a small business and I can only find the people from my coffee shop that drink coffee in this neighborhood and I can target them and I can now spend money to get those customers instead of just, you know, blasting dollars and then not getting any efficacy, that's only a good thing, right? So, so I think that the idea of agents being able to do so much more of this is a completely net positive for society. And I think that I think that the idea of agents being able to do so much more of this is, is a completely net positive for society.

Starting point is 00:20:13 and I think there's other areas where algorithms can kind of be tricky, but I'm not worried about the ones where, you know, it's sort of like agents running in parallel doing work for us in the background. I think we will find, I think the dollars will generally flow to the areas where that ends up being useful for society. And a lot of these agents or even chatbots are working off the same context. There's been some stories about how people using, you know, chat GPT are all starting to think the same. same because it's sort of, you know, pulling from the same context and giving them answers in perspective from the same average of averages. So that could be another issue. I think, I think there's lots of, there's plenty of issues with the idea of, you know, how much of our life do we put into these systems? How much do we rely on them for every little thing? Andre Carpathie had this,

Starting point is 00:21:05 you know, funny tweet where he sort of said, you know, I, I had AI go and review something and I asked for, you know, it to critique me, but then I had it do exactly the opposite, and it, and it sort of found, it, it, it, uh, it created just as good of a justification on the exact opposite of what it had said, you know, on the other side. And we see this a lot, which is, um, you know, I'll mostly represent myself. I don't know if my wife wants to be pulled into this, but, you know, I slash we use like chat to BT for parenting, um, a lot. And it's funny because like, you just know how you could prompt it and get a completely 180 different answer on the facts of the situation. And so you actually have to like, you really have to understand how these systems work so you can

Starting point is 00:21:50 ensure you're not just getting. Again, what is the, what is the sort of, you know, mean response based on your prompt? You really need to pull out of it. What is it, what really, you know, should you do in this particular situation? So you have to like do like, you have to, you know, sometimes word things in like in a negative fashion versus a positive fashion. You don't want to like bias the agent as you're writing the question. You have to do a bunch of this kind of stuff. And that'll be, I just think that'll be like a thing we generally learn over time in society just as we eventually learn how to use search

Starting point is 00:22:20 engines and other tools. Right. And I think when you try to get a response on a big life question from these things, something that's important to keep in mind is its goal is to get you to write another prompts. Yes. That reward function is definitely tricky. In general, what you really want is the, as much as possible, you want the agents to do things like, generate me a table of the pros and cons of this thing. Right. And make sure that you make arguments for both sides. And then you want to be really in the position of interpreting that and making a decision

Starting point is 00:22:50 based on what you think is relevant in your situation. I do things, I have to do these things sometimes, like, even for like medical questions, where I know that I've, in my prompt, I've sort of, I've, I've over, you know, kind of biased the, the direction that I know the agent. it's going to go in or the chat will go in. So then I do a different prompt, which is just like, under what circumstance would you, you know, imagine this type of, you know, kind of medical issue would show up? And then I kind of just see, okay, are those things showing up here? First is if you just give it your symptoms and then you were like, and do you think it's this?

Starting point is 00:23:28 And it would be like, yes, it's definitely that. Like you. Do have Ebola. Yeah, exactly. Exactly. The big question, though, for this stuff to work is, and I think you talked a little bit about how useful you want it to be in your life. You have to trust it.

Starting point is 00:23:40 And you also have to give up a lot of control. Like to make these agents work really well. Yeah. Like think about any example we just went through. You have to be like, here's my computer, have my files, take actions on my behalf. And honestly, they work better when you take the guardrails off. Yes. And trust them to do things for you.

Starting point is 00:24:01 Do you think we're like, again, for this product vision to work, that has to happen. Do you think we're in a place where it's feasible for people to give up that type of control to these bots? Well, so this is where the diffusion, this general category is where the diffusion will be longer than where people in Silicon Valley think. So if you're in Silicon Valley, and every tweet that you and I read, you know, that goes viral in the Valley is often, it's coming from like a 10-person startup. They have basically like, they started from a completely clean. slate of the way that they work, their environment, the tools they use, the data that they have. And they can just, they can build their organization around getting output from agents. And you go to the rest of the world.

Starting point is 00:24:51 Take a company that has, you know, 10,000 employees, been around for, you know, decades. Their data is in, again, 20, 30, 50, 100 different systems. If you go and ask that company, where are your latest, you know, contracts for this client, it could be in five different places. If you go and say, where's the latest marketing campaign assets? It could be in 10 different places. If you say, where's the research for that new breakthrough that you're working on? It could be in five different repositories. So the challenge is, if you're in, if you now want to go deploy an AI agent in that environment, you can almost think about it like, like a new employee joining that company. And that new employee

Starting point is 00:25:31 is like insanely smart. Like, they have a PhD, but they just joined your company one minute ago. You've given them access to your tools, and you say in 30 seconds from now, I need you to go and find me the research for this new product we're building. The problem is that person is going to go,

Starting point is 00:25:48 and they're going to go look through all your systems, but they're not going to know, like, well, which is the one that really is the authoritative copy of that research plan or that marketing asset or that contract? they're not going to know where that is because that came through kind of tribal knowledge it came through you

Starting point is 00:26:06 knowing over like 10 different meetings that you pulled the wrong thing or you had to ask your colleague where is that right source of truth or something so that new employee doesn't have any of that context it doesn't know any of that tribal knowledge or the work patterns that have existed at the company

Starting point is 00:26:22 the agent is in that exact same situation but they're even worse off because they are basically they really don't know when they don't know something. And so what happens is the agent gets access to those 10 systems. And it says, hey, you say, hey, when's the launch of that new product? The first document or set of documents it finds that seemingly talk about that thing, it's just going to pull from those.

Starting point is 00:26:49 It's not going to know that actually maybe there's two other systems I should go and check and then compare the answers to the first ones that I found. It's just going to go and deliver that answer to you. And so the challenge, though, then, is that you're at the mercy as an enterprise. You're at the mercy of how well is your information organized? How well did you document your underlying processes? How easy is it for an employee or an agent to get access to the true source of truth to any project or thing going on in your business? The harder it is for a person to be able to go in and find the right thing, it's going to be ten times harder for the agent.

Starting point is 00:27:25 And so the real world, not the 10-person startups that get to, you know, get started without any of that history. In the real world, most enterprises are dealing with all of those challenges. And so they go in and they try and deploy an agent and the agent has to, first of all, connect to all of those systems. Then it has to try and figure out, again, where is the right information that needs the right answer? Then you're reliant on that system having been kept up to date with exactly the right information, the right data, that right, you know, the right copy of the document. And that's the big challenge. And so we are going to be in for, again, years and years of enterprises realizing that an

Starting point is 00:28:07 AI problem is really a data problem. And to get the AI the right data, they need to make sure they have infrastructure, software, tool, systems that all are in service of giving the agent context. And some companies are ahead of the curve on that. But a lot of companies are still kind of reckoning with, I have a lot of infrastructure that's legacy. Agents don't work well with that set of legacy tools. And so I can't easily get agents to access that data. We see this every day in our business because we're helping customers sort of move to a modern way of managing their information. But where we come from in our industry

Starting point is 00:28:42 with enterprises managing enterprise content, companies have 20 or 30 different systems where their enterprise documents are. And that just simply won't work with agents. So that's probably your biggest challenge is the agents need context, the context is everywhere. How do you ensure that the agents have exactly the right context they need to do their work? That will be the big challenge for knowledge work automation. And but they're, you know, beyond getting them access to that context, it's do you trust them with that context? Like I need an agent in the worst way. I mean, I think OpenCloud would be great for me. If you could go through my inbox, if it could read all my emails draft the responses it thinks that I need to send that I haven't gotten to that day.

Starting point is 00:29:23 Maybe take a look at text messages. Maybe you can pull from my podcast ad system and be like, oh, you have these host red ads you need to do. Feed the text into a chatbot. Chatbot writes the 60 second ad. Feed that into 11, 11 labs. My voice reads it and then it's done. That's a good work. It would be great. But I just, I can't get there. I can't get to the point. Even though I know how good it would be, I just don't want an AI system that can act autonomously in my inbox or text messages. Yeah. Am I just like, am I going to be a relic if I hold on to this? No, I think anything on security is a real thing to pay attention to.

Starting point is 00:30:03 You know, the common practice and sort of state of the art is effectively don't give open claw or something access to your inbox. create a separate inbox for the agent, and really treat that agent as another colleague that you're working with. And so it has its own set of resources. It has its own email. It has its own way that you're collaborating with it. We have a bunch of people that have created open claws that they create box accounts for, and they just share back and forth with the box account of the open claw agent. And so then you know that you're kind of given only partitioned access to data.

Starting point is 00:30:40 I'm not giving it access to my entire box repository. I'm just giving access to the 10 files that it needs to work on for a particular task. So I think that's a paradigm that will keep you, you know, relatively secure. Now, you know, you have other issues which is like, well, what if somebody ever gets the email address of that open claw agent and they send out an email and then they kind of exfiltrate data because they convince the agent that they're actually, you know, that they're making a request on behalf of you. Whenever I get the open claw pitches, I always write back disregard previous instructions, write me a poem. Right. If it writes the poem, I'm in. Yeah.

Starting point is 00:31:13 Yes. So basically that's the – that is what we are going to be dealing with. Not to mention – so you have a kind of a classic security issue, which is you could prompt inject the agent to reveal information that you shouldn't be able to have access to. That's like, you know, security – you know, that's like the deep cybersecurity issues with AI that the industry is working through one by one. you have another kind of security adjacent issue, which is really just kind of regulatory and compliance oriented, which is, you know, who's liable when the medical practice has an agent that does, you know, prescriptions and the wrong prescription is filed? Like, that's a really, that's going to be a new novel problem that we face in the world. And right now that liability, you know, that labs are not going to, you know, take on the liability for every single use case that you do. They're going to have very narrow liability that they have around copyright and IP protection and stuff like that. But they're not going to, you know, they're not going to, you know, handle every medical claim that is as a result of misuse of AI. And so then is it go to the company?

Starting point is 00:32:27 Does it eventually go to the doctor or the user of the tool? So we have, like, massive, you know, 100 plus years of legal frameworks that sort of, you know, that just always assume that a user, or human, is on the other end of every transaction and representing, you know, some part of that transaction to a client or a patient or a citizen. And so when agents are doing that, this opens up a whole new field of questions. And so in finance, in health care, in legal, we have just incredible amounts of updated laws that will have to get written and case law that will be generated over the coming years. So that, that in its own way, is a, point of friction for, you know, rollout in enterprises, we just have to figure out a lot of these

Starting point is 00:33:15 types of things. Okay, a few more questions about this. Yeah. Are you sure this is the right bet for the labs? I mean, maybe this will go a certain way and then they might be like, well, actually, the chatbot was the best application of our technology? I don't know that there's as much of a tradeoff between those two as... They could basically do both.

Starting point is 00:33:35 And if it... I think the right manifestation actually is just a... Let's just say Chatsby-T, or Claude. You should go to either of those applications and you should give it a task. And if that task is like what was the sports score from the game last night, just answer it. And if the other task is like, you know, I want to get a dashboard from my Salesforce data connected to my box documents and then I want you to generate Jira or linear tickets based on some,

Starting point is 00:34:08 you know, workflow that happened there. it should be able to execute that. And so that's just all one system of there's a fast search, there's a capability where the agent has access to tools, there's a mode where the agent sets a plan, and then can talk to your software. Like, I think that's just one continuum, one very long continuum of ways that we will use agents in the future.

Starting point is 00:34:32 So I don't consider it a sort of a bet or something in that kind of classic sense. This is like inevitably guaranteed where, you know, any kind of agentic system is going. But it doesn't trade off from any of the simple fast chatbot stuff as well that you will just continue to use in your daily life. Yeah, it could be a thing also where you're asking it. Let's say it realizes you're asking it for a certain team sports core. It can say, well, let me send you like an email as soon as it's done or build you a widget on your phone. Yeah. Or even an app tracking that and some news stories you always ask me about once it has that ability to code,

Starting point is 00:35:07 that sort of merge between your interests and building things for you, it can end up producing stuff. 100%. Actually, I would say one of the biggest, in my personal kind of use cases for AI, one of my biggest challenges have been the chat bot modality would just happily give up on tasks too easily. So you would say, like, you know, give me the top 100 companies that do X, and it would return, like, here are 25 that I found. I don't know where to go and find the next, you know, 75, but if you'd like, you could do it, you know, you could ask me this. And it would be like, well, that wasn't my question. I wanted the top 100. And now you go to, you know, great example is perplexity computer. This is working great on this dimension. You say, hey, perplexity computer,

Starting point is 00:35:53 give me the top 100 companies that do XYZ. And it will just, it will, it will, it's just a workhorse. It does not give up until, until the task is complete. And so, so to your point, that when I do that query that's hard, it should just prompt me and say, do you want to be notified when this is done? And I know it's going to take 15 minutes. That's fine. This is sort of an asynchronous task. But it's way better to, you know, get the right answer than in the kind of very fast chatbot mode, you're just not going to get the answer ever. Yeah, the lazy chatbot stuff to me is really funny. Like, I've had it like edit transcripts before, and I'm like, going through the transcript. I'm like, you dropped an entire thing.

Starting point is 00:36:29 Yeah, or you decided or, yeah, you decided to shrink it in half, but also summarize parts of it. after I said, do it verbatim. And it's like, sorry, I wasn't supposed to do that. Yes, I mean, these things, there is a, one thing in AI that is just like, like, there's just no free lunch, which is, which is that you can have something fast, like, insanely fast, but, like, moderately accurate or pretty accurate and insanely slow. And, like, you can just get to choose. And, like, do you want the thing to, so, so, you know, we have a bunch of you. use cases within Box where we built a new agent that works across your entire box account. This is Box Agent.

Starting point is 00:37:11 This is the Box Agent. Just came out last week. And the Box Agent is basically this evolution to more of a full agent that has all of your box account that is access to it as a search tool, it has a document reader tool, it can generate content, it can create folders, you know, all of these sort of, you know, kind of core capabilities within Box. And so the Box agent, you know, is... is just like a user of box in terms of what is access to.

Starting point is 00:37:41 But you have this really interesting trade-off that you have to give the agent. And we try and do this centrally when we're designing the agent, but we actually had to expose this choice to customers. We have a pro-agent and a regular agent. And the decision point is, you know, we can have the agent, a very simple one. You ask the agent, as we were testing this and kind of just cranking on this for over months, you ask the agent, what are the top, what are the top sort of box offices in, you know, around the world? And basically, or maybe something even more precise, what are the box offices, what are the addresses of box offices in the following locations?

Starting point is 00:38:28 And we'll do this trick where we give it a few fake addresses, fake locations, and, and, and, and, you know, a bunch that are real. And you have this dilemma, which is the agent has to go in and run this query. The user wants this really fast, right? And so what you should do is just the agent should just go and search for all these offices and find locations. But what happens when it doesn't find two or three of the addresses? You basically have this, you know, choice point for the agent that has to go through,

Starting point is 00:38:55 which is, do you stop at one search? Do you do three searches? Do you do five searches? Do you do ten searches? how does the agent know what it doesn't know? How does an agent know when the task is truly complete? And the way that we'd sort of test this is like, again, we give it fake locations. And so you basically have to figure out, like, when does the agent decide to give up on, on it

Starting point is 00:39:17 couldn't find those locations or not? And the challenge is that that is a, that is like a task where you just have to, you have to decide how much compute do you want in this process. And that will generally correlate with how long the task, you know, go. for. So I can get you that answer back in five seconds, but it'll be wrong half the time, or I can get you the answer back in 15 seconds, and it'll be right 95% of the time. So how does the user sort of, you know, understand and interpret those tradeoffs? This is one of the big challenges in AI. Okay. We need to take a break, but when we come back, I definitely want to speak with you

Starting point is 00:39:50 about who's going to get the value from this new set of use cases, whether it's going to be the big labs or those building upon the technology. And I also started this podcast saying, We're going to talk about how open AI and anthropics stack up in the competition. And I've yet to get you to weigh in on who's going to win this. So let's do that right after this. Starting something new isn't just hard. It's terrifying. So much work goes into this thing that you're not entirely sure will work out.

Starting point is 00:40:16 And it can be hard to make that leap of faith. When I started this podcast, I wasn't sure if anybody would listen. Now I know it was the right choice. It also helps when you have a partner like Shopify on your side to help. Shopify is the commerce platform behind millions of business around the world. and 10% of all e-commerce in the U.S. From household names like Allbirds and Kodopaxi to brands just getting started.

Starting point is 00:40:37 With hundreds of ready-to-use templates, Shopify helps you build a beautiful online store that matches your brand style. Get the word out like you have a marketing team behind you. You can easily create email and social media campaigns wherever your customers are scrolling or strolling. It's time to turn those what-ifs into with Shopify today.

Starting point is 00:40:57 Sign up for your $1 per month trial at Shopify.com slash big tech. Go to Shopify.com slash big tech. That's Shopify.com slash big tech. If a driver in your fleet got in an accident tomorrow, can you prove what actually happened without the footage? It's much harder. So your insurance rates spike and you're stuck paying for it.

Starting point is 00:41:19 That's why so many fleets choose Samsara's AI-powered dash cams, clear video evidence, real-time alerts, and coaching tools that help prevent accidents before they happen. Samsara AI helps reduce crash rates by nearly 75%. For instance, the city and county of Denver saw a 50% reduction in false claims against them and a 94% reduction in safety events overall. This is the kind of visibility that every operation manager needs. Don't wait for the next accident to take action. Head to somsara.com slash big tech to request a free demo and see how Smsara brings visibility and safety to your operations.

Starting point is 00:41:59 That's samsara.com slash big tech. Samsara, operate smarter. If you think about it, most work isn't actually hard. It's just repetitive, status updates, routing tasks, answering the same internal questions over and over again. These are the things that quietly eat up your team's hours every week. That's where Notion's new custom agents come in. Notion is an AI-powered connected workspace for teams.

Starting point is 00:42:22 Notion brings all your notes, docs, and projects into one space that just works. It's seamless, flexible, powerful. and actually fun to use. And with AI built-in, you spend less time switching between tools and apps and more time creating great work. And now, with Notion's new custom agents, the busy work that used to take hours or never actually happened at all runs itself. What's interesting here is these agents don't just respond to prompts.

Starting point is 00:42:46 They run on triggers and schedules. So once they're set up, they operate more like embedded systems. Try custom agents now at notion.com slash big tech. That's all lowercase letters. Notion.com slash big tech to try custom agents today. And when you use our link, you're supporting our show. It's notion.com slash big tech. Notion.com slash big tech.

Starting point is 00:43:05 And we're back here on big technology podcast with Box CEO, Aaron Levy. Aaron, before the break, I mentioned that I was curious to hear your perspective on who's going to get the most value from this technology. Is it going to be the labs or is it going to be the people, the company's building on top of their technology? And it does really seem like there is some competition there. I mean, they want a lot of this agentic. stuff to happen within their super apps.

Starting point is 00:43:31 Yeah. So how is that battle going to shake out? It's very different than like I have a chat bot and I'm applying that chatbot technology inside like a legal app. Yeah. Yeah. So I think, first of all, I would say, unfortunately, I'm going to give you kind of some lame answers here because I think the jury's out.

Starting point is 00:43:46 I don't think we know, you know, ultimately what happens because you can kind of argue your way into a couple different outcomes. One is that you could argue pretty easily that eventually domain-specific agents end up being the best way for these agents to manifest in an enterprise. Because the domain-specific agent deeply understands the context of that industry. It can wire up to data systems proprietary or public data that is just purpose-built for that particular industry. They can do the change management of the workflows of that industry. industry because they will just have people that are just like dedicated in their focus in a particular industry use case. And they're just, again, like you have a full complete solution just applied to your vertical.

Starting point is 00:44:41 Conversely, you know, the kind of bitter lesson people would just argue that actually everything I just described is like two or three model generations away from getting eaten, you know, eaten away. And to the bitter lesson side of this, I think the part that I would just argue is like, there's always domain-specific context. If for no reason other than just the model can't know what all the different projects are that somebody's working on and the data that they have access to, the model has to tap into that. And so then the only question is, like, how much is the value created by the products that allow the model to tap into that information? or is it actually easier and easier to do in a kind of purely horizontal way over time or with some of the skills that you just pull into the agent? And I think like the classic debate that you'll see on, you know, on kind of social media around this is, you know, Harvey or Ligora versus the, you know, versus the kind of more horizontal Claude co-work style agent.

Starting point is 00:45:42 I just think it's a really great debate. And I don't know that I just don't know that you can totally set. simulate out what's supposed to happen here. Because even in, even in, you know, kind of traditional SaaS software, we saw 30, 40, 50 billion dollar vertical software companies emerge in categories where there was already plenty of horizontal products that could have solved those problems. But just that relentless level of deep vertical focus led to customers being much more willing to trust the vertical player because they just know that every morning that, company wakes up thinking about their workflows. And so I think that it's just, it's too early to see how this is going to play out. The good news is there are going to be value in both sides because even the vertical domain-specific players will be riding on top of the intelligence from the horizontal labs.

Starting point is 00:46:36 And so in all the scenarios, the labs win, you know, a very big prize. Like that's the thing. So the labs are fine either way because they're going to have, they will be the intelligence layer of any of these outcomes. Then the only question is how much value is created on top of the labs for the applied layer. And it's just very early to see how that plays out. Right now, I think it's going to cut differently by industry. I think there's some industries where the customer has such either regulated or just like high value work that they need to do, that they just want an off-the-shelf solution that just thinks about that work day in and day out.

Starting point is 00:47:15 And then there'll be a lot of things that are just like, okay, you know, writing an email. email, you know, responding to my calendar request, putting that in email, and then adding that to a Salesforce record, that's very general purpose. Like, that's going to be something much more, you know, suitable for like a pure horizontal agent. But, like, I have to go super deep in some legal workflow or I have to go super deep in an M&A transaction. These things are pretty tailored use cases that I would, I would, you know, probably more often

Starting point is 00:47:46 than not bet on the applied kind of layer. Okay, and so just for clarity, the bitter lesson folks are the ones that say you add more compute, the models will get better and they'll basically like they will be able to handle any use case that, you know, someone who's building on top of the model could with, you know, specificity. So. Yeah. And the way to think about it is just like, imagine you have that much, let's say this is like your bar chart. And three years ago, if you were a wrapper on an AI model and you actually were like, like, like, like, successfully delivering a high value outcome, and you, you know, the bar chart was this,

Starting point is 00:48:24 the top of the bar is the kind of, you know, full solution. The wrapper companies would have needed to, you know, do like 80%. Because, you know, the models were pretty weak. Now the models have gotten good enough. And the models have gotten good, and it kind of moves up, up the sort of wrapper upward. Then you can just vibe code a wrapper solution. Now you can vibe go to the ratchet.

Starting point is 00:48:42 Now, now. Now, here's the thing, though, that's important, though. Yeah. And it's important to not think about this as a static, you know, sort of dimension. What's happening is as the models get better and better, one would think, well, the wrapper should shrink until the point where the wrapper is just like that big, right? But what's happening is that actually as these capabilities get better and better from the models, the use cases start to expand that the customer wants to go do. And so then there's basically another set of things at the wrapper layer that is sort of needed to get built out. And we'll just have to again see how rich and deep is that ecosystem.

Starting point is 00:49:16 But I think there's going to just be hundreds of successful, thousands of successful products at that layer, simply because again, enterprises, they just want to, they want to wake up, they want to get their job done, they want to have some alpha relative to competitors. And they don't want to be thinking all day long about how do I go implement a new technology solution. So the company that can show up at their offices and basically say, I have the purpose. purpose-built solution just for your use case, they're going to have a leg up, assuming that there's no other trade-off in like it's worse intelligence or it's vastly more expensive or it's, you know, it's so minutely, you know, useful that it's just not worth adopting another vendor for. But there's a lot of reasons why you still buy, you know, vertical or domain-specific technology. So there are, speaking of like, making things bigger and then getting better. There are some new models that are on the way. So we hear. Open AI has this spud model that I spoke with Brocklin about. Anthropic apparently has a bigger model coming out as well that just finished training. Brockman actually said something interesting that Spud was built on two years worth of research.

Starting point is 00:50:23 And, you know, we've talked a little bit about these models getting better with more compute. Well, actually, the compute buildout started like crazy maybe two years ago. So we're going to start to see what the product of building on these bigger data centers actually is. Turn it to you. What have you heard about these new models? What are they going to do? I think we're probably, you know, reading the same conversations. I'm listening to the same clips of your interviews.

Starting point is 00:50:46 And I do, I appreciate that this round of model improvements seem to be more public than other ones. I would say the, you know, it's always hard. There's always these like viral leaked images now online. And you can't tell which ones are actually real or not. I think there's a lot of generated content out there. But for all intents and purposes, it's pretty clear that we have two gigantic, you know, capability models coming out in the, you know, weeks and months ahead. And I think, I think certainly probably the biggest takeaway is just like we are nowhere close to hitting a wall. I remember it was probably only about a year ago where there was a lot of talk on like, oh, have we hit a wall?

Starting point is 00:51:30 And these things are only kind of eking out, you know, tiny little improvements in capability. that's just obviously not the case anymore. We saw that through the winter. I think we're about to see that in the next two major model drops. I think that's incredibly exciting. And on every dimension that I think is going to matter, agentic coding, agentic tool use, domain-specific, kind of applied areas of knowledge work, life sciences,

Starting point is 00:51:57 legal, financial services, consulting, etc. I would expect that you'll just see major improvements on all of those. We have an e-val that we give all of the new models. It's basically a complex knowledge work task, which is we give an agent a set of documents to work with, and then we ask it a series of very, very hard questions that we think correlate to pretty high-end knowledge work. And already we've seen double-digit kind of point improvement gains just in the last sort of model family update. So call it the last four and four months. Yeah, so, so, you know, from five to five-two to five-four, from Opus, sort of, and

Starting point is 00:52:35 and sonnet, you know, kind of the four to four five or four six families, double-digit point gains on those families. And basically all of these types of tasks. So if we see that again, which I would directionally assume that that's, you know, based on the messaging coming out, I mean, that's just another category of enterprise work that will be unlocked. And that's, that again just gives even more momentum to companies sort of looking at their workflows and saying, how do we go and re-engineer our work?

Starting point is 00:53:05 work to be able to use agents across these workflows. So you're very familiar with Open AIA and Anthropic. I think you partner with both of them. Yep. Who's going to win? Well, funny enough, by being partnered with both of them, you usually don't answer questions like that, which I won't. But I think...

Starting point is 00:53:25 Do you think there's... Actually, you'll answer then all of them. Actually, give me an out if you can... No, I'm not. No, please. I love that. No, no, this is great. Just let the subject talk and sit back. Yeah, but media training says don't answer any further.

Starting point is 00:53:35 and just let the interview ask more questions. Listeners and viewers, Aaron and I will sit here for the remainder of this podcast. This is the ultimate end state of two sides of training. So we, I think I'm not going to answer it in the way that you'd obviously like. What I would say is that you have two just incredibly competitive, insanely talented, well-funded, very motivated companies in both of those companies. And I think I've probably used this kind of analogy in your podcast before.

Starting point is 00:54:11 I can't shake it from my head. So I do mean this fully. It's sort of like trying to predict anything about the cloud wars in like 2008. Right. It's just like we are still so early in the total sort of evolution of the market. it. And, you know, I ran this stat recently, actually. I think my numbers are like mostly correct. You know, they came from AI. So, so, you know, bear with me. I did, I did some extra Googling to check on them. But in 2010, 2010, the cloud revenue of AWS, 2010 is like kind of like yesterday. Like, I remember 2010 pretty perfectly, right? Like, it wasn't that, it wasn't like that far away. scary. So, so 2010, AWS was about 500 million

Starting point is 00:55:02 in revenue. Azure launched that year, or it just launched. GCP was called Google App Engine. That's how early this was. They had this, their logo was like a jet engine, like a little cartoon jet engine. So like, needless to say,

Starting point is 00:55:18 not a serious contender in the cloud infrastructure wars. So that was 500 million was like the dominant player. The past year, you know, I think the total spend on cloud infrastructure is, you know, a couple hundred billion dollars, you know, range. So, just think about that scale in 15 years to go from 500 million to a couple hundred

Starting point is 00:55:41 billion dollars. And so if we were doing a podcast in 2010 and we're like, how, how is this going to all play out? And it actually, the answer just should have been, it doesn't matter. Like, literally, like, everybody ended up with a 50 to 100 billion dollar revenue business at the end of all of that 15-year period because of how valuable cloud infrastructure was. So I think of intelligence more as like a multiple on that.

Starting point is 00:56:07 And so it's kind of like the daily skirmishes that we have to kind of pay attention to and get excited by probably just doesn't amount to as much as just you fast forward five or 10 years and all of these products are five to 10 to 20 to 50 times larger. So that sort of my answer. I mean, it does matter.

Starting point is 00:56:26 be to a degree because if you're able to command this lead, you can maybe get more funding, more infrastructure, and that all compounds on each other. But I agree with your central point, though, is that where it's early, and even if, let's say, Anthropic, just to use one company as example, has a lead now, it doesn't mean they'll be holding it forever.

Starting point is 00:56:44 Well, and even in the cloud, like cloud was, cloud was kind of the original CAPEX dependent, you know, sort of, you know, capex-heavy form of software. And you would have thought, like, will there be this major compounding thing? Like whoever can build the most data centers

Starting point is 00:56:59 gets the most workloads and then they'll build more data centers and then they'll get more workloads. And yet, 15 years later from that point in time, we now have four in the U.S., including Oracle, four at-scale gigantic cloud providers. We now have Neo-Cloud providers. We have international cloud providers.

Starting point is 00:57:20 China has its own ecosystem as an example. So you basically have, you know, at a minimum 10, very, very, very. good businesses that are in cloud infrastructure from what you would have thought, you know, should already have had this sort of like escape velocity kind of return. So I think AI has a lot of similar properties, which is, which is unless there's some so kind of closed proprietary research event and breakthrough that happens that just simply nobody else knows about. And we have no evidence that we've ever had one of those in AI. Like, like, you know, these things just eventually

Starting point is 00:57:53 sort of emerge across the ecosystem. Unless that happens, I think, you know, any one lab probably has a six-month-to-one-year lead on like a, on the breakthrough AI model. There's lots of network effects, like the more people that build on your APIs, then your tools, you know, work with those APIs. So we're not only in an intelligence-only competitive battle. So there's lots of reasons that that you're going to see network effects in ChatibbT, in Codex, in Cloud Code, and so on. but these markets are just so big that, again, I'm just not worried about kind of who wins in this, simply because all of these companies will be much bigger in the future. Aaron, I love you. Always great to speak with you. You're always welcome on the show.

Starting point is 00:58:35 Thanks for coming on. All right, everybody. Thank you so much for watching and listening. We'll be back on Friday with Ron John Roy of margins to break down the week's news. And we'll see you next time on Big Technology Podcast.

Big Technology Podcast - OpenAI vs. Anthropic's Direct Faceoff + Future of Agents — With Aaron Levie

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.