The Infra Pod - The next cloud to overtake AWS are AI sandboxes?! (Chat with Ivan from Daytona)
Episode Date: September 22, 2025. Join host Tim Chen (Essence VC) and Ian Livingston (Keycard.sh) as they sit down with Ivan Burazin, CEO and co-founder of Daytona, to explore the cutting edge of agent-native infrastructure. From the origins of Daytona to the challenges and opportunities of building for autonomous agents, this episode dives deep into the tools, use cases, and visionary thinking shaping the next generation of infrastructure. Whether you're a developer, founder, or tech enthusiast, you'll gain insights into the trends, pivots, and bold ideas driving the future of AI and cloud platforms.
Transcript
Welcome to the InfraPod.
It's Tim from Essence and, yeah, let's go.
Hey, this is Ian Livingston, founder of Keycard, the greatest place in identity for your agent.
I'm super excited today to be joined by not only Tim, who's awesome, but Ivan Burazin, who's the CEO and co-founder of Daytona.
Ivan, what in the world is Daytona?
Daytona is, as you said earlier, we like to say agent-native runtimes, but people just say, oh, you're the sandbox company. So, yeah, I guess we're the sandbox company.
And why, I mean, I think there's two primary questions I have. One is, like, what in the world got you to start Daytona? Like, I ask that question to literally every entrepreneur, anyone starting to do anything. It's like, why would you do this? This is, like, literally the hardest possible thing you can do, trying to build something from scratch. And then two is, how did you end up as the sandbox company?
Yeah, sure. I mean, I think it's a lot of things
we did in our life. So founding team of Daytona worked together for the last 20-ish years. So we worked
together for a long time. And so I don't know if I actually told you this story, Tim or Ian,
but like our first company was actually building out server rooms like in the early 2000s. So
we stacked servers, you know, HP, Dell, IBM, then doing like VMware stuff, Hyper-V,
Cisco routers and switches. We did all these things like all back in the day. And then we ended up
selling that company when we understood that doing this as a service doesn't make sense. We didn't have the capital to, you know, do a CoreWeave or whatever it was at the time, but we did it as a service.
The next company was Code Anywhere.
So the first browser-based IDE in 2009, you know about this.
And that was sort of like the evolution.
So we knew how to do this stuff in real life.
And then when we had to create the IDE in 2009, we had to create everything from scratch.
And that had some success.
Did a conference company, then ran the developer experience for a large communication platform, again, infrastructure company.
And then we started Daytona, which is a slightly different story.
So the idea originally for Daytona was to orchestrate dev environments for human engineers, right?
And now what we're doing is similar.
So we call them sandboxes generally in the industry,
but it is a VM for agents because agents actually,
even though they run in our computers, don't have access to computers.
And so agents to be able to do complex things,
they just like us need computers to, you know, execute code,
to run code, to use computer use, to run a browser, to whatever may be.
And so at the end of last year, actually the beginning of this year, we decided to focus specifically on agents instead of humans, because it's just such a huge market. It's a new market, it's a new primitive, no one owns the space. And basically, the number of agents in the world will be the number of humans to the power of n, and I don't know what number that is, but it's going to be drastically bigger than anything else. And so, just like yours, which is a similar but different space, it's up for grabs. And having the experience we did, of everything from literally stacking servers to running the very first, you know, IDE that had its own runtime, to orchestrating dev environments for humans, brought us to this place. And so we sort of feel that we're uniquely positioned to build out this product.
Incredible. And I'm curious, like, how's that going? Like, today, what can you do with Daytona? What are the common use cases? What are the things you see that people actually have success with? Where is Daytona as a technology and as a product today, and what are people using it for?
Sure. So the thing that's unique about Daytona, just to be clear: we started working on it six months ago today, something like that, and we launched it three months ago, a little bit more than that. So quite recent. What we did do for Daytona, when we were building it out, was think about what agents would need in the future. And we came from this idea of running VMs for humans, with the idea that these machines should be long-running and stateful. And why do I say that? It's like, when we work on our machines as humans, you guys probably work on a Mac or a PC, whatever,
like it essentially runs until you turn it off.
Like you don't expect it to preemptively die.
Like you don't expect that.
You want it to run, right?
Or when you close the lid of your Mac,
you come back, it's in the same state.
You expect that to be the case.
Now, most tooling for agents today is built off of serverless technologies, which were built for something completely different. They were built for app deployment, right? And the vast majority of them are really fast.
They have API endpoints to manage them, which is great for agents.
But a lot of them run for 15 minutes or 45 minutes or an hour or something like that.
And also a lot of them don't have state attached to them.
That's okay if an agent needs to run a piece of code like arbitrarily and get a result, that's fine.
But if your agent is trying to do something complex like a human, so if you're giving it a task to, you know, search the web, analyze data, go like scrape LinkedIn, whatever you're trying to achieve, right?
It probably would run for a much longer time.
And so to be able to do something that is fast like serverless,
that's stateful and long running,
is actually impossible with any technology that exists today.
And so we actually built that all the way from scratch.
And so we don't use any orchestration platforms that you would think of.
We run everything on bare metal because that's the only way we can get to the level of virtualization that we need.
And so we built a lot of things like really, really deep,
knowing what we had known from our previous companies, right?
And we were able to build this very unique product,
which people seem to be, I mean, enjoying and using,
it's been growing insanely well since we launched.
And what, like, you know, at one point,
you were building sort of this cloud, runtime,
repeatable environment thing,
you made the switch to building, you know, sandboxes for agents.
A lot of the same underlying story there, like in concept.
I'm curious of what's the delta in terms of requirements
between those two use cases, like from a technology perspective?
What makes it that you need something specific and new
versus something that you could just pull off the shelf
and use Kubernetes or some other orchestrations set up.
Yeah, people using Kubernetes for this
are the number one people switching over to us, to be honest.
So, yeah, there's a lot of diffs.
We originally thought that we could do both at the same time, so, like, serve both humans and agents. And then you figure out that agents need specific things.
And so, one, if you use something like Kubernetes, it's really hard to get that insane speed. Kubernetes with state is really hard to run at high speed. Like, you can't get it really fast.
And if your agent is running in interactive mode, and basically most of our users, to be clear, are companies or startups or SaaS products, you can think of them, that are building agents.
So we're basically B2B2C.
So we're selling to someone that has built an agent that then serves an end human, right?
And in a lot of the use cases, the human expects it to be instantaneous.
And so you want that thing to run really fast versus what we were doing for humans.
It didn't really matter if it lasted like 30 seconds of spin-up time, or 45, or two minutes. Like, ideally you want it faster, but it wasn't the end of the world.
So that's one.
The second thing is that humans do things in a series.
Basically, you're running on one machine at once.
You're trying to solve one problem.
If it doesn't work, you roll back and then do it again.
You roll back and do it again, right?
You can do it in a series.
We have a lot of people using agents in the same way.
They'll spin up multiple agents and they'll run in a series.
But we also have agents that run things in parallel.
So the agent can say, oh, there's a fork in the road,
or there's five outcomes I can do or whatever it may be.
And now what I need is essentially a fork of the state of this machine, the sandbox, right?
The agent can now say, oh, here's my machine.
I'm going to spin out like five forks, try all five outcomes.
And then if one is good, you take that.
If two of them are good, I take two of them and I can fork it again.
Sort of like, you know, the multiverse in all the Marvel movies, sort of.
Like you can get that.
Humans have zero need for this, right? Like, absolutely zero. Machines do, or agents do.
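The fork-and-explore pattern Ivan describes, forking the machine's state, trying several outcomes in parallel, and keeping only the good forks, can be sketched in plain Python. Note that `try_approach` and the dict-based state here are illustrative stand-ins, not Daytona's actual API:

```python
import copy
from concurrent.futures import ThreadPoolExecutor

def fork_and_explore(state, approaches, try_approach):
    """Fork the current state once per candidate approach, try them
    all in parallel, and keep only the forks that succeed."""
    def run(approach):
        fork = copy.deepcopy(state)          # fork of the machine's state
        ok = try_approach(fork, approach)    # agent works inside its fork
        return (approach, fork, ok)

    with ThreadPoolExecutor(max_workers=len(approaches)) as pool:
        results = list(pool.map(run, approaches))
    # Keep successful forks; each of these can itself be forked again later.
    return [(a, f) for a, f, ok in results if ok]

# Toy usage: five "outcomes" of a fork in the road; only even ones succeed.
state = {"files": []}
good = fork_and_explore(
    state, range(5),
    lambda s, a: (s["files"].append(a), a % 2 == 0)[1],
)
```

The point of the deep copy is that each branch mutates its own fork while the original state stays untouched, which is the property a real sandbox fork would give you at the VM level.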
The third thing, which is actually, I think, the most important of all of these, is what has kind of been coined the term "agent experience," which is the tools and the capabilities an agent has to execute on its command.
And the way we think about it is,
what does an agent need to autonomously achieve its task?
And at the beginning, people think,
oh, I just need a runtime.
I'll give it a Docker container or something like that, right?
And that works, again, for a very simple use case.
But as we're working with new startups,
we find new things that their agent gets essentially stuck on.
So it's like, it can't get it done unless it calls
back to a human. An example is, so we use Docker natively as a template in Daytona and it can use
any Docker image on any container registry. That's all good. And obviously, the agent can,
if it has an image running, it can install new dependencies on its own. And that's fine.
But if it has to spin up 100 machines, does it really want to update all the dependencies on all
the 100 machines? That makes no sense. Like, why doesn't it just create a new Docker image if it needs something completely new? And how does it do that at scale? And so we created a declarative
image builder, so the agent can say, oh, I can't use any of these, I want a new one. I declare the dependencies, the env vars, the commands I want at creation time; we build that on the fly. And it opened it up, right? How does an agent share data? For example, you as a human, you know, you're working with Docker on your machine, and you can share, using Docker volumes, literally disk across all these containers. Using an isolated sandbox, you can't really do that. And so how does an agent share data across multiple sandboxes? That's not a trivial thing to do.
And so we created this thing called Daytona Volumes. Very original name, like, super original. Basically, the agent can invoke one of these volumes, which can be mounted on the fly to multiple of these machines, and then use the data across the machines. And there's a bunch of headless tools, like a headless IDE, a terminal, and a bunch of other things that we've created specifically for the agent.
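The declarative image builder described above, where the agent states dependencies, env vars, and commands at creation time instead of mutating 100 running machines one by one, might look roughly like this. The `ImageSpec` class and its field names are hypothetical, not Daytona's actual SDK:

```python
from dataclasses import dataclass, field

@dataclass
class ImageSpec:
    """Declarative image: the agent states what it needs at creation
    time, and the platform builds the image on the fly."""
    base: str = "python:3.12-slim"
    dependencies: list = field(default_factory=list)   # pip packages
    env: dict = field(default_factory=dict)            # env vars
    commands: list = field(default_factory=list)       # build-time commands

    def to_dockerfile(self) -> str:
        # Render the declaration into a concrete Dockerfile.
        lines = [f"FROM {self.base}"]
        lines += [f"ENV {k}={v}" for k, v in self.env.items()]
        if self.dependencies:
            lines.append("RUN pip install " + " ".join(self.dependencies))
        lines += [f"RUN {c}" for c in self.commands]
        return "\n".join(lines)

# The agent declares what it needs; the builder turns it into an image.
spec = ImageSpec(dependencies=["pandas", "requests"], env={"MODE": "batch"})
dockerfile = spec.to_dockerfile()
```

The design point is that the declaration is data, so an agent can emit it once and the platform can stamp out as many identical machines from it as needed.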
And so none of these things are needed for humans.
Humans use it in a different way.
So you don't need this volume product.
You don't need the declarative image builder.
You don't need parallelization.
And so there's a lot of things that you actually don't need.
And as we started mapping out Daytona, originally the V2, let's call it, we saw that the roadmap really diverged from the roadmap for humans.
And as we move forward, there's like so many things that we're adding around security,
like because an agent can go rogue.
And so how do you stop the agent from killing itself?
How do you stop the agent from accessing certain addresses?
We have like a layer seven firewall coming up right now.
And like for humans, you don't need that.
You have the firewall generally of the company where it's working.
But for this, it's specific, like, this agent can only access X or Y websites or IP addresses or whatnot, because you don't want it to, like, screw up and do crazy things.
And so the more you get into it, the more different things are actually needed for agents. Every single day, when we work with these new agent companies, we're finding new tools and needs for them to be able to do their job successfully.
And so, just to understand, like, even the motivation to do this. Like, I feel like, because I know you guys and where you were with your last direction, which was really about developer environments, right?
Like, I can set up my own development environments. I can easily, like, provision them, even do staging environments, right?
It was much more general purpose.
It wasn't for agents specific.
So were you like, hey man, everybody's using my Daytona environments to run agents.
Like, that's like such a big draw from everybody.
Or you just spotted almost like a new opportunity right here.
And I'm just curious, like, how did that shift? How did it happen?
Yeah, it happened when we did a partnership with a company called Open Hands, then called OpenDevin. So this was at the time when Devin was still closed to the public. And then you had OpenDevin, which, the way it worked, is you had to install it on your local machine. You had to spin up Docker containers and things like that.
It wasn't super complex to set up, but it wasn't trivial for what we now call a vibe coder to get up and running.
And so we're like, oh, we have this dev environment automation platform.
why don't we automate the spin-up of OpenDevin?
And so we did that and we launched it basically as a SaaS,
not to make money just to show people like,
oh, this is the thing we can do.
And then just people started knocking on our door,
literally, oh, we're building this agent.
And at this time, we didn't really understand the space.
Like as an outside observer, yes, but not as an insider.
And it's like all these people building agents are like calling us,
oh, can you do the runtime for us?
And to be clear, both OpenAI and Anthropic added the environment slash runtime to the definition of an AI agent early in 2025.
And this is like end of 2024.
At that point, not even the foundation model companies had added that a runtime or machine or environment was needed for an agent to do its task.
So it's like super early.
And people just started asking us for this.
And then we're like, oh, here's our like dev environment manager.
You just use that.
And then we just kept getting into issues where the agent couldn't do its job.
It's like not good enough.
It would break.
And obviously, context windows then were smaller, and agents were less sophisticated. But even today, even though they're better, it's super hard for them to get a lot of things done.
And so how can you make it as easy as possible for the agents?
And so when we did that and started getting the inbound, we started talking to all these people building agents and we're like, oh shit, this is a thing.
No one really does this.
Like no one does it well or no one owns the space.
And the market has only started.
And so when we think about agent-native companies, the classic mobile native is Uber, the cloud native is like Airbnb, whatever.
Like what are the agent native companies?
Like I don't really know.
I don't really think they exist today.
I mean, chat GPT, fine.
Granola seems to be cool.
But what are the really big, successful agent-native companies?
I don't think we're here yet.
And that to me is a signal: okay, the market still doesn't exist yet.
And we're a net new primitive in that market.
And so this is where we go.
And so that was the, it was kind of by accident where we sort of fell into this and
saw this huge opportunity.
And we're like, oh, this is just where the world is going.
It's similar to, like, Ian and his new company. It's like, this is where the world is going, right? And so this is an opportunity to be a key piece of that.
And even when you think of the companies, like in the mobile space or in the cloud space,
even if they're not in the Mag 7, they're still insanely successful, and no one's removing them.
They became the default winner of that product or service or infrastructure, and they are here even to this day. So that was sort of the insight behind us going full on this.
That's amazing. That's really amazing. And so you mentioned Open Hands, right, as like a first use case. So code obviously needs to run someplace, because it can do crazy stuff, right?
You know?
But I feel like this word agents is so generic at this point.
Like sometimes I don't even like know how to describe one, right?
And then you say, we're a VM to run agents.
It sometimes gets a little fuzzy.
So I wonder, for you, what is sort of your definition of agents that can run well in Daytona? Because beyond code, right, if I'm not generating code to run, are there other use cases that actually run well? And what kind of things are you seeing? What are people even using agents for? And how does that represent itself?
Sure, absolutely.
So the way I think about agents
is basically, it's like a three-pronged thing where the agent is a piece of software, a web app, mobile app, whatever. Just a piece of software, whatever it is.
And on one side, it's connected to the intelligence,
which is the model or models.
On the other side, it is connected to a human.
So a human is prompting or asking, you know, a question, and that goes into the agent, and then the agent asks the model.
And then the third thing is its tools.
Does it have tools to be able to execute on its tasks?
And tools can be just like, oh, access to files.
It could be a browser.
It could be a VM.
It could be an API to something else.
It doesn't matter.
Like, whatever it has access to.
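That three-pronged picture, human on one side, model on the other, tools in between, maps onto a minimal loop. Everything below is an illustrative sketch, not any real agent framework; in particular, `stub_model` is a hard-coded stand-in for the intelligence side:

```python
def stub_model(prompt, history):
    """Stand-in for the model: in practice this is a real LLM call.
    It either picks a tool or answers the human directly."""
    if "weather" in prompt:
        return {"tool": "browser", "args": {"url": "https://example.com"}}
    return {"answer": f"Done: {prompt}"}

# The third prong: whatever the agent has access to (files, browser, VM...).
TOOLS = {
    "browser": lambda url: f"<html from {url}>",
}

def agent(prompt, max_steps=5):
    """Human prompt in, agent asks the model, model picks tools,
    observations feed back in, until an answer goes back to the human."""
    history = []
    for _ in range(max_steps):
        decision = stub_model(prompt, history)
        if "answer" in decision:               # back to the human
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append(result)
        prompt = f"tool said: {result}"        # feed observation back in
    return "gave up"
```

A sandbox in this picture is just one more entry in `TOOLS`, one that happens to be a whole machine rather than a single function.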
And that's how we focus it, although it has been changing. We think of ourselves as a tool call, where the agent runs somewhere else, like literally a mobile app or AWS or wherever it runs, and it calls on a Daytona machine when it needs to do something.
And now we also have people running the agent inside.
Bring me back to that, if I forget.
But your original question, what do people do?
Now, obviously code is a big thing.
And just because there's so many coding agents,
and code is a thing that agents do really well.
And so when you think about code,
like you have to, if your code is on GitHub or GitLab
or whatever may be,
you have to clone it somewhere to be able to edit that code
and run that code.
And an agent, again, doesn't have a computer.
And so how can an agent copy that code
or clone the code and then edit it
and run it? So it needs a machine to do that. And that's why that's like the primary first use
case that actually started being a thing. Also, when you think of like data analysis, and so
that's like the second biggest use case where it's like just a bunch of data and either analysis
for visualization or whatever may be, think like a hedge fund agent type thing, it uses code to
analyze the data. So again, it uses code even though it's not like a coding app. And that's sort of
where our tagline came from, and we'll probably have to update that as there are new things. "Run AI code" is Daytona's tagline now, because it's either coding or running code to execute on a task.
But we have agents doing so many more things
and agents use us for three things.
They use us for what's called container use, which is like the headless execution of code and tasks and commands; computer use, so like the desktop UI that it has to run something; or even browser use.
Now we don't do agents specifically.
So companies like Browser Use or Browserbase are quite different than we are.
But what we do offer is the underlying infra so your agent can do these things, right?
So use cases for us outside of just coding and data analysis is we have companies that are the net new, let's call it, productivity tools.
So you can think, like, an agent-native Microsoft Excel, or agent-native Word, or agent-native Notion. And so with these companies or products, you don't actually feel like there's an agent spinning up a VM somewhere in the background; you're basically using a productivity
tool. But the agent needs a runtime to be able to execute on things in the background.
So the human using that productivity tool doesn't know there's like a Daytona sandbox somewhere
running, but it is running in the background to do different things. In this part, it's mostly
like data analysis type things. We have people that essentially have agents that need a browser. So Browser Use is an agent, for example, and we have people using Browser Use on Daytona as well. So the reason they would do that is, one,
the spin-up times for Daytona are pretty insane. So it is 27 milliseconds plus latency to the agent,
so to the endpoint wherever it may be. And that's like usually sub-100 milliseconds. And then if you
run your agent inside of the sandbox, which some people are doing, you have zero latency from your
agent to the browser itself. And so we've had people using these things and running their
browser agents inside of Daytona, just because the spin-up time is almost instant and the latency between the agent and the sandbox is also near instant. We have people that
are using Daytona for RL and for benchmarking. So some companies that you know really well,
I don't know if I'm allowed to say them, so when I can, I'll name them. But, like, foundation model
type companies that will do benchmarks for their agents or RL and they'll just use a Daytona
machine. And so the interesting thing for this is they could run.
this on Lambda or any short-lived thing, because they're very ephemeral, but they still decide
to use us because of all the tooling that they get around it. And so they'll spin up like 20, 30, 50,000 of these sandboxes in one run, just for doing RL and things like that. One of the interesting ones that has also come up is that we have some large enterprise companies that are using it
for dynamic UI. And so what I mean by dynamic UI is they have internal software, which I
assume looks like Salesforce. So like some software that they have internally in this large
enterprise, right? And this software has API endpoints, but the user doesn't have the ideal UI that it wants, or the ideal dashboard. And the way you would fix that is you'd, like, open up a request and the internal engineers would build it out. But what they have done is they've created an agent that spins up a Daytona machine that's loaded with their SDK and that has insights about their documentation. And a human will prompt it and say, I need this dashboard, or this input field, or whatever it may be, and it generates it for that human. This is code, but it's an interesting use case. And it continues to run on Daytona after that, forever. So you, at a company that has literally hundreds of thousands of employees, can have your own interfaces into the corporate application, because they have an agent that can do these things on the fly. So it's not exactly Satya's idea of, like, a one-shot SaaS, but it definitely is a one-shot UI sort of thing. So I think that's quite interesting.
That is so interesting.
Yeah, I think this category of infra is so new to everybody
that it's really hard to sometimes even fathom, like, what does this even mean, right?
But I think you make a good point. A sandbox, one, it has to feel fast, so that you actually feel like, okay, this is something I want to use to spin up a lot of stuff, and things like that.
And I'm very curious about this sort of like the toolings, right?
People are kind of staying not just because they can run VMs fast and then run some code in an isolated way,
but you also have some of these toolings.
You mentioned security firewalls a little bit,
but I was curious, like, what are the most, like, quintessential toolings, where people are really like, oh wow, agent plus this is super powerful, right? And what do those tools look like right now?
I mean, we have a bunch of tools that we've created: headless file explorer, headless terminal, headless Git client, headless IDE.
So we have more and more of these things that we're adding inside of there.
The most interesting one that people use a lot is the Git client,
mostly because we obfuscate the authorization from the agent.
So the agent can't, sort of, hack into your GitHub. I mean, "hacking"; it has access anyway. But it cannot do things that it's not allowed to do.
And it makes their lives so much easier.
So that's been something that people have been using quite a bit outside of just the
headless terminal, which is like probably the most used one.
But that sort of is expected, just because they're, like, installing dependencies or doing things like that.
So just having these tools inside and around.
And when I think about tools, to be clear, and I don't know if this comes across, it's like when you buy a new Mac or Windows or whatever,
you have a bunch of tools inside it, right?
You have like Office and a browser and all these things.
And so we do that same thing for agents headless.
So I think we're more akin to creating a MacBook for agents than we are just like, here's a VM.
Because it's not just like CPU and RAM.
It is the full sort of experience of like turning it on, being stateful.
Here's tools that come in.
You can obviously add your own tools, but all these things are in there.
And it does come pre-packaged with, like, the UI desktop, the headless desktop, the browser, the browser headless.
It comes with all these things right away.
And so you as an agent builder, as a human building this software, have less things to build on your own.
Plus, your agent gets to its task faster with fewer steps, less context window usage, fewer tokens, which means it breaks less in general.
And so that's some of the reasons why people do that.
And we look at these, we have, like, enterprise clients, but we try to talk a lot to, like, the YC-type companies. It doesn't necessarily have to be YC, but, like, these up-and-comers, because those kids are on the cutting edge. Like, they're the ones like, oh, I need this new thing, guys. Can you build this for me? And so that's what we try to do as much as possible.
It's super incredible.
I mean, it's quite the pivot and, I mean, super needed.
The idea of like becoming the agent operating system and your point about token usage,
I mean, broadly, you know, one of the things I think a lot about is, every time we add a little bit of probability into a system, you need an equivalent deterministic system on the other side that kind of puts, like, a box around what it can do.
And broadly, what I'm hearing you say is, hey, look,
the speed at which you can spin up environments and try things out
and get basically a Boolean outcome drives that sort of agent inference loop.
You know, the core of an agent is some giant for loop that's basically like,
is this right? Is this right? Is this right? Right?
You know, and there's a prompt, and it's just kind of stepping through this graph.
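That "giant for loop" can be sketched as a propose-run-check cycle, where each attempt gets a fresh, disposable environment and the Boolean outcome drives the next step. Here `propose` and `run_in_sandbox` are illustrative stand-ins for the agent's generator and for a real sandbox run:

```python
def solve(task, propose, run_in_sandbox, max_attempts=10):
    """Propose-run-check loop: each attempt runs in a fresh, disposable
    environment, and the Boolean outcome ("is this right?") drives the
    next proposal."""
    feedback = None
    for _ in range(max_attempts):
        candidate = propose(task, feedback)
        ok, feedback = run_in_sandbox(candidate)  # fresh sandbox each time
        if ok:
            return candidate
    return None

# Toy usage: the proposer steps forward until the sandbox check passes.
result = solve(
    "find 7",
    propose=lambda task, fb: 0 if fb is None else fb + 1,
    run_in_sandbox=lambda c: (c == 7, c),
)
```

The cheaper and faster each sandboxed check is, the more iterations of this loop an agent can afford, which is exactly why spin-up time dominates the economics.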
I'm sort of curious to understand, like, where do you think this goes in the end, right?
Like, today you've built, it sounds like, a very fast orchestration layer that comes with some free tools, that enables agents to use tools, like computer use tools. So, run a program, whatever, inside a sandbox, you know, compile some code, create their own tools, whatever.
What's the future of this category look like to you?
And then broadly, I guess the real question I have is what do you think the future of infrastructure is
based on the fact that you're working with agents at such a low level of the stack?
Like, I mean, this is a pretty foundational component of the stack.
Yeah. So the way I look at it, like, in the shorter term, is that I feel these are, let's call it, the PCs of agents. And why I sort of say that, similar to what I said about the Mac: let's look at things that are coming up for us. Some things we know exactly how we're going to do; some things are still very deep in R&D, like Windows, Mac, GPUs. That's very short term, but some of them we know how to solve really easily, some we don't. And so, why do we need a GPU? Everyone says, oh, you're going to do inference. I'm like, no, we're not doing inference. What we're doing is, we have agents that want to run Blender in our machines, right? And so we have agents that want to create 3D worlds,
that want to, you know, create video games, want to do all these things. And so it's not
inference in that sense. Like we're not going in that direction. It is literally the computer you
have at home as a human that you play video games on, do render, do blender, run code,
like, depending on what your shtick is, right? Like, it is a general-purpose machine for the human. And that is what we're trying to build for agents.
And that's why I also say, like, Mac and Windows.
Like, why would you need Mac and Windows?
Well, you need it because if your agent is trying to, you know, work in health care,
and I'm just kind of making this up, they have software that's built on Windows,
that works directly with the database.
There's no APIs.
And so how does an agent have access to that?
The only way to have access to it is have a Windows machine with that piece of software
installed on it.
And so how does an agent get to that machine?
Well, you have to create this sandbox, which is also scalable, that also has Windows in there. Same thing for Mac, right? And so that is sort of the near term of what we're doing. So it's much more than the word sandbox; that is what the industry has called it, and we'll call it a sandbox, that's totally fine. But it literally is the PC or the Mac for the agent. Now, the difference is the tools are different. The scale is different. The speed and the number of these things that an agent can do are quite different than for humans. But that's how I think about these things in the super short term. I'll get into the long term, I just want to hear first:
What do you guys think about that?
I mean, I think that makes a lot of sense from my perspective, like the way you describe it.
I just sit back and I, you know, the question that resonated in my head, and maybe this is more long term or short term.
But the question I was asking myself as you were talking is, since 1970, when Unix dawned, we've had sort of this set of generic POSIX-compliant open source tools.
Now, it's kind of different in different runtime environments, right?
Like macOS, it's FreeBSD-based. And so, you know, it's sometimes very different from what you get on, you know, a Unix base, like an Ubuntu machine or whatever.
But I'm kind of curious to also get your thoughts on this.
Like, do you think the future of this layer of the stack,
which is basically the operating system,
is going to continue to be open source or is going to increasingly more curated
and specific to the agent in its tasks?
Like, are we going to, is this, like, are these generic tools? Are we still going to care about being POSIX-compliant if, like, an agent can interpret and create its own environment? That's the final question I started asking myself. Does this mean we're ultimately going to move more toward, like, vendor-specific tools, or is open source still going to matter a lot, because it's going to be much easier for us to build and try these different things out?
Yeah, I don't know. I just kind of feel that
the open source is less valuable. And I can tell you from our point of view, like our open source
repository has like 21, 22,000 stars. So like pretty decent users, whatever. And it's great to
have that because people can take a look at it. People can like fix bugs. People can suggest things.
But right now we use it mostly to keep ourselves in check versus
that this is like something super interesting for the community today.
I feel that the people building today,
and this might be just an effect of our market,
the people building today care less and less about that.
They're like, give me an API key, and that's all I care about.
Like, I don't care about anything.
Abstract everything.
I don't want to know.
I don't give a shit.
Just let me do that.
And so this is not 100% verified,
but it's very much what I heard from the companies that I talked to.
So I talked about this to a few companies,
and another colleague, not in our company, verified this,
though we should do a deeper analysis.
But the data shows that about half of YC's batches
no longer have a hyperscaler account.
I didn't talk to every single person,
but of the ones that I talked to, it was about half.
And another person at a different company found the same, also about half,
which means they don't have AWS, GCP, or Azure, right?
And so that's actually quite interesting.
It's not the question of open source,
but it's like, what are people looking to achieve today,
especially as agents are doing more and more?
It's like, just enable me, and/or the agent, to get the job done, and I don't really care, is how I'm reading it.
I'm not saying that that's what they say, but that is how I'm reading the market right now and what's happening going forward.
And when you add on to that: another acquaintance of mine is doing a new vibe coding tool, and he might be listening, might not, so I won't say who.
But the idea is that the way his tool is going to work, the way the agent codes and creates the structure of the app,
in his words, won't be human-readable.
Like, you'll be able to read it, but the files won't be structured with names in English,
but rather in numbers.
It'll be sorted all in one folder.
It will do it the most optimal way for a machine to create, test, and deploy these applications.
You as a human can obviously figure things out, but it's not optimized for humans.
It's optimized for the agent itself.
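To make the idea concrete, here is a toy sketch of my own (not the acquaintance's actual tool): source files are written as numbered blobs in one flat folder, with a manifest so a curious human can still map blobs back to familiar paths. The function and file names are purely illustrative.

```python
import json
import tempfile
from pathlib import Path

def flatten_for_agent(project: dict[str, str], out_dir: Path) -> dict[str, str]:
    """Write every source file as a numbered blob in one flat folder,
    plus a manifest mapping blob names back to human-readable paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for i, (human_path, source) in enumerate(sorted(project.items())):
        blob = f"{i:06d}"                  # "000000", "000001", ...
        (out_dir / blob).write_text(source)
        manifest[blob] = human_path        # recovery map for curious humans
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest

# A human-shaped project, flattened into the agent-optimized layout:
demo = {"src/app.py": "print('hi')", "README.md": "# my app"}
manifest = flatten_for_agent(demo, Path(tempfile.mkdtemp()) / "flat")
```

The layout is trivial for a machine to enumerate, diff, and regenerate, while the manifest is the only concession to human readers.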
And so if we're going directionally into that future, which I think more and more we're going
into, then these things, I mean, they're less impactful.
Like, you don't really care.
Obviously, there's, like, security levels and things that you have to take care of, but does it matter if it's open source?
I'm not sure.
And I'm saying this as someone whose product is open source, right?
So I'm saying this from this perspective, just don't know that in the future people will actually really care about that.
I mean, from my sample, talking to, like, Gen Zers or even younger, they've never even seen the Amazon console in their life.
Which was mind-boggling to me, because it's almost like you've never even seen the terminal once or something.
Like, it was so crazy.
But, like, we grew up in the days where the Amazon console was, like, the most hated thing, but you just had to use it.
You've got to at least see your VMs, which look like i-dash-whatever.
Nobody cares anymore, and it's just so mind-boggling.
But I guess that's the opportunity for you.
It's almost like you can become the next hyperscaler if things can be this way.
That's the pitch, and that is so fascinating,
because I think this probably leads into our next section very soon,
but the space we're in is so early, too.
Because when I'm looking at, maybe, your blogs and the things you talk about, there's actually still mostly a lot of code usage, but there's so much other up-and-coming stuff.
And so when you have, like, a tiny startup, I think it'll be so interesting: where do we even start, what kind of things should we even do right now?
Should we focus just on the core primitives, almost at the VM level or up, like persistent state, better networking, almost like this L1-to-L7 stack, right?
And, like, let's just focus all on the L1 stack, or whatever you call it:
get our machines going faster, disks running better, almost like another hyperscaler-y mindset.
Like, okay, EBS is a little bit better, you know, and stuff like that.
And then, okay, when do we need S3?
When do we need some other stuff, right?
And, you know, as a tiny startup, we can't do it all.
So I just wonder, like, what are the things you feel are maybe the most important core infra?
And what are, like, up-and-coming things?
Like, oh, I think browser is going to be super huge, right?
Let's do a little bit more here.
Do you have something like that already?
So I have, like, general, less detailed thoughts on that. My belief, and I can lean into this section that you wanted to ask me about, is it depends on how things end up playing out. But I think there will be new companies that are hyperscalers, that are larger than the current hyperscalers, if those don't catch up. So basically, the way I see the world going, which I'll lean back into your question here, is the hyperscalers are still growing at like 20%, 30% year over year. It's, like, insane at the scale that they're at. And I think they're going to continue at that scale.
I don't think that changes in the near term, but as agents become more pervasive and the larger users of these things,
I believe there's like a net new cloud or net new clouds that will like supersede these.
Just because the hyperscalers are, I mean, I'm super simplifying this, but the hyperscalers are as big as they are because of the number of humans using the internet.
In general, there's services, there's other things that talk to each other, whatnot, but in general.
And now if you take the population of the world and put it to the power,
of N, which are like agents that are going to do things, then you'll need that much bigger of a cloud.
Again, I'm super simplifying this, and the numbers aren't exact, but directionally it's correct.
Now, what we encountered in Daytona is we were running, as you know, the inner loop.
So, dev environments for humans.
Now we're doing sort of the equivalent, not just dev environments, but general environments for agents.
And we found that it's a very, very different product.
If you take any of the old competitors of Daytona and give them to agents, it's not going to work really well.
And if you take Daytona or new competitors,
it works much better,
which means, we know for a fact, the interface and tools in the inner loop are different for agents and humans.
Now, if you extrapolate that
and say, oh, as agents become smarter and better and all these things,
what other tools are they going to need, to your question?
Like an S3? Like, you know, are they going to use Redis? Are they going to use whatever?
I don't know what they're going to use.
Like, we'll figure that out on the way.
Are they the exact same things? Are they different things?
but as far as I can tell and what I'm betting on is that all the tools on the outer loop
will also be completely different, right?
And so who builds that?
And the person or people or a company that builds that for a market that's orders
of magnitude larger than human developers, that will be something much, much larger, right?
And so the question is, can the hypers, do they see that?
Will they see that?
Will they be able to attack that?
I mean, we've seen with the example that I always have, there's always counter examples
as well.
as, like, Cursor, right? Microsoft, that shouldn't have happened. They owned VS Code. They owned GitHub.
They owned Azure and 49% of OpenAI. Like, Cursor should not have happened, right? But it did. It happened.
And it's still leading. We'll see where it ends, but it's still leading. So in that same vein,
is it impossible to imagine that there will be a new hyperscaler, which is not made for humans,
but is literally made for agents? And agents are the people, or the entities, we're going to call them, using it.
And with the growth of agents and where it's going, that thing grows that big, right?
And so sort of a hottish take
is, like, maybe there's a new set of hyperscalers,
and today's hyperscalers are not going to be the hyperscalers there.
So is that us or someone else? TBD.
But I think that's directionally where the future is headed.
Yeah, I think you made a lot of good points,
specifically about, like, a net new AI cloud.
I mean, I kind of want to push back.
I think, like, GCP, Azure, AWS,
these are just indexes on the internet.
The thing I constantly think about is, like,
what actually gets people to move compute
off of those platforms.
I mean, fundamentally, they have you locked in
with data gravity and how expensive it is to move things around, right?
And we've witnessed this sort of, like, war from Cloudflare
with R2, trying to get people to move data to the edge,
and then trying to get people to move data from AWS into Cloudflare.
And this is, like, an ongoing battle.
So, like, broadly, I want to believe we're going to have neoclouds,
like a Cloudflare, that continue to pop up over time.
Realize that their core, you know, the application they brought, in the case of Cloudflare,
is the WAF, which then allowed them to build these points of presence.
And then as that went to scale, they, you know, got capital-efficiency leverage,
which allowed them to scale the hardware, which then actually allowed them to create this business.
And in fact, if you look at every single hyperscaler and every single company that, like,
invests in compute, most of the very successful ones figured out, like,
some small app that then got scaled, which then gave them capital leverage,
which then allowed them to actually invest in the hardware
and build out the real points of presence, right?
And I sit back and I think, well, okay, there are really two questions.
Like, what is a use case that is so finely different
that you get a ton of leverage from building, like,
net new hardware, in the same way Cloudflare did with the points of presence?
And is it going to be scalable enough that everyone does it,
such that you can actually compete, right,
to actually, like, get the depth and breadth to build one of these neoclouds?
And, like, what is the use case?
And one of them obviously is, like, runtime compute: ephemeral, highly ephemeral runtime compute, where you don't actually care about latency, potentially, but you actually care more about the ephemerality and the cost.
Like, there's a bunch of different others, I'm sure you can dream of.
But I'm kind of curious, have you, like, given more thought to what you think the wedge point is that gets people off of AWS?
Yeah.
We're still living in a world where the vast majority of on-prem is moving to the hyperscalers, right?
Like, there's more compute on-prem moving to the cloud than cloud moving off of the hyperscalers.
And I totally agree with you.
And that's why we are not competing with the hyperscalers on existing technology.
And so that's sort of, like, my caveat to that: if you're trying to tell people, oh, we're a cheaper, better, faster AWS,
like, that usually doesn't really work.
Like, it can work on the edges, but it's not going to make you bigger than that company.
Right.
So whatever the incumbent is, if you're like a cheaper, better, faster version, whatever it may be,
like you can get some things, but you still don't own the mind share.
And the mind share is the most difficult thing to get people off.
Because even newcomers for that same use case are just like, oh, I'm going to go to
AWS, right?
Or, like, I worked at a competitor of Twilio for a long time.
And, like, it's really hard to get people off of Twilio.
And net new people building with messaging still use Twilio, because it is the way it works,
and everyone knows how it works, and that's it.
And so the reason I kind of dare to dream that this happens is because our use case
is no longer the human, nor is it that same workload. It's a completely different
workload, which you cannot get addressed at the hyperscalers at the moment, right? And so because
you can't get it at the moment, and the market is still nascent, it's really small, and they
probably don't care at this point that much. But, like, can you be that market? And the market
grows fast enough that you become the net new for that, and then you sort of widen the wallet
from there, right? And then they have to get people off of you to go to them, in that sense.
And my sort of example where this kind of works, though less so, and ours should be more to that effect, is CoreWeave.
So if you look at CoreWeave: like, GPUs, man, there's, like, nothing smart there.
Like, it's just GPUs.
You have that at AWS and everywhere else.
But the way they've created the software stack on CoreWeave is different than AWS.
And it's different enough.
And there's a market big enough that that difference makes a difference.
And they're a public company that's doing fairly well, right?
So is it a meme stock now? Or is it just, like, where people are investing in AI, so it's great? But generally, it's a big company doing really well, and it's still humans, humans running inference, right? And so what we're trying to say is, like, it's not even humans anymore. Like, the user is an agent, and so it's completely new. And again, I understand that this might be far-fetched, and I might be, like, in the bubble completely and have completely lost it. But if there is a chance that this is the future, then that is your wedge. Because if a CoreWeave can exist just for inference, then if agents need a new, not just runtime, but, to the point, the whole compute stack that agents need, and you're specifically made for that, and that market is growing really fast, then you can own that market, and then grow with that market, and become the incumbent in that market. And so I've seen, of course, counterpoints and points to both of these,
but that's sort of directionally how I think about it.
I mean, this actually feels similar, to some degree, to back in the day when the internet just came on. Like, you will buy stuff on the internet? What?
It was unfathomable that you'd actually have same-day shipping, too. Like, what? How? But now it's, like, so normal.
So I think I totally get your point, but I'm also very curious, right?
Like, I remember I saw the headline when GPT-5 came out. I think Sean put out, like, this is the Stone Age of AI now, right?
It's almost like there's an evolution that needs to happen. Like, for instance, for agents to be the primary use case,
we need to continue to make agentic adoption at scale
so much easier, so much simpler for folks, right?
And so I think Cursor definitely helped a lot, right?
Claude Code helped a lot, too.
But it doesn't feel like we're, like, there.
Next year, will it be like 50,000 more agents running wild,
and maybe agents buying directly from you,
not even talking to humans anymore?
So in your mind, what are, like, the next evolutions required
to even get there faster?
Sure. I mean, I had a call with a colleague, an acquaintance, today, who is doing payments infrastructure for agents.
And so, like, his ask was, hey, would you, like, embed this in Daytona?
So an agent can just come to your website all on its own, pay all on its own, and use the
compute.
And my honest answer was, like, if it's not a heavy lift, I'll do that today.
But I think we're still not there where agents are actually the ones doing this.
And so it's also chicken and egg problem.
Like, to get all these things, the same thing with authorization, right?
So log in.
So you have like companies like WorkOS and Arcade Dev and all these others trying to
solve that as well.
So and that's a problem we have to solve before even the payment.
So can an agent actually go out and log in?
And then once it's logged in, can it actually pay for the thing that it's trying to do?
I mean, you can do that now.
You can give it a virtual credit card kind of and set it off.
But like all these things on the infrastructure layer need to be done.
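The virtual-card stopgap Ivan mentions can be sketched in a few lines. This is purely illustrative (the class and the vendor name are made up, not any real card-issuing API), but it shows the core idea: a hard spend cap, not trust in the agent, bounds what an errant run can cost you.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualCard:
    """A spend-capped virtual card handed to an agent. The cap lives
    outside the agent's control, so it limits the blast radius."""
    limit_cents: int
    spent_cents: int = 0
    charges: list[tuple[str, int]] = field(default_factory=list)

    def charge(self, vendor: str, amount_cents: int) -> bool:
        if self.spent_cents + amount_cents > self.limit_cents:
            return False                          # declined: over the cap
        self.spent_cents += amount_cents
        self.charges.append((vendor, amount_cents))
        return True

# An agent buying sandbox time until its budget runs out:
card = VirtualCard(limit_cents=2_000)             # a $20 budget
first = card.charge("sandbox.example", 1_500)     # accepted
second = card.charge("sandbox.example", 1_500)    # declined, would exceed cap
```

Real card-issuing products expose similar per-card spend limits, though the actual APIs differ in detail; the point is simply that the limit is enforced by the issuer, not by the agent.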
And so I mean, it is a timing thing.
So I've been early in my life and I've been late with companies.
And so I don't know.
The honest answer to him as well is, like, I don't know if you're super early or just in time, because you have to create the product, you have some users, and the market is going directionally where you should. Now, are you there right in time, when the market sort of inflects, such that you become the leader in that thing? So I think there's a lot of things on the agents themselves. And my thought is, right now, even if the models don't continue getting better, like, with tooling and things that we can do, there's so much more to unlock. And I think it's more on
the app layer people, which we're obviously not part of, but, like, we work with them,
it's on them to sort of build out these things. And I feel that for now, the vast majority
of like app layers that I've seen, not that they don't have value, but they basically take the
agent sort of the model as is and just like, oh, here's access to tools and the prompt and just
like, go. I don't feel that there's like a lot of work done around that to make the agent more
sophisticated. And I think the reason is just because these models come out so often, like they
don't really care. It's just like, oh, the next model will be better and better and better.
And then I'll just like be lazy and I just like acquire customers and it gets better over
time. Whereas when people start actually really working hard around the models that we have
to get these agents to work better, I think that's when these things all sort of make sense.
So fascinating. I mean, this is a whole new world we're in. The agentic era, I guess you can
call it.
I think you're right. All right. So we have so much we could ask, but just based on time,
if folks want to try out Daytona, folks want to learn more, where should they go? How do they look up
more stuff here? Daytona.io, the website. Obviously, there's a GitHub repository and everything else,
but that's the place. Sign up. You can get free like $200 of credits, get an API key,
and you're off to the races. Awesome. Well, thank you so much for being here, sir. You know,
we had so much fun. Thanks so much for having me, guys. It's always a blast to talk to you.
Awesome. Thank you.
Thank you.