The Infra Pod - From 30 Seconds to 20ms: Solving Browser Speed for AI Agents (Chat with Catherine from Kernel)

Episode Date: February 23, 2026

In this episode of The Infra Pod, hosts Tim Chen (Essence VC) and Ian Livingstone (Keycard) sat down with Catherine Jue, co-founder and CEO of Kernel, to explore the cutting-edge world of browser infrastructure for AI agents. Catherine shares her journey from Cash App to founding Kernel, explaining how she discovered the critical need for scalable browser automation when AI agents need to interact with the web. The conversation dives deep into the technical innovations behind Kernel's use of unikernels and micro VMs, which enable blazingly fast browser startup times (20ms vs 30+ seconds) and unique snapshot/restore capabilities. Catherine discusses the evolution from deterministic browser automation to truly agentic behavior, the challenges of optimizing for variable web workloads, and her optimistic vision for an AI-powered future where the pie expands rather than consolidates. This episode is packed with technical insights about infrastructure, agent tooling, and the future of how software interfaces will evolve in an agent-native world.

0:24 Catherine's startup journey and founding Kernel
1:30 Cash App's OpenAI experiment sparks the idea
3:56 Why browser infrastructure for AI agents?
6:36 Unikernels: 20ms startup vs 30+ seconds
15:02 Optimizing for variable web workloads
23:25 Future of agent-native software
32:05 Hot takes!

Transcript
Starting point is 00:00:03 Welcome to the Infra Pod. This is Tim from Essence and Ian, let's go. Hey, this is Ian Livingstone, lover of agents, CEO and co-founder of Keycard, helping you secure agent interactions. Couldn't be more excited to have Catherine Jue, the co-founder and CEO of Kernel, on the pod today. Catherine, what made you decide to take the crazy leap and build a company? What was the thing where you realized, hey, I just have to get out and do this thing? Help us understand how you got started.
Starting point is 00:00:32 Yeah, sure. Well, thanks so much for having me on the pod. Really excited for our conversation. Man, that is a fairly deep question out of the box, out of the gates. I have spent most of my career in the startup and tech world. I co-founded a company straight out of college that ended up being YC-backed. My current co-founder, Raf, had also founded a YC-backed company that did very well. And so in many ways, we are builders at heart.
Starting point is 00:01:00 We love building products and businesses and doing things from zero to one. And so the question of jumping into startup world has, you know, that was a question I may have asked myself straight off college, but haven't really looked back since. What really started our journey with Kernel, which is much more recent, Colonel has been around for about a year, came out of two things on my end. So before starting Colonel, I was working at Cashup, Leaning a team of Ford Deploy, code engineers. And back in kind of like 2023, when chat GPT was really starting to heat up, this concept of LLM, being applied to so many different use cases was really starting to come to
Starting point is 00:01:41 fruition. Block bought every single employee, like 10,000 people at this company, an open AI API key. And the CEO at the time was like, we don't know what you're going to do with it. We know that AI is going to be super critical to our workflows. Here's an API key. Just go and do something with it, like, figure out what might be useful. And that was such a seminal moment because my team ended up being one of the first teams at Cash App to build, you can think of it as an internal Glean product. This was like before Glean even became a real household name. So we basically built Glean for our own internal organization of 50 people.
Starting point is 00:02:22 And, you know, to see at a public company, you know, given an API key to this new technology, and the ability for us to stand up something into production that had immediate business value. And we did that in a small team in a few months. For me, was an aha moment that while this technology that has been maturing for over a decade, depends on where you want to call AI and whether that as pre-transformers or after the development of the transformer,
Starting point is 00:02:52 this technology is here and it has immediate business potential use cases. and as someone who loves to be on the bleeding edge of technology, having spent my entire career in startups, I need to be on the bleeding edge building for the future. So in 2023, it was when I started down my rabbit hole of, okay, now is a great time to be starting a new company. There's so much opportunity in Greenfield to be building something that hopefully has legs as the technology matures as well.
Starting point is 00:03:21 And so I started going down a rabbit hole of talking to users, trying to figure out the space that I wanted to be in, and ultimately, like, kind of landing all on the pain point of infrastructure for agents, and specifically browser infrastructure for agents, from talking to a lot of users around that time. So what was the unique pain around browser infrastructure that you're like, this is, like, the place I want to spend some time, you know? Like, if started a company's hard, also, like, eats your life.
Starting point is 00:03:51 So, like, yeah, what was it about that problem? And then what was your unique insight? You're like, hey, we can do this in a way that is like funnily different than what other people were doing or what came before. Yeah. So how I first came across this concept or the need for AI agents to access the internet and access browsers was while I was at CashUp, a sister team that sat next to ours started to try to use cloud computer use, the vision model, to automate their QA. And they weren't trying to automate QA of, you know, Cash App services or Square surfaces, but actually of partner websites in productions alive on the internet. And so this QA engineering team was trying to go to their partner e-commerce websites like
Starting point is 00:04:38 Norsham.com and Levi's, and then use cloud computer use to analyze those product pages on those e-commerce sites for the presence of like, does the afterpay or cash app logo exist on those pages because every single time Block partners with one of these e-commerce websites, they're, you know, in contracts. Like they have a business partnership, and those e-commerce websites are kind of obligated to include, you know, the presence of that logo, you know, a specific copy that's really critical for consumer financial education. There's all these stipulations around them. Before, you know, computer vision models and LLMs, it actually was not possible for that QA engineering team to automate that, in part because Block is a public company and, I'm not.
Starting point is 00:05:22 AfterPay has thousands of merchant's websites that they need to integrate with. It would be impossible. It would be technically infeasible for that QA engineering team to write, you know, brittle, playwright, or puppeteer or selenium scrapers for every single one of those websites and then maintain them and update them every single time that merchant website changed. And so the paradigm shift of computer use vision models wasn't unlock for this new use case where you can imagine they're building these prompts and templates that say like, Okay, go to insert website name.com, you know, click on the products, you know, take a screenshot,
Starting point is 00:05:57 and then like look visually for all of these different requirements. That was really the use case that I started to see and got really, really excited. It was like, okay, this is not, you know, unique to cash app or even the financial services industry. This technology can now be applied in so many different ways. And, you know, when that happened in October of 20203, the technology, cloud computers had just come out. And there were so many problems around solving that. problem from a tooling perspective, from an infrastructure perspective, that for me is like,
Starting point is 00:06:27 which of these problems are the most interesting to build a business around that we think that I can really, really solve. And so that's kind of how I started to go down this rabbit hole of, okay, how do we build infrastructure for computer use? Where we ended up landing from an infrastructure perspective, my co-founder Raf was, and I started to build prototypes for design partners around this concept of, hey, computer use models need their own isolated VMs to be hosted in, and how are we going to do that at scale? The most obvious way that, like, when you look online for how to do this to put a browser and scale a parallel in the cloud, is you might put that on a Docker container, and that's what we did at first.
Starting point is 00:07:09 My co-founder RAF had been tinkering with micro VMs and unicronals kind of on the side in parallel to all of these investigations from my viewser perspective. At one point, he was like, I just think it would be cool if we tried to put a browser on a unicronal, on a microBM. And we weren't entirely sure why. We were just like, it seems really cool. MicroVMs have the benefit of lazily fast startup times. They can spin up in, you know, 20 milliseconds in the world of like true unicernals or, you know, on the order of 150, 100 milliseconds if you're doing a firecracker VM. There could be something really interesting there.
Starting point is 00:07:44 The other really cool thing about microvMs is they have this ability to. snapshot and restore as a technology. And so you're able to take a running VM and basically like freeze it. And then whenever you want to come back to it, you can just wake it back up and it's in the exact same state that it was before. And no one had ever tried to put a browser on a unicronin or micro, at least publicly, on a micro VM before. But we just thought the technology was really cool.
Starting point is 00:08:13 And so we kind of were like, okay, we know that this is a growing pain point with users. We've built an initial implementation of putting, you know, managing browsers in Docker containers and managing fleets of them. It just could be really technically cool to try to put it on a unicurnal and see if like developers or users are interested in that. And so we ended up doing that, figuring out how to do that with the help of the Unicraft team who manages that technology. And then we decided to do a show HN where we basically said like, hey, here's an open source, we figured out how to put Chromium on a unicernel, the
Starting point is 00:08:50 response was super positive and kind of put us on the map as to a developer infrastructure platform to start to watch and got a lot of early user signups from it. That's kind of how we got started. Incredible. And, you know, your insight around the unicernal
Starting point is 00:09:10 and the advantages that gave you and the stop start, like, what are the use cases where that is like just such a clear? This is so much better, right? Because we've had browser automation stuff forever, you know, Selenium's existed. There was companies like Sauce Labs, right? There's other companies that have come out around like Playwright and such. So curious to kind of think about how you think,
Starting point is 00:09:29 hey, this gave us like this huge advantage for these use cases in a way that like was so different. Yeah. I think practically speaking, there are two main advantages of this technical approach that in many ways we've kind of walked into. Like we release this. We're like, hey, we think this is really cool. We don't entirely know. We have some notions of what people might. find it useful for, but we don't entirely know. And so we kind of like put it out there in the world.
Starting point is 00:09:52 And then as people start to use it more and more, they kind of came back to us and told us what was most interesting or what this technology enabled. The first piece about unicernels is going back to the problem of like, agents need access to browsers. And a growing number of agents can do anything on the internet. And so the idea with Kernels, like every agent that wants to access to internet, they'll need a browser and hopefully kernel can power that.
Starting point is 00:10:17 The thing about browsers is that they're very slow. They're very resource-intensive and very slow to start up. And so if you've ever tried to put Chrome in a Docker container, maybe for your CI-CD builds, you want to automate some QA tests, and you try to home-roll that, you'll quickly find that. One, it's super finicky. You're kind of massaging different Chrome launch flags where you have no idea what any of them do.
Starting point is 00:10:41 It's unclear why you can't get Chrome to stop crashing in that container. And then the second thing is that, And once you do that, after you've built that Docker image, to spin it up, it can take 30 seconds plus for Chrome to start up because that's so many processes that needs to get going. And as a serverless offering, you know, we have an API that allows developers to grab a browser on demand, connect it to their agent, do whatever they need to do. It just doesn't feel right as a developer for our API to be like, we're going to take 30 seconds plus to return a browser to you. you might think like, okay, now in order to solve that, I might start managing like warm pools of browsers. Like that's kind of the solution.
Starting point is 00:11:22 And that becomes very, very cost and effective. Really, like, doesn't make sense from a unit economic perspective at scale. Unicornals solve that at a technical level because of that snapshot and restore functionality. And so we can basically, you know, before a developer or a customer comes to us and says, hey, give me a browser right now, under the hood, we can spin up that browser, you know, eat that cost of 30 seconds plus when no one's really asking for it. And then we can immediately put it into standby. So we'll put it to sleep.
Starting point is 00:11:53 It's a fresh browser. It's ready to go. All the things are loaded. But we're able to put it to sleep. And when we do that, we don't consume, you know, active CPU, RAM, et cetera. It's just kind of sitting on disk ready to go. And then when that user comes into our API and asks for the browser, we'll wake it up. And with the unocernal technology, the browser can get started in 20 milliseconds.
Starting point is 00:12:14 or less, which is just immensely faster than that 30-second initial startup time. And so that's how we're able to deliver that initial performance. And then the second piece about the snapshot and restore that is really interesting from an agent's accessing the internet use case perspective is because we can kind of put that browser to sleep, wake it up whenever we want to, it's in the exact same stage. That means like in practice, let's say you go to a website on a kernel browser, you, even zoom into a specific part of the page. You maybe have logged in.
Starting point is 00:12:49 Your agent has logged in on your behalf. You've added things to cart. If that agent needs to go away or you have these asynchronous flows, maybe some of those workflows involved, you know, that end consumer coming into the loop or different agents doing different tasks on that same browser, we're able to kind of give that browser a very long lifetime. And it kind of returns in that same state every single time, whatever work you've applied to it.
Starting point is 00:13:15 And on top of that, neither the user nor kernel really has to pay for that long-lived clock time that the browser is in existence. Because in between all of those moments, we'll just put that browser back to sleep. And so like, hey, Tim like goes to Norseon.com,
Starting point is 00:13:32 his agent adds the item to cart. Maybe the agent is not allowed to actually go and check out on your behalf, Tim. We can just put that browser to sleep after the agent does its work. You go at 8 a.m. or whatever time you want to pick back up, you'll go and access that browser that agent was working on. And maybe there was a three-hour delta. There's some marginal cost, but largely our costs to maintain that three-hour dead time is fairly small. And you don't, as a developer, don't have to pay for that idle clock
Starting point is 00:13:58 time either. And so it really enables a ton of very interesting use cases as agents continue to mature. It's concept of like humans in the loop coming in and interacting with agents doing their work. Those are the long-winded way of explaining of the different exciting things that we see that the Unicernel technology has unlocked. Yeah, I mean, that sounds like a real agent I'll be building. So love the example here. I think we want to double it down on the thread here because I think Unicernel is a known thing. You know, when I were working on with Docker and Messles back in a day, like there's a lot of people who are trying to put Unicernel into Kubernetes and that kind of thing. But obviously, that has been difficult.
Starting point is 00:14:38 It's not just Unicrono itself is a brand new concept. It's like all the usability stuff around it is pretty new, right? And I think I can see that since you have a single app, I guess, the Chromium browser, right? So it's more controllable on your end. I'm very curious, you know, using Unicraft, right, to be like you're a platform provider. So I'm sure they take quite some complexity away. What has been my biggest, one of the few hardest things to, challenges even to make this work.
Starting point is 00:15:08 I'm sure there's a lot of moving pieces, right? On top of browser pools, you know, flags, this and that, right? I'm sure there's lots of things. Can you maybe go through one or two really fun technical challenges that might involve this sort of stack that you didn't know you had to go figure out? But ended up a very interesting technical challenge here? Yeah. So fairly specifically, we're using.
Starting point is 00:15:33 the Unicraft Unicronel, which is, you know, maybe the definition of unicernels can vary depending on who you talk to, but the concept of like single process, single application. The form of unicronal that we're using is somewhat a fork of Firecracker. And so, you know, browsers have many processes. You know, we're running, we are running Linux. And so not entirely like single process, single application in that truest sense. And the Unicraft platform does take a lot of some of the very, very low level,
Starting point is 00:16:02 they abstract it for us. And so we don't have to worry about building the snapshot and restore functionality ourselves, managing that. They kind of give us a nice interface to interact with that. The area that is somewhat unique to browsers that we're really excited about pushing on the infrastructure side is, you know, browser workloads. So what people want to do with browsers when they give an agent a browser, they're often accessing different websites.
Starting point is 00:16:30 those workloads themselves, maybe bursty. So how many browsers you're spinning up at one time is bursty. And then the website itself is fairly bursty because, you know, Hacker News' website, super resource lights, you know, not much more than just kind of like a little bit of HTML CSS. Really anything can load that pretty easily, quite quickly, very low latency.
Starting point is 00:16:52 Nortrum.com or some older school e-commerce website, tons of animation, lots of graphics, very, very heavy, resource-intensive. the internet is, by definition, varies a lot in the resources required. And so one of the areas that we are now starting to work on that we get very excited from a technical perspective is like, okay, Rose on Firecracker or MicroVMs, huge unlock for so many reasons. Now, like, how can we actually optimize our microvMs for those specific websites?
Starting point is 00:17:27 And so you can imagine, like, hey, an agent wants to actually actually actually optimize. Hacker News, serve them a very light browser, like lightly allocated browser resource. They then try to go to more heavier website. Using the next evolution of Firecracker, which is Cloud Hypervisor, we can actually start to think about hot swapping memory allocation. And so you can actually like increase the resources that browser has for that specific website. Website requires graphical processing, the potential to add, you know, true webborder
Starting point is 00:18:00 GL and the ability for that browser to have a graphics card and be able to perform just like your in my browsers. There's so many interesting things that we can start to do that will make that agents browser performance and fine-tuned specifically for that website and whatever they are trying to do. And that is a, I think, a very interesting technical challenge for us to encounter. And we think that when we do that, we will be enabling agents that need to access to the internet, in increasingly cost-efficient ways in ways that give them
Starting point is 00:18:35 just a first-class experience to the internet in the same way that we do on our personal computers. Yeah, that's the one I'm most excited about right now. Yeah, that's actually super cool. It's almost like the combination of right-sizing, but also the ability to actually learn on the fly and then do hot swapping, right? Like also, like, I think resizing on the fly
Starting point is 00:18:55 of resources and allocation. And actually it's been a pretty tough challenge even just normal containers for a lot of times. So doing that on a unknown workload, like it's a known workload, there's something to learn about. But that workload may not be consistent. Like maybe even North Stream balloons five times more, and you ooms, right, and a lot of different kind of problems.
Starting point is 00:19:15 So that's really cool. So maybe go down the other side of things. I guess we've been talked about Unicarnels and browser pool. It's pretty low level. Actually, one thing, all of our developers, even non-developers, we're trying to, like, very curious, what is it, browser, browser, web agent difference is.
Starting point is 00:19:33 From a conceptual level, it's like just doing playwright type interactions. It may not be really that different. But are you seeing agents changing the paradigm of how browsers, every browser interaction might look like? And do you see yourself building products or tooling around the agent's specificness? That's probably like a very opaque area. I think a lot of us don't understand yet. Like, okay, is agent just doing whatever humans are doing?
Starting point is 00:19:59 and that's it. Maybe some pauses here and that, that's kind of it. Or there's a fundamental difference. Maybe the parsing information, maybe the capturing or something like that. What you see is like the new layer or even asks their customers, like, please help my agent to do X. That's very hard to do right now. Yeah.
Starting point is 00:20:18 When we think about the space that kernel operates in, we provide browsers as a service, we leave it to the developer to figure out what they want to do with that browser. And so many of them are still using, you know, writing playwright or having an LLM right playwright to generate a deterministic scripts. And then kind of the next evolution beyond that that we are starting to see is like, okay, and if that script breaks, passes a screenshot and the DOM to an LLM, allow us to try to self-heal. Maybe that succeeds sometimes and doesn't succeed other times. But that's kind of like the next evolution of, okay, how do you turn very deterministic riddle things? the websites may change at any given time
Starting point is 00:20:59 into something that actually is more durable, has recovery properties, self-healing properties. We see increasingly our customers and developers doing that where you might have a fallback method or you might use an allowance to generate that player in the first instance, run in until it breaks, recover, and you kind of like loop back around the second time and build something new on the fly.
Starting point is 00:21:22 And then the third kind of evolution, which I would say is broadly the industry is still very much in the experimental phase. But we do see some companies doing this in production is like, okay, truly agenic behavior. We don't want, you know, we maybe don't want to use playwright at all. We want to use just the vision models, the computer use models. We want to pass it a prompt and have it, you know, figure out what it is, it wants to do. I think the cash up example is a good one that is fairly agentic because in order for that use case to be successful, it needs a genetic reasoning to figure out, like, where on that nav bar are
Starting point is 00:22:00 products pages and figure out, like, how to click on a product. You don't know the DOM ahead of time. And use cases like that, I think, are really exciting. And the technology is only improving. So when we think about where the cash app team first started using that, you know, a little bit over a year ago, computer use models are slow, they're expensive, you know, definitely not reliable. You ask you to do the same task three times and I'll do it a different way every single time. But we kind of like are actively using these products and tracking the technology. And I would say where we are today is much more in the realm of like, you know, similar to the coding agent space. We're getting very close in the coding agent space to be just one-shodding
Starting point is 00:22:46 a lot of our code and saying like, yeah, that worked. Did exactly what I expected it to do, wanted it to do. No notes. Like keep on. And I think like we're starting to see that on the computer vision side as well. And so the industry today is kind of, you know, different use cases, different business requirements take different approaches. Primal kind of is compatible with all of them, but I think writ large where we see the industry headed is to a world where, you know, computer vision models themselves become increasingly capable, increasingly reliable. I'm curious what your position is on the long tail here. You know, I think there's just sort of tension, specifically on what browsers we're like, we're in a phase of, let's say we're really like
Starting point is 00:23:25 in the pre-agent phase of software in the sense that, and what I mean by that is, like, software that we use on the daily basis is not agentic in nature. And it's not designed for agents to interoperate, right? That's why, like, the browser is such an interesting, I'm sure you all have a very large vision based on the name of the company, but is why, like, something, like what you're doing around browsers, like, makes so much sense in such a prolific use case. I'm curious what you think the evolution as software becomes more agentic. So, like, for an example, what we hypothesize. So one is, like, there's a, this collapse of the software interface, right?
Starting point is 00:23:58 We don't need endless UIUX in a world where you can type or speak to a computer. It can figure out your intent, and it can start making decisions on your behalf. Right. And so there's sort of this collapse that's coming. And so in the future, like, agents, like software we use is going to be much smaller in UIUX footprint than previous. And sort of these browsers, to a certain extent, are a bridge and solve like a long-tail problem. as this all evolves, what do you think the timeframe is for that?
Starting point is 00:24:27 Do you agree with the sort of sentiment I just said or disagree? In either case, it'd be interested. And then if it does evolve, what do you think the interface between agentic software components are? What do you think it becomes? Yeah. A couple perspectives I personally have. One is where software is headed is directionally in the realm of the things that you had mentioned. I like to think of the world as being built on duct tape.
Starting point is 00:24:53 When we look at like successful companies, Platt is an example where when you think about what Plad does, assuming you're both familiar with Plad, it's kind of allows consumers to connect their bank accounts to all these financial applications. And how Plad really got started was under the hood, you know, they did not have direct bank integrations because banks didn't offer APIs. And so they were often using scrapers or kind of browser technology to kind of bridge that gap. And Plad became a very, very successful company. and the world is built in a way that often does not make rational sense. The reality of the world is not conform to what should be the case. And so in the near to medium term,
Starting point is 00:25:36 we expect browsers to be a very meaningful interface that unlocks agents and to do the work that they're interested in doing. I come across so many different use cases with our customers and developers where I'm like, yeah, it makes perfect sense why you're using a browser agent or browser automation to go and do that, I can totally see why that end service and website will never offer that as an API or an MCP. They're nonsensivized.
Starting point is 00:26:02 It's not their secret sauce. It's not their poor products. And I can also see why that end service would have no problem with you doing that, especially when you're acting on behalf of your end customer and user and you have their consent. And the cool thing about vision models and LLM is generating playwright code is that that unlocks all of these use cases that historically would be too difficult to kind of wire up yourselves.
Starting point is 00:26:25 And so I think that's kind of why we're seeing that proliferation. As it relates to the kind of like longer term future, we will think about this internally and enjoy partnering with, with other players in a space that are pushing that forward. And so like one example is, you know, what does it mean to have a website that fine-tunes itself to a human versus an agent? What would it mean for there to be like an agent-optimized website? There'll be questions of like who gets to own the
Starting point is 00:26:53 those websites and who can get the buying of the website to opt into that. But that's also very exciting. And so we actively love working with people and other companies that are thinking about that problem, serving those end services, and then thinking about how can Kernel's infrastructure for us to be the best place for agents to access the internet? What is the infrastructure we need to build to enable that, assuming that the websites are going to mature and change as well as agents become more widely adopted? Yeah, it was really good thought. I think it's going to take a long time.
Starting point is 00:27:27 If you're a software developer right now on Twitter, wherever you hang out, you open up your cloud code, you built like, you know, your thing over the, over the holidays, you use Opus 4-5 and you're like, holy crap, this is real. I think like a lot of us have had that experience where it's like three months ago was like, I can see it, but I don't feel it. You know, now you're like, oh, I feel it. Like, it's not 100% all the time, but it's like close enough. I'm like, okay, I'm entirely, I'm tiredly bought it. And like, this is here and this is now. Like, the core of the internet, which I think is a point really you're making, is going to take much longer. And I think that's 100% true.
Starting point is 00:27:59 One of the next questions I was really trying to kind of think about is, what you build isn't just awesome for browsing agents. I'm curious how you think about other types of age interactions, right? Like a browser is a type of tool. You're hyper-focused on optimizing that tool. It's a prolific use of that tool. It makes a lot of sense. I'm curious how you're thinking about, because a lot of what you described as core tech,
Starting point is 00:28:24 there's a reason we were very excited about unikernels and, like, microVMs and all this stuff, you know, 10 years ago when cloud came about. Now, it didn't proliferate, for lots of different reasons. But I'm curious how you think about the evolution of tools, and this primordial ooze that eventually becomes, like, the world of agents calling tools, and also the fun stuff. Yeah. I think from a business perspective, Kernel,
Starting point is 00:28:51 most work and digital life happens on the internet. And so it is definitely true, and we take a lot of pride in this internally, that our infrastructure is built in such a way where, with the browser, we're actually spinning up microVMs, and we just happen to be putting browsers on them because that's what people are asking for.
Starting point is 00:29:14 We've built the infrastructure in a way where if, you know, agents increasingly need to access the Slack interface, for example, Kernel's primitives are built in such a way where we could fairly easily add that as another application. But from a business perspective, I do think that we are deeply interested in browsers because of this internet access unlock. You know, when you look at the cloud evolution, it took us from mostly working on computers with desktop apps to nowadays, I mostly just have the browser open, Slack, and maybe Superhuman. You know, everything happens there; every piece of software, everywhere work takes place, it will
Starting point is 00:29:55 happen through the internet over web apps. And so that's the business reason. That's why we're most interested in the browser technology itself. I don't know if that answered your question, or if there was a corollary or addition you had on top of that. I think I was interested in whether you were thinking about, well, a good example is this proliferation of sandboxes.
Starting point is 00:30:18 And really what you build is a really great sandbox for a browser. But a lot of what you talk about, the pausing, the replay, it'd probably be very awesome for coding sandboxes. And it's okay,
Starting point is 00:30:29 maybe it's like, hey, we should talk about this, because maybe we're revealing too much. So don't reveal things you don't want to reveal. But I was curious, because it doesn't take much. A super smart individual will be like,
Starting point is 00:30:38 well, I know what's going on underneath the hood there, what you're optimizing for. So I'm really curious about a bunch of these other use cases. That was pretty good, too. Yeah, I got it. I would say, yeah, as we think about the business, today we're most interested in that browser technology. But again, when we think about what we are building, yeah, 100%.
Starting point is 00:30:57 And like, one of the open source projects we're working on right now is a control plane for Cloud Hypervisor microVMs called HypeMan. And we use this repo that we've built to put browsers on it. But we envision that anyone who's interested in microVM technology could easily check out our open source and fork it. We welcome open source community contributors to it as well. I think Kernel, the team and the people that work on Kernel, get really excited about, you know, all of the infrastructural needs that agents have. And we get to
Starting point is 00:31:34 manifest that in various ways. And I think the open source tooling that we're building, and then building on top of, is kind of how we approach that. And so if you've ever been interested in building your own agent sandbox solution, building your own Lovable and managing the sandboxes under the hood, you should check out HypeMan, because it's infinitely flexible. And we think it has a lot of legs that we're going to continue to invest in as well. Very cool. Well, this is my favorite section of the podcast.
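[Editor's note: the snapshot/restore workflow described above maps onto Cloud Hypervisor's own tooling, which HypeMan builds on. Below is a minimal sketch using the upstream `cloud-hypervisor` and `ch-remote` CLIs directly, not HypeMan's API; the kernel image, rootfs, and socket paths are placeholder values for illustration.]

```shell
# Boot a microVM with an API socket for out-of-band control
# (./vmlinux and ./rootfs.img are placeholder guest images)
cloud-hypervisor \
  --api-socket /tmp/ch.sock \
  --kernel ./vmlinux \
  --disk path=./rootfs.img \
  --cpus boot=2 --memory size=1G &

# Pause the guest, then snapshot its full state to disk
ch-remote --api-socket /tmp/ch.sock pause
ch-remote --api-socket /tmp/ch.sock snapshot file:///tmp/snapshot

# Later: bring up a fresh VM from the snapshot instead of cold-booting,
# which is what makes near-instant resume possible
cloud-hypervisor --api-socket /tmp/ch2.sock \
  --restore source_url=file:///tmp/snapshot
```

The restore path skips kernel boot and process startup entirely, which is why restoring a snapshotted browser VM can be orders of magnitude faster than launching one from scratch.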
Starting point is 00:32:06 We call it the spicy future. So you've got to give us a hot take here. What is your hot take about infra that most people don't believe in yet? I don't know. Okay, I'm not entirely sure how spicy this is. So if it is not spicy, let me know. But as we enter 2026, I don't believe big models are just going to eat the world and we'll have nothing left.
Starting point is 00:32:34 When you look at, you know, cloud and software paradigm shifts over the past 20 years, certainly the Googles and Facebooks and Apples of the world are absolute behemoths, massive, massive companies, and maybe OpenAI and Anthropic are seen as those comparables. But generally, the world doesn't operate in a zero-sum or narrowing-pie sort of way. Google and Facebook and Apple really just enabled more businesses to enable more use cases and solve more problems,
Starting point is 00:33:06 and, you know, they power so many different things; they are the platforms for many, many large companies built on top of them. And so whether that be specialized models (we recently got a chance to get to know an upcoming computer-use-specific model called Open AGI), or specific models fine-tuned to specific tasks that an agent will need to do, and then the infrastructure and tooling around it,
Starting point is 00:33:29 I think what we're seeing already in the coding agent space is that jobs are not going away. They're certainly changing for any developer. Our jobs are changing, and the way we write code is changing. But I think we're writing more code and shipping more interesting things. It feels like the world is expanding, not eating into existing bandwidth or jobs.
Starting point is 00:33:53 And so I simply just don't believe that, yeah, OpenAI is going to take away all of our jobs and every possible startup idea in the world, and there are just going to be one or two winners. I think the answer for me is that the pie is expanding, and that's a really exciting place to be building in. I fully agree with that assessment. And I think the thing that's hard in transition phases is, like, you know, we talked about this literally 10 minutes ago or whatever, but we're still pre-agent-native.
Starting point is 00:34:22 We're pre-agent-native software, right? Like, you can kind of imagine, if you're playing with this stuff, where this all goes, but it's not here. And we're in this weird transitory phase. And what we're experiencing in certain ways is, like, a consolidation, while that opens up all these new opportunities that we don't yet understand. And that's the tricky thing to wrap our heads around. And it's one of those things where you see it at points in time, but then there's always the pro and the con. You see the point of success, and then you see the fear. And it's just the concept of change. I'm really interested to hear, as you think
Starting point is 00:34:58 about that. You talked about a lot of the foundation models, how we're not going to end up with just four companies in the world that rule everything, even though sometimes the stock market might feel that way. How do you think about the future of agents, given that the core fundamental component of the web has always been this sort of federated thing? It's led to this massive long-tail experience. What do you think the future of agent-native apps is, and what sustains that federation, that long tail? Is it specialized models? Is it specialized data sets? Is it still that we need agents that are super specialized per use case? Is it collaboration? Because if you think about the web in 1999, it was like, hey, literally you could have a website for every interest.
Starting point is 00:35:38 And there were just so many endless interests that it created this federated internet where there were all these places to go for every weird thing. And then it sort of evolved from there. I'm kind of curious if you've thought about how the network effects and the moats occur such that we still end up with, yeah, it's all changing, but we're still going to end up with this sort of federated long-tail thing. Is there some technical barrier or something you think will drive that? Are you asking if, like, you know, OpenAI is the Google, where ultimately, even though we live in the open web, Google and Google Search and Google Ads kind of, you know, had this core hold over how people experience the internet, is that what you're asking? So, yeah, and, you know, that's actually a really good example, right? Like, you know, we had 10 blue links and that was the entry door of the internet. And Google found a way to charge a toll.
Starting point is 00:36:27 But then, it didn't have all the information, so it was still federated. Like, it couldn't suck it all in. It was still 10 blue links, so it still kind of worked. And then you get to OpenAI, or pick your model vendor. It could even be Gemini, right? Like, you don't go to the website anymore. Where do I go? And what creates that locality?
Starting point is 00:36:45 Like, we don't have the locality. Yeah. So it's probably related to data. I'm just curious if you've thought about what formulates those little bubbles. Like, previously it was, well, if I wanted to talk about sporting cards, like, you know, trading cards for sports, I went to, like, NHL trading cards dot io or dot com or whatever in 1999. And that sort of then led to, you know,
Starting point is 00:37:05 all this consolidation and community formation and stuff that ended up happening. But I'm kind of curious if you have a view on that, because I think there are a lot of different takes, and no one really knows. And it's the biggest question I actually ask. And so I was curious if you had thought about it. Yeah.
Starting point is 00:37:19 I mean, it is one of the biggest risks. And it already exists today, right? Whether it be Google with the blue links or our social media echo chambers. You know, it's really hard. We both want things that are personalized to us, but then in order to do that, you know, the algorithm, the machine learning, the agents, the AI,
Starting point is 00:37:42 whatever, you know, whatever form of technology that then serves that to us is often in the hands of larger entities. And I think that, especially as consumers, as we start to delegate more and more work to agents, that risk becomes larger, because now we're in less autonomous control. Although I would almost argue that we may only feel like we're in control of our internet experience today as humans. You step out of your social media bubble to any other
Starting point is 00:38:13 place in the world, and you find that we're talking about these things as if they're already in existence, when the vast majority don't even know what AI is. We're still talking about it in very high-level strokes once you leave Silicon Valley. And so that is certainly a risk. And I think, from an optimistic perspective, the ability for LLMs, potentially fine-tuned or not, to understand true human personalization, intent, etc., potentially offers the ability for us to be served things that really, you know, are unique to us. And if you look at that from an optimistic perspective, it's never been easier to find your niche. Once you break through the
Starting point is 00:38:59 average mean that is ChatGPT's average response and push the LLM a little bit further. And the beauty of LLMs is that they do offer, you know, the ability to build context and a personalized understanding of, you know, your unique perspective, your varying interests, et cetera. And so I think there could absolutely be a world where that enables, in the best-case scenario, the ability for you to find random card website dot com that was, you know, not surfaced in the first three pages of Google search. But it's a big open question, right? Because, you know, there are a lot of conversations with creators and bloggers and, yeah, niche people and niche parts of the internet about how they can exist when you have AI aggregators and Google search surfacing information directly from their content. It's a big open question for the industry.
Starting point is 00:39:45 I tend to take a very optimistic stance about where we're headed. I think that as a founder of a startup working in the AI space, that is something I really want to enact upon the world. I don't want us to end up in kind of a doomsday scenario where we are all small inputs to a larger LLM. And so I see it as kind of my duty to be pushing that boundary forward. And, you know, Kernel is an infrastructure platform that allows agents to access the internet. And so the more that we can do our part to enable, you know, good in the world through agents, that's what we're most excited about.
Starting point is 00:40:28 Very cool. Well, for anybody listening who wants to try out your product or learn more, where should they go? Yeah. Check out kernel.sh. We have an awesome free tier and hobby tier. If you are building AI agents and you want them to access the internet, you can easily get started from our website, give it a try. And if you have feedback, we have a Discord community as well. I'm happy to help answer questions and help you build your agents.
Starting point is 00:41:00 We'd love to see more and more developers building agents that are doing interesting things on the internet. And hopefully we can help with that. Awesome. Well, thank you so much for being on. I think we had so much fun chatting about infra and browsers. Thanks so much. Thank you.
