The Infra Pod - AI needs a browser infra! Chat with Paul from Browserbase

Starting point is 00:00:00 Welcome to the Infra Pod. This is Tim from Essence and let's go, Ian. today. Paul, tell us a little about yourself. What got you started as a developer, but more importantly, why in the world did you start Browserbase? What was the insight? Yeah guys, great to be here. I'm a huge fan of the podcast. I think there's a big lack of great infra-focused pods, especially by developers for developers. And you know, it's cool to kind of be here because I'm also a DevTools guy first and foremost. And that's really what shaped Browserbase. What is Browserbase? Browserbase is an infrastructure platform for running headless browsers. So we're a very verticalized infrastructure platform. We only do headless browsers.

Starting point is 00:00:53 We're kind of a jack of one trade, master of one trade, if you'll give us that, you know? And the reason why we do this is that headless browsers are kind of a uniquely complex system to run. If you haven't heard of a headless browser before, it's basically the same browser that you're running on your computer, you know, Chrome, but it's running on a server environment. And if you've run Chrome on your computer, you've probably got a hundred Stack Overflow

Starting point is 00:01:14 tabs or I guess a hundred, you know, Claude tabs these days. And you know that it's just a lot of memory consumption, it's high on CPU. If you want to run that on a server environment, it's even harder, especially if you're doing it like in a serverless way where you're spinning up and down many browsers. It's a stateful distributed system. Each browser has state. It's distributed. You have to talk to it via WebSocket.
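To make the WebSocket point concrete: the traffic is JSON messages in the Chrome DevTools Protocol. The following is a minimal sketch of how a "move the mouse, click the mouse" sequence could be serialized. `Input.dispatchMouseEvent` is a real protocol method; the helper names `cdp_command` and `click_at` are made up for illustration, and the sketch only builds the messages rather than opening a connection to a browser.

```python
import itertools
import json

# Each command sent over the DevTools WebSocket carries a unique integer id.
_ids = itertools.count(1)


def cdp_command(method, params):
    """Serialize one DevTools Protocol command as the JSON text sent on the socket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def click_at(x, y):
    """Build the three messages that move the mouse to (x, y) and left-click there.

    Hypothetical helper for illustration; automation frameworks generate
    similar sequences on the caller's behalf.
    """
    position = {"x": x, "y": y}
    button = {"button": "left", "clickCount": 1}
    return [
        cdp_command("Input.dispatchMouseEvent", {"type": "mouseMoved", **position}),
        cdp_command("Input.dispatchMouseEvent", {"type": "mousePressed", **position, **button}),
        cdp_command("Input.dispatchMouseEvent", {"type": "mouseReleased", **position, **button}),
    ]


if __name__ == "__main__":
    for message in click_at(120, 240):
        print(message)
```

Every browser session holds its own socket and its own page state, which is part of why running many of them behaves like a stateful distributed system.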

Starting point is 00:01:33 So a lot of challenges in running many headless browsers in prod at scale. A headless browser is also different from a regular browser that we use because it doesn't have a GUI. The way that you interact with a Headless Browser is with code. A Headless Browser is like a code for a browser, you know? So there's a lot of these kind of browser automation frameworks, Puppeteer, Playwright, Selenium, and our new framework, Stagehand, that are built for controlling a browser running in production. And those frameworks kind of send these commands to the browser's kind of DevTools port and say,

Starting point is 00:02:06 move the mouse here, click the mouse here. And just kind of putting all that together, it's a hard problem. And it's a problem that's often not core to what a business is doing. It's not going to help them find PMF to invent browser infrastructure. And that's where we kind of come in. We are kind of like your browser team. If you want to run a thousand headless browsers in production, we have not only the infrastructure to do that and the right price points and cost effectiveness to make that great, but also all these great features around like,

Starting point is 00:02:32 we record what happens in the browser for you. We capture all these nice logs. We can know the resource utilization of the browser so you can kind of manage that appropriately. So everything you need to run headless browsers in production, that's what Browserbase does. So we got so much stuff I'm going to ask you about. This is going to be super exciting, man. We're going to talk a lot about browsers, obviously,

Starting point is 00:02:51 but we want to also mention how you got to even thinking about starting a company. Because there obviously has to be a belief, right? That there's got to be a lot of agents that need browsers. Unless BrowserBase is not just for AI agents, right? So let me talk about sort of like the motivation or the belief system here. Are you believing that in the near future, there's going to be some percentage or large percentage of AI agents all accessing the browser? Do you have sort of like almost like what you're seeing in the market when you started and what you're seeing has grown now.

Starting point is 00:03:26 Like, do you see a delta already happening? I'm just curious, like, what are you seeing has been evolving in this AI agent space that leads you to believe like, oh, this browser agent is going to be everywhere and be used predominantly in production in all different places. Because it's hard to tell, you know, from the outside. So, we'd love to see, get your little bit of insights here. Yeah, I mean, the two questions in there is like,

Starting point is 00:03:48 why start BrowserBase? And then, what is my view on how BrowserBase relates to AI agents and the future of AI agents browsing the web? So maybe I'll cover the first one for context and go a little bit more in depth on the second one. So why start BrowserBase? I wish I had a really cool answer that I was like this AI researcher, or I just was like early at DeepMind, but it actually is a

Starting point is 00:04:09 completely opposite approach. I started my career at Twilio. I was actually an intern during the IPO, which is just a crazy time to start a career. You're at the IPO party and everyone's getting rich and you're just having a great time, right? But I got to see a category-defining infrastructure company really kind of grow from a hundred million in revenue to a billion plus and 10x in people and all this stuff. I also kind of like after that started my own company called Stream Club. Stream Club was building this browser-based live streaming product. And to do that we had to use a headless browser as part of the video composition engine. Basically, instead of doing the heavy lifting on your computer, we did all the heavy lifting

Starting point is 00:04:45 on a headless browser in the cloud. We did all the video encoding there. We sold that company to Mux. Mux is a video API company, another great infrastructure company. And I got to kind of see them operationalize headless browsers at scale for video encoding and video streaming and really approached this problem

Starting point is 00:05:01 not from an AI first perspective. I had seen that headless browsers are hard to run because I spent like two years of my life as CTO of my startup trying to make this stuff work. And I kind of joked that I would never start another company after Stream Club unless it was a headless browser company. Because I just got super in the weeds on that, you know, topic. And if you look on YouTube, I gave like a talk in 2021 about how to use headless browsers with GPUs.

Starting point is 00:05:24 And like I just spent so much time banging my head against this kind of like painful, unloved stack. Headless browsers are basically like, really were designed for testing or like sketchy web automation. It's never been like the direct path. And that really kind of dovetails into like, why now are we starting Browserbase and what is the opportunity that we see? I think more and more it's possible to automate many websites in a dynamic way.

Starting point is 00:05:48 You know, the rise of LLM code gen makes it so that you can go to a website and actually generate the code that controls that website. And that kind of expands this market dramatically because previously people were writing one-to-one scripts. If you wanted to automate a website, you had to write a single script for that single website. And if the website changes, your script breaks, right? If you wanted to automate 10 websites, that's 10 scripts. But now with LLMs, you can kind of feed in the context of the website and say, hey, click this specific button. And maybe that's a DOM approach or using a vision model.

Starting point is 00:06:16 There's a lot more possibilities for browser automation that just weren't possible until, you know, the rise of LLMs. And I think that really took off in 2024. And that's actually when I started Browserbase. I kind of came up on my two years at Mux, and I was like, you know, I'm probably one of the people who knows a lot about this and have seen it at scale at Mux and at my startup Stream Club.

Starting point is 00:06:39 And I have a lot of opinions about how it should be done. I felt like I tried everything out there, and nobody was building like a Vercel or Stripe or Clerk level experience for browser infrastructure. And I think a lot of people in the early days kind of just said, oh, this is just like a small piece of the project. It's not actually the biggest problem to solve. But I actually was like, no, this thing is super hard and no one does it really well.

Starting point is 00:07:00 And I wanted to build something that was a great infrastructure company within, you know, browser infrastructure. And the biggest giveaway, and it was in our pre-seed deck, is if you look at the number of Playwright installs per month over the last two years, you can see it go from like a million or two million to ten million. Now, there has been some consolidation around Playwright being the best framework, but if you aggregate that with all the frameworks, just the usage of all those have gone up quite a bit. And you have to wonder why. I don't think people are writing a ton more tests.

Starting point is 00:07:31 Maybe they are doing that too with CodeGen, but more and more people, more and more developers are using browser automation to do meaningful tasks online, and they need a headless browser to do that. I think back to my days as a developer, the first thing I wanted to build was something that would automate work for me. Like that was the appeal of code. I can actually have the computer do stuff for me.

Starting point is 00:07:51 And the first thing you do is like, I'm gonna try and scrape a website, or I'm gonna try and click a button on this website. I'm gonna try and figure out what my football practice is or something, right? So there's always been a need for this primitive of browser automation. The tech has kind of always been ignored

Starting point is 00:08:04 because it's been really hard to operationalize and get many different automations out there. But now with the rise of LLMs, it's much, much more possible. And I just didn't see anybody in the market who was building the product that I wanted as a developer, which really is like inspired by these great infrastructure companies, but for a very specific vertical, which is browser infrastructure. Maybe kind of to kind of end that, you asked like, what am I seeing that people aren't seeing? I think in 2024, we really served a lot of like bleeding edge companies.

Starting point is 00:08:30 Like, you know, when you're an early stage startup, a lot of your other customers were early stage startups. What they were doing was really innovative. Like they're coming up with new ways to kind of use LLMs to control a browser. And some of those things were really starting to work in 2024. I think now with OpenAI's Operator being released and Anthropic's computer use, people are now

Starting point is 00:08:48 seeing, oh, the models are going to get better, cheaper and faster. That means that they can control things like a web browser. And the possibilities for automation of human work just go up quite a lot. And you still need to run a headless browser to kind of do that. And we hope that we can solve that specific problem for many, many successful AI companies out there. AI is like one use case. You kind of have Stagehand, which you built, and you have Operator, which is the OpenAI version of it.

Starting point is 00:09:20 Can you help us understand, if you think about the use cases for a headless browser? They wouldn't know that this is actually a core component of most people's path to production. But I'm curious, what do you think is, what are the biggest use cases and how does this all break down? Yeah, for sure. And just to note, you kind of talked about Stagehand versus Operator, you know, two very different things. Operator is a model that's driving a browser.

Starting point is 00:09:59 Stagehand is a framework for building web agents. So think of Operator as a web agent, an agent that can control a web browser. Stagehand is a framework for building web agents with a little bit more determinism. We can go into that more later, but you're asking about use cases. It's hard because it's so horizontal. We have customers doing the weirdest little thing. Like they are automating the compliance check of their oil field in Texas. Or, hey, they're helping you get rebates on your food stamps by submitting the form for you, so you don't have to go through this complex rebate process.

Starting point is 00:10:32 Some of those people are not using AI at all. People always wanted to automate interaction with legacy software and put a beautiful UI on top of it. I think that what people think about when they think about browser automation, they immediately go to like scraping, they immediately go to solving CAPTCHAs or buying tickets or booking restaurants. And those are obviously like, you know, big categories, but ones that we don't serve very much because it's going to be way more efficient to go get first party API access to the flight booker or first party API access to OpenTable. But

Starting point is 00:11:03 what's not going to happen is my barbershop down the street, Joe's barbershop, been going there for 10 years. Love those guys. They're never going to add an API for getting on the wait list. I've asked. It's not going to happen. They had the same form that they built five years ago, maybe 10 years ago now, that you got to go fill out to get on this wait list. If we want AI agents to be an extension of

Starting point is 00:11:23 ourselves and do work for us, they're going to have to use a blend of APIs that are first party and web browsers for things that don't support these integrations to really kind of meet us where we are and meet the internet where it is. So it's really horizontal in terms of use cases. I would say the primitives that we see are often form filling, page data extraction, button clicking, and maybe like screenshotting and file downloading management,

Starting point is 00:11:47 are kind of like the big ones. So like any combination of like, when is a human going to a page, reading some data on that page, potentially filling in a form, and potentially processing files, downloaded and uploaded to those forms, those are like the building blocks that we can see,

Starting point is 00:12:03 implications in procurement, go to market, insurance, legal tech, gov tech. Like every vertical AI company in the market map has a use case for browser automation. Some people are a little more coy to talk about it because they feel like it's a secret sauce, but everybody is doing some sort of browser automation because that's really what the future of software is. The future of software is software doing work for you. And if software is doing work for you and we all work in a web browser on websites, the future of software is going to need access to a web browser just like we do. So that's kind of my vision for Browserbase is that this is just a really necessary primitive

Starting point is 00:12:43 for, you know, this agentic software, this AI software that's kind of coming to happen right now. And in the early days, you know, we have hundreds of customers now. It's pretty exciting to see how vastly different a lot of their use cases are. It's really interesting. So I'm curious, kind of like, what is the hardest? I have my own experience here. My first company I worked at, it's called GoInstant, bought by Salesforce, and it was like this ridiculous man, kind of like what is the hardest? I think there's really two angles on this. It was a hotness, and that was the only way to do it to where we are today,

Starting point is 00:13:50 there's been pretty big changes. So I'm curious to learn what those changes have been and how that simplified the developer process. And then second is what have we learned in terms of making things snappy and repeatable and deterministic? wasn't really there. In terms of from the developer perspective, the CSS render hadn't completed or some Ajax hadn't completed in the background and so then you had to build all these waits and it ended up being slow and terrible and the worst thing in the world. So I'm curious, Paul, educate me on the latest, greatest in browser. Well, can I put that quote on our website first from Ian? It's the slowest, terrible, worst thing in the world. Yeah, you can verbatim take that as a clip.
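For readers who have not written one, the kind of wait Ian describes can be sketched as a polling helper. This is an illustration only, not how any particular framework implements waiting: `wait_until` and its parameters are hypothetical names, and `condition` stands in for whatever check the script needs, such as "the Ajax response has landed".

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout. `clock` and
    `sleep` are injectable so the helper can be exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    started = time.monotonic()
    # Pretend the background request finishes after roughly half a second.
    finished = wait_until(lambda: time.monotonic() - started > 0.5, timeout=2.0)
    print("ready" if finished else "timed out")
```

Every such wait adds latency on a fast page and can still time out on a slow one, which is the slowness and flakiness being complained about here.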

Starting point is 00:14:30 I want you to clip that. Yeah. Hey, chat, clip it. The hardest part about browser automation, at least pre-AI and even a little bit now with AI, is that you're writing deterministic code for a non-deterministic website. Websites change all the time, they behave differently, networks can be different. The website's having a slow day.

Starting point is 00:14:53 One of our customers, they do some automation in the healthcare space and the website they're automating, it just goes down all the time. It's just like a normal thing, they have maintenance windows, and like their script will fail and it doesn't know how to handle that. Doesn't know how to know that it's down because it changes the error message.
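One common way to cope with a site that is sometimes down is to retry with exponential backoff. The sketch below is a generic pattern, not something described in the conversation: `SiteUnavailable`, `run_with_retries`, and the delay values are assumptions for illustration, and deciding when to raise `SiteUnavailable` is exactly the hard part when the error message keeps changing.

```python
import time


class SiteUnavailable(Exception):
    """Hypothetical signal that the target site looks down (e.g. maintenance)."""


def run_with_retries(step, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `step`, retrying on SiteUnavailable with exponentially growing delays.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, and so on.
    The last failure is re-raised so the caller can alert or give up.
    """
    for attempt in range(attempts):
        try:
            return step()
        except SiteUnavailable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    outcomes = iter([SiteUnavailable("maintenance window"), "form submitted"])

    def submit_form():
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    print(run_with_retries(submit_form, base_delay=0.1))
```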

Starting point is 00:15:08 So you really are kind of dealing with a lot of non-determinism, which is the worst thing to be doing when you're writing code, right? Code wants regulated inputs and outputs. We want types, right? What's challenging beyond just handling the non-determinism, which AI does help with because now we can add dynamic code generation.

Starting point is 00:15:23 If we see an element we're not familiar with, we can try and understand that and basically have a little bit more flow control that's a little bit more non-deterministic. But actually, operationalizing that is really challenging too. I think a lot of developers can get something working locally. But when they want to go to prod, they just run into so many foot guns.

Starting point is 00:15:40 And I think this is a pretty technical audience, so I'm going to nerd out for a second, so stay with me. Let's just walk through the challenges of deploying a browser. So you have this amazing Playwright script working locally, or maybe you build something with like Anthropic's computer use and it's controlling your browser. Let's go put that in prod. Okay, it's maybe just a Node script. So I'll just use a Lambda, you know, I'm an Amazon fanboy. Git push, deploy, set up the Lambda. Oh shoot, this browser is actually too big for a Lambda

Starting point is 00:16:06 because it's 250 something megabytes. That's the limit on Lambdas. Okay, I'll use a Lambda layer. I'll sneak it in. Okay, I get my browser running on there. But Lambdas are very much performance constrained. You get like what? One vCPU, you can really run like one process at a time.

Starting point is 00:16:21 You're not getting a lot of performance. That's the thing, it's running really slowly. Okay, I need a bigger instance. We'll go get an EC2 instance. Okay, I have one browser running on my EC2 instance, but I wanna run more browsers. Okay, how do I do this?

Starting point is 00:16:32 We'll make a Docker image. We'll do Kubernetes. Now I'm like running a Kubernetes cluster of these images and they're running out of memory and they're crashing. I don't know why. So I have to go add observability in. I have to like capture all this logging. And then at this point, you're gonna go to a website, it's working, you have the observability

Starting point is 00:16:49 and you get blocked. You know, you run into a CAPTCHA or like you just can't, like Cloudflare blocks you so you have to go buy proxies because if you're coming from an Amazon IP, you're going to get blocked. So you have to go through residential proxy network, you have to figure out if you want to go buy this sketchy CAPTCHA solver with Bitcoin online, you know, it's a lot of like pain. And then at the final point, you finally have something working. And you've just gone through so much effort. And I think like developers definitely underestimate the complexity of running headless browsers in production. As you scale that out

Starting point is 00:17:19 even more, you know, you have to start thinking about like, well, how do I secure this? Like, this browser is going to any website, maybe it's customer input. What if it downloads a bad file? What if someone uses an exploit? Chromium is open source, right? So every time there's a security patch, people reverse engineer the security patch

Starting point is 00:17:34 against former versions. So you have to constantly be updating your Chromium binary. And this is exposed to the open internet, so if it goes to a bad website, someone can try and get into your Amazon cloud. So I think there's a lot of hard like, kind of like faults that people run into as they try and operationalize.

Starting point is 00:17:50 A simple script that ran on their shiny MacBook Pro on their home Wi-Fi connection, it's like, it works fine on my machine. That doesn't really connect with running this in production and that's where we see a lot of the frustration. It's becoming easier and easier to build these scripts, but actually running them in production at scale is like a major problem for our customers. I'm curious, can you help us understand the delta of like what, you know, okay, so when

Starting point is 00:18:15 back in 2014, I use Sauce Labs a lot, which was backed by Selenium, right? And like Sauce Labs is still, I was just looking at their website, they're still, they exist. Right? Sauce Labs is the big monkey in the room, for lack of a better word, or the big elephant in the room. What was your view of, hey, there actually is need in this space and they're not fulfilling it? I'm curious because it's both, I'm sure the answer here is actually highly technical and infrastructure-related one, but there's also an interesting go-to-market thing,

Starting point is 00:18:55 which I think is just useful to talk about when you're dealing with infra in general. The truth is the world of infra is massive and there's always these massive untapped wedge niche markets that no one thinks about. So I'm kind of curious to get your perspective on that before we dive into another area. Yeah, I think like the simplest answer, and I'll preface this by saying I have so much respect for the Selenium team and

Starting point is 00:19:16 Sauce Labs and what they've done for the industry. Without a doubt, like, they pushed the thing forward and testing is better off with Sauce Labs. And we're not a testing company, we're a browser automation company. But when I use Sauce Labs and I use Vercel or Stripe, it is not even in the same league of developer experience, right? And I've grown up working at these companies that have such a high bar for developer experience

Starting point is 00:19:41 that I knew that that's something that has to be within your DNA as a company. From day one, you have to care about the DX of the product, the quality of your dashboard. If you're doing a PLG motion, like Browserbase is, you know, we sell to individual developers and then they come use us a ton and we say, hey, let's get you on a better deal contract at volume. Sauce Labs, more sales led. It's like, hey, we're gonna go hit up everybody and go sell them directly. Sales led can lead to a more fragmented product because you're incentivized to do the things that

Starting point is 00:20:09 help close deals. With Browserbase, what we really want to do is power as many of the builders as possible. So we get a huge index. And we have hundreds of customers after one year. And what that means is we get to see a bunch of different stuff that people are doing and really figure out, how do we build the right features for everybody

Starting point is 00:20:26 and have this really cohesive product experience that's centered around developer experience. And our KPI is like number of signups that are going to production deploys. And it kind of centers around how do we help developers get zero to one much faster. In terms of like the two technical differences between companies, like Sauce Labs built for testing,

Starting point is 00:20:42 support for many different browsers, support for mobile, very anchored in the testing world. Browserbase, people have built testing companies on Browserbase, but really oriented more towards browser automation. Single browser, we're only Chrome. We don't do mobile. You can have a mobile viewport and fingerprint,

Starting point is 00:20:57 but it's not mobile first. It's really like, we'll give you a browser and give you everything you need to operationalize and build production applications on top of headless browsers. And you're not going to have to think about using different browsers. You don't really need that if you're just trying to automate a website, you just want a browser that works everywhere every time. So I think you provided really good. Both the views of like the previous browser based companies, I would say

Starting point is 00:21:20 that definitely more focused on testing than any other spaces. And sort of like the infrastructure required to even run browser in production. I guess I'm really curious today, in 2025, seems like a lot of the automations people are building, or in the past I feel like a lot of browser stuff was more scraping, you know? I think probably the most common is I want to get some information out. I just want to go figure out how to get XPath selectors to run in a very reliable way and repeatably.

Starting point is 00:21:55 But today, I guess with Stagehand and these frameworks, even going to your website, the very first thing is write a prompt. It'll do something for you. It's all really like AI is like the center of everything now. So maybe can you talk about what are the things beyond just getting your browser to run reliably and beyond just being able to like debug things from a session inspector and all this stuff. Is there anything particular that, hey, we need to add these primitives because AI is doing things a little bit differently now. Is there anything

Starting point is 00:22:24 particular that these things are doing? Or do you think it's more like an onion? Like, hey, we have basically the same foundations, but Stagehand is our AI framework or something like that. Maybe help us understand, is there something that AI is changing about how browsers are interacted with at all? Yeah, there's a ton of differences. And, you know, frankly, we've only done the low-hanging fruit.

Starting point is 00:22:48 Like there's so much on our roadmap still that's really oriented around supporting, you know, our browser automation framework, Stagehand, as well as like other popular AI-powered browser automation frameworks. That is yet to come. But kind of directionally, we had to go build this fundamental infrastructure because that's the building blocks on which you can actually make reliable scalable production applications. Like we are so confident that browser automation is going to be necessary for a lot of customers, a lot of people who are building new companies and a lot of established companies, that one of the first things we did is like how the hell are we gonna scale this thing to many regions? You know, we're pushing the limits of Kubernetes clusters. You have to start sharding clusters at one point

Starting point is 00:23:28 because you have so many individual pods running. Those are hard engineering problems to solve. And doing that with, you know, Firecracker and Kata Containers, you know, all these things around, like, performant VMs. It's still pretty early days for some of those runtimes. And kind of, you're going to run into some sharp edges as you productionize them at scale like Browserbase has. And that's really year one of Browserbase is like, let's make this the best browser infrastructure out there. And we still have a lot of work to do. Now in parallel, we built Stagehand. And Stagehand is kind of our response to seeing so many of our customers have to reproduce the same thing where they're looking at the DOM, the HTML on the page,

Starting point is 00:24:06 they're putting it into an LLM, and then they're trying to generate some code to control that DOM. Now, there's a lot of interesting techniques people have done here. Some people will take a screenshot of a modified DOM with a bunch of labels on it. This is called set-of-mark prompting. And they put that into a vision language model and then have it say, this is the box you want to click on,

Starting point is 00:24:25 and then generate the code that way. Okay. We saw a bunch of customers doing that. We saw a bunch of customers taking the DOM, and then they may turn it to markdown, and then they prompt engineer the DOM so it's more accurate, and then generating code, often Playwright code, to go click on that button. So for us, like Stagehand is really taking

Starting point is 00:24:42 all of those best practices that we've seen in papers, from our customers, like in the WhatsApp group chats about, you know, web agents, and putting it into an open source framework that allows you to kind of use these three primitives, act, extract, and observe: to take actions on a page, click the button; extract structured data from a page, extract the things that Browserbase supports into an array of strings. Then observe, which will actually give you a list of possibilities that you can prompt for this page to help ground any agent approach where it's like

Starting point is 00:25:13 maybe there's a high-level goal: buy the shoes. Observe, what's the next action it should take from this list to go buy the shoes? Oh, you want to go to the search bar and search for shoes. Okay, do that, observe again. So you can build this kind of agent loop. Act, observe, and extract. And we made this open source because, one, it just felt like it was super important.
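The act, observe, extract loop described here can be sketched in a few lines. To be clear about assumptions: this is not Stagehand's API. `FakeShopPage`, `choose`, and `run_agent` are hypothetical names, the page is an in-memory stand-in for a real browser, and `choose` is a placeholder for the model call that would pick the next action toward the goal.

```python
class FakeShopPage:
    """Hypothetical in-memory stand-in for a browser page (not Stagehand's API)."""

    def __init__(self):
        self.state = "home"
        self.cart = []

    def observe(self):
        """List the actions that are possible in the current page state."""
        return {
            "home": ["search for shoes"],
            "results": ["add first result to cart"],
            "cart": [],
        }[self.state]

    def act(self, action):
        """Perform one named action and move the page to its next state."""
        if action == "search for shoes" and self.state == "home":
            self.state = "results"
        elif action == "add first result to cart" and self.state == "results":
            self.cart.append("shoes")
            self.state = "cart"
        else:
            raise ValueError(f"action not available here: {action}")

    def extract(self):
        """Return structured data from the page, here the cart contents."""
        return list(self.cart)


def choose(goal, candidates):
    """Placeholder for the model call that picks the next action toward the goal."""
    return candidates[0]


def run_agent(page, goal, max_steps=5):
    """Observe, pick an action, act, and repeat until nothing is left to do."""
    for _ in range(max_steps):
        candidates = page.observe()
        if not candidates:
            break
        page.act(choose(goal, candidates))
    return page.extract()


if __name__ == "__main__":
    print(run_agent(FakeShopPage(), "buy the shoes"))
```

Because the model only ever picks from the list that observe returned, each step stays grounded in what the page actually offers, and `max_steps` bounds how far a run can wander.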

Starting point is 00:25:32 We've taken a bunch of inspiration from people. We also come up with some interesting stuff. I think we've really pushed the boundaries on some stuff with turning the DOM into an accessibility tree, using Chrome's kind of native a11y tree functionality. And that's been really cool kind of stuff we've started doing. But making it open source because we really worked with the community, but also we just

Starting point is 00:25:50 know that developers don't want to tie themselves to a closed source framework. They want to be able to change it. And I think the overwhelming feedback I heard from developers about trying AI software out there is that, like, you're at the constraints of the developer who's tuning the prompts behind the scenes. And instead, what if we gave a framework to developers where they can actually fork it,

Starting point is 00:26:10 change it, they can write their own prompts to get more reliability. And most importantly, this is the key principle of what Stagehand is, we're trying to add more determinism and reliability to browser automation with AI. Right now with things like Operator or other frameworks out there, it's generally you're giving a single prompt and just letting it do its thing. It can go down four different paths to go buy that set of shoes. If you're building production AI applications, you really want to have repeatability.

Starting point is 00:26:39 That's why we have all these observability features, so you can actually see what's happening and be able to consistently check that your agent is doing the right thing. But also in your framework, you probably want to have more guardrails around how it goes out and actually automates something on the web. You may want to say, you know, you're going to be given a website that's like any startup website and your goal is to book a demo call. So what are the steps?

Starting point is 00:27:00 Observe, find the demo call button. Act, click on the demo call button. Act, go book the next available demo call. Here's my email, my name, my phone number. And giving it those guardrails actually tends to improve reliability. Now it tends to cater to developers who are actually trying to automate more of an SOP, a standard operating protocol. They have this workflow they want to run on every website in the world, or any website they're given by a user, and they kind of know the steps they want to take

Starting point is 00:27:27 so they can get more reliability out of it. They don't want to give it some open-ended thing because they can't trust that. That's where Stagehand comes in. It's one of the only frameworks that kind of like self-imposes more constraints to give itself more reliability when building production applications. We actually just crossed 200,000 NPM downloads in a month,

Starting point is 00:27:44 which is awesome because we just launched it in like November. And we've seen really cool results, like someone in our community channel is on the Uniswap team, and they just switched all their end-to-end tests over to Stagehand because it was proving to be more reliable for handling some of these fuzzy changes that were happening when they were kind of building these tests. So I think it's been really cool to see that developers who are building AI web automation in production are using Stagehand. It certainly has a little bit of a higher learning

Starting point is 00:28:10 curve than some of the stuff out there. It's not going to take a single prompt. You kind of have to lay stuff out. But we think that learning curve is necessary for actually building the applications that work. And I think going back to your question you had a long time ago, you were like, well, what's changed this year? With the model quality continuing to get better, we're seeing more and more stuff work reliably. The scope of these agents can really increase, and the possibilities for automation are really just rapidly increasing. I mean, that's pretty compelling.
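"Laying stuff out" instead of handing over a single prompt can look like the book-a-demo steps listed a moment earlier, written down as data and run in order. The sketch below is an assumption-heavy illustration, not Stagehand code: `Step`, `PageTools`, `Contact`, `bookDemoSteps`, and `runSteps` are hypothetical names, and the tools are synchronous only for brevity.

```typescript
// Hypothetical step and tool types for illustration only; not Stagehand's API.
type Step =
  | { kind: "observe"; lookFor: string }
  | { kind: "act"; instruction: string };

interface PageTools {
  observe(lookFor: string): string[]; // matching elements or actions on the page
  act(instruction: string): void;     // carry out one instruction
}

interface Contact {
  name: string;
  email: string;
  phone: string;
}

// The fixed workflow from the conversation: observe, act, act.
function bookDemoSteps(contact: Contact): Step[] {
  return [
    { kind: "observe", lookFor: "demo call button" },
    { kind: "act", instruction: "click the demo call button" },
    {
      kind: "act",
      instruction:
        `book the next available demo call for ${contact.name} ` +
        `(${contact.email}, ${contact.phone})`,
    },
  ];
}

// Run the steps in order; an observe step that finds nothing stops the run
// instead of letting the agent improvise a different path.
function runSteps(tools: PageTools, steps: Step[]): string[] {
  const log: string[] = [];
  for (const step of steps) {
    if (step.kind === "observe") {
      const found = tools.observe(step.lookFor);
      if (found.length === 0) {
        throw new Error(`guardrail failed: nothing matched "${step.lookFor}"`);
      }
      log.push(`observed: ${found[0]}`);
    } else {
      tools.act(step.instruction);
      log.push(`acted: ${step.instruction}`);
    }
  }
  return log;
}
```

The design choice being illustrated is that the developer, not the model, owns the sequence. The model only has to get each small step right, and a failed check surfaces as an error rather than as a creative detour.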

Starting point is 00:28:35 So I have like a final question in terms of how you think about the future of agents. I mean, at the beginning, when I asked the question, you kind of framed Operator as a generalized agent, right? And at the top level, the magic of Operator is the planning component, which is like, okay, you give me one prompt and I build you this whole plan, and then I'm just gonna go and do all this stuff.

Starting point is 00:28:54 And, you know, OpenAI's Operator-like motion is, let's just focus on making the planning incredible, because the better plan will result in, you know, better outcomes, is, you know, broadly the idea. It sounds like what you're focused on is, ignore the top level planning thing, that's not what we're going to do. What we're going to do is we're going to make that

Starting point is 00:29:11 interaction between whatever the plan is, which is what you build as a developer, we're going to make the interaction between the plan and the actual browser doing the stuff the best possible layer, right? Like we're just going to make that incredibly deterministic, which means that like your top level, whether you're using a planning agent to create a plan to tell the browser to do a thing or you're handwriting the stuff in code,

Starting point is 00:29:30 like these are separate problems and we're gonna focus on this bottom one that's closest to the browser. Is that basically correct? I'm putting that quote on the website, too. I don't know what you're doing today, Ian, but you're writing our next pitch. Product marketing. I should have been a product marketer. I don't know what I was. You're killing it. You're totally right. For us, we actually believe that the planning step has become somewhat fungible. People are using different models to do their planning step. There's different models who are better at planning and reasoning. And there's trade-offs like, do you want to spend money on a plan for this?

Starting point is 00:29:59 Developers are, from what I've seen, building a lot of their agent loops in-house. They're doing a lot of custom prompting there. They have different tolerances for, you know, stickiness of the plan, temperature on the prompt. And I think the, you know, when you're building an agent, the likelihood of your agent succeeding is directly correlated to the reliability of the low-level tools and actions. So our goal is like, if we can make act, extract, observe,

Starting point is 00:30:25 like really, really accurate, you're gonna have more accurate agents because the reasoning step can then handle when things don't go well. But if you have a tool that has like low probability of success, that's multiplicative across all your tool calls. If each tool is at 50% chance of success, your agent's gonna be pretty bad.
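The "multiplicative" point is easy to put numbers on. The sketch below assumes each tool call succeeds independently with the same probability, which is a simplification rather than anything claimed in the conversation; the function names are made up for the example. Under that assumption, ten calls at 50% each all succeed about one time in a thousand, while ten calls at 99% each all succeed roughly 90% of the time.

```typescript
// Chance that every one of `steps` tool calls succeeds, assuming each call
// succeeds independently with probability `perCallSuccess` and is tried once.
function chainSuccessProbability(perCallSuccess: number, steps: number): number {
  if (perCallSuccess < 0 || perCallSuccess > 1) {
    throw new RangeError("perCallSuccess must be between 0 and 1");
  }
  if (!Number.isInteger(steps) || steps < 0) {
    throw new RangeError("steps must be a non-negative integer");
  }
  return Math.pow(perCallSuccess, steps);
}

// Chance that a single call succeeds within `attempts` tries, again assuming
// independent attempts. This is one simple way a reasoning step that notices
// a failure and tries again can raise the effective per-call reliability.
function successWithRetries(perCallSuccess: number, attempts: number): number {
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new RangeError("attempts must be a positive integer");
  }
  return 1 - Math.pow(1 - perCallSuccess, attempts);
}
```

Feeding the result of `successWithRetries` back into `chainSuccessProbability` shows why both levers matter: retries help, but a tool that starts near a coin flip needs many of them, whereas a tool that is accurate on the first try keeps the whole chain cheap and predictable.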

Starting point is 00:30:41 So you have to make the tools super reliable. And I think that we've made a lot of progress around Stagehand by really focusing on that kind of low-level atomic prompt that will actually convert to being a high-level success in the browser automation. I'm really curious to get your feedback. I have this fun little question I keep asking myself about this sort of long-tail problem. Because you position Browserbase, specifically in AI, as, look, there's a long tail list of millions, trillions,

Starting point is 00:31:08 whatever number of websites that don't have APIs and never will. But I am curious to hear from your perspective, you know, as more of the web is automated by things like Operator or things that people built using Browserbase or headless browsers, like what do you think the response from the website builders is?

Starting point is 00:31:24 Like what do you think happens to front end? Does front end die? Do we have smaller websites? What's the future of the way that we actually build applications as a result of the fact that the browser, love it or hate it, is being abstracted increasingly from the human? And the UI specifically is being increasingly abstracted away from the human. How much time do we have for this question? I can go on.

Starting point is 00:31:45 Oh, we have as much time as we want. Yeah, yeah. Great. Because I can go on for hours. Let me answer with a few analogies first, and then I'll kind of go into my take. I've been lucky to be able to spend some time with Jeff Lawson, the former CEO and founder of Twilio. He's just been a great advisor to BrowserBase. And when he was pitching Twilio, people would tell him, well, okay, you're building this texting API, but SMS is going to go away. It's going to be RCS or WhatsApp or something.

Starting point is 00:32:09 And so why build this? Why build on a legacy protocol? And he's like, sure, it's going to go away, but it's going to be a long time. And they built a multi-billion dollar revenue company out of that, right? So I think in terms of like the success of Browserbase as a company, I have the same viewpoint that it's gonna take a long time

Starting point is 00:32:28 for there to be a new internet that's AI first. I think Neuralink might come sooner than us rewriting the whole internet. And then at that point, you know, we're just all gonna be dialing into each other's brains, right? I also think about this like kind of quote from Elon Musk about, you know, humanoid robots, like why make robots look like people? Well, the world was built for people, right? And I think for a long time, we're gonna have to build websites for people and maybe agents.

Starting point is 00:32:51 And I think if computers are just as smart as we are, if not smarter, why would we have to build a different interface for them when they can just use the same interface we do? Now, as engineers, we think about efficiency. Like, oh, we'll be more efficient through the protocol, which I think is true, but I think they're going to get really, really fast and really, really good at doing these

Starting point is 00:33:09 things. And I think the people building the websites or the people using the websites are going to be the constraint. I imagine a future where there's probably going to be a requirement that every website has a people version too, so people who don't have agents can use it. So I don't think the web is gonna change as fast as we think it will. I do think it's gonna be a blend of first party APIs

Starting point is 00:33:30 for high volume use cases. It doesn't make sense for everyone to book a flight via a website. If every agent is doing that, we should have direct connections there. It's gonna be way more efficient. But people are still gonna be building websites for a long time.

Starting point is 00:33:41 If anything, the rate of building websites has increased with LLMs. People can use v0, they can use Bolt. They're like churning out new websites day and night. So I often get pushback on Browserbase. People say, we'll just build a new API layer for the internet. We'll have agents.txt. We'll have all this stuff. And I think those are all helpful and they grease the wheels of this problem. But it's going to be a long time before we have a new framework or anything coming out. I come from the world of identity. That was my job at Twilio. I was like doing the login team, we were doing SSO. We're still using SAML.

Starting point is 00:34:08 It's been 10 years, you know? Like we're still using all these identity protocols. And even like authentication for AI agents is a big category people are talking about a lot. I think it's gonna be a really hard problem to solve because it's hard to get new frameworks adopted and it's hard to rewrite legacy code. I think it'll be a problem we face

Starting point is 00:34:24 when we're a billion dollar company, if not 10 billion dollar company. And at that point, we can come back on the pod and see if these priors were right. And hopefully the world's going the way we think it is. Yeah, that's super interesting. Maybe just directionally, you know, in the same realms of questions, LLMs are going to generate a lot of websites and web apps. We have a lot of these app builders out there

Starting point is 00:34:49 and we can see like 10X, 100X, even thousands of them out there. But I think what we've seen is, for AI to be able to consume this content, for agents to be able to answer questions and stuff like that, we've also seen the introduction of llms.txt and llms-full.txt, things that actually help agents consume information better than from just pure websites.

Starting point is 00:35:13 If you just go to documentation, it's, hey, LLM agents, go look at this instead, right? Do you see maybe there's like an intermediate thing for websites in the future? Is it all just going to be llms-full.txt? Do you see some other variations? Because I feel like if you're read-only and you just want some very dense documentation or text, you can do that.

Starting point is 00:35:37 But for visually guiding agents to do something better, like if I want agents to do something better on my website, and maybe being an API, it's not worth it to do a production separation here. Do you see there's some kind of intermediate thing that would be helpful? Or do you see people already starting to think about that? Or do you think that's, hey, this is probably too messy in the middle, it's not worth investing in this at all? Well, Browserbase is definitely not a web scraping company for this reason. I think the read-only is going to become just way more efficient. You know, generally the way people do web scraping, headless browser is the last choice.

Starting point is 00:36:16 They're trying to curl a website, get the HTML first, and then maybe they're trying to do maybe a more advanced type of curl with different headers if they're getting blocked. And then they fall back to a fully hydrated version of the page using a headless browser if there's a lot of client-side scripts being loaded, like on Airbnb.com or something. So I think web scraping is really a race to zero. You know, it's a commodity business.
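The cheapest-first ladder described here (plain fetch, then a fetch with different headers, then a headless browser) can be sketched as an ordered list of strategies. Everything named below is hypothetical: `FetchStrategy`, `fetchWithFallback`, and the `looksHydrated` heuristic are invented for illustration, the strategies are injected so no real network or browser library is assumed, and real implementations would be asynchronous.

```typescript
// One way of getting a page's HTML. Returns null when blocked or failed.
type FetchStrategy = {
  name: string;
  fetchHtml: (url: string) => string | null;
};

interface FetchResult {
  strategy: string; // which rung of the ladder produced the HTML
  html: string;
}

// Crude heuristic (an assumption for this sketch): a page whose <body> holds
// nothing but <script> tags still needs client-side rendering.
function looksHydrated(html: string): boolean {
  const body = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  const content = body ? body[1] : html;
  return content.replace(/<script[\s\S]*?<\/script>/gi, "").trim().length > 0;
}

// Try strategies in the order given (cheapest first) and stop at the first
// one that returns usable, hydrated HTML.
function fetchWithFallback(url: string, strategies: FetchStrategy[]): FetchResult {
  for (const strategy of strategies) {
    const html = strategy.fetchHtml(url);
    if (html !== null && looksHydrated(html)) {
      return { strategy: strategy.name, html };
    }
  }
  throw new Error(`all strategies failed for ${url}`);
}
```

Ordering the list by cost is what makes the headless browser "the last choice": it only runs for pages where the cheaper rungs were blocked or came back as an empty shell.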

Starting point is 00:36:35 It's like you're going to try and get the data as cheap as possible. And tools to make it easier for LLMs to ingest web data from websites are going to be really valuable because people do a ton of web scraping. BrowserBase is like, we're not betting on web scraping

Starting point is 00:36:49 being the category that helps us. If anything, as an entrepreneur, I want to sell something for pennies that people make dollars off of. And automation is really more aligned with that from a margins perspective. We charge like $0.10 per browser hour. And hopefully an hour of someone's time

Starting point is 00:37:04 is worth more than $0.10. I think it's worth like $10 at least, right? If you're getting an hour's worth of people's work out of this browser automation, I think that's just much better from a margins perspective, but also just a value creation perspective. So for us, I think that you're right. I agree that there's probably stuff like llms.txt.

Starting point is 00:37:21 And I think the team at Mintlify has been doing a fantastic job of, like, let's just incorporate this into every docs site out there. And those platforms, like, I think the people best suited to solve the auth problem are also the auth platforms, like Clerk, Stytch, and Okta. People who are publishing things, like Vercel,

Starting point is 00:37:35 could easily publish an llms.txt for every website on the Vercel platform, right? I think they're very well suited to do that because read-only is like non-destructive. But for write or actions on the web, it's really hard for you to come up with like a generic solution for that. And I'm excited to see what happens. And if something does happen, then we'll react to that appropriately. But I haven't seen a way where you can have a generic, like, llms.txt for controlling a website yet. Maybe Browserbase can be the first to offer that.

Starting point is 00:38:04 We'll have to see. Well, it's time for our favorite section here called the spicy future. Spicy futures. So Paul, tell us what you believe that most people don't believe in yet. It's funny because I feel like I got all my hot takes out throughout the pod. We weren't holding back on some of this stuff, right? We talked about, you know, the future of the internet. Maybe my hottest take is probably going to be around the future of authentication

Starting point is 00:38:33 and really the future of CAPTCHAs, right? I think everyone thinks about this with browser automation. They're like, well, what about the CAPTCHAs? And Browserbase does do CAPTCHA solving as a feature. You know, CAPTCHAs are kind of a short-term problem because, one, it's going to be really hard to keep up with AI and invent new little minigames for it to play. But two, they're mostly like fingerprint checking right now. The CAPTCHAs may be more of a facade behind more sophisticated algorithms for checking if you're a bot or not. And I

Starting point is 00:38:58 think CAPTCHAs right now, they block good and bad bots, you know, they just block everybody. And I think there's gonna be a world where good bots will be able to be authenticated and act on behalf of a person. I actually truly believe that the internet's not gonna say, we don't want bots anywhere. They're gonna realize that agents and bots, you can call them bots or agents,

Starting point is 00:39:19 they're the same thing, right? They're gonna realize that agents actually add value by using their product. Maybe they're actually using Workday more because the HR IT person doesn't want to do this. He uses a web agent to go automate Workday. So I think these authentication providers and these CAPTCHA companies are going to have to figure out a way to identify proof of delegated personhood.

Starting point is 00:39:40 And I think a lot of the systems we have set up, like passkeys, which are very much tied to your hardware, are going to need to be shared with your agent in some way. And I don't think we're set up for that yet. So my hot take is that there's gonna be a reckoning with anti-bot versus agents, where they both have to agree on some sort of KYC intermediary, where I can say, hey, I'm Browserbase, I'm representing Tim at Essence,

Starting point is 00:40:05 here's his proof of personhood, will you let us through? And some kind of KYC layer may be necessary to allow good bots to roam the web without being impacted by the bad bots. And that may give up some privacy because you have to say, this person is using this browser for this thing. And I think that's always been a hard part

Starting point is 00:40:21 with like proof of personhood online. So challenging problem. I'm optimistic it'll be solved. I don't think there has to be an infinite cat and mouse game. I think there is right now because that's just the way the web is. AI has grown faster than the internet can react to it. But you know, my hot spicy take on the future is like, CAPTCHAs aren't going to be blocking bots for too much longer.

Starting point is 00:40:39 They may be letting some bots through. And I've spent some time with, you know, some of the leading anti-bot providers and I think they're open to that. I think there's going to be some exciting partnerships happening soon. That's a great take. It's something I think about on a pretty regular basis. You know, like, what's the future of identity for an agent, how does the intermediary work, and do we have the protocols? And we definitely don't. But I think one of the things we've talked about so far that kind of resonated was this idea of, like,

Starting point is 00:41:05 look, there's this massive long tail of websites, and no one's gonna go back and fix them because there just isn't a reason to. But, you know, the best websites, the ones we spend all of our time on, the ones that drive lots of revenue, so the top 1,000 or maybe 10,000 websites, are going to want to build agent-first experiences. Because they'll just eventually decide, hey, look, the distribution channel has moved from Google to, let's say, OpenAI Operator.

Starting point is 00:41:35 Or maybe Google just says, okay, we're still the front door of the internet, but we've abstracted it away. It's Google going to the website on behalf of Paul, or it's Google going to the website on behalf of Ian, and it's a completely different interaction mechanism. And more importantly, the broadness of the UI and the UX is lost, because what they're actually more focused on, just like we have SEO, is how do I optimize for the fact that the front door of the internet has changed, and I want to have the best possible experience in that front door. We're very far from that, in the sense that we have to start seeing some patterns and there have to be changes in the way that people work.

Starting point is 00:42:28 And so things like Operator or whatever have to become more sophisticated. But I definitely agree with your hot spicy take that that's the future. I think the economic reason that will drive the best-in-class websites to do those things is because they don't care about the eyeballs, they care about the interaction with the user and the money at the end of the day. And as long as they don't lose their brand value,

Starting point is 00:42:47 right? Like I think the one thing you can take away from the fact that Shopify is like a hundred billion dollar company is the fact that brands actually care about their brand and the experience that their buyers have. And that's the net value. That's why Nike's Nike, right? You're buying a brand. It's why Lululemon works. It's why all these things work. You're buying a brand and experience and you're sending a message. The brand value is very important. So I'm really interested to see what happens in that interplay between, like, a deep consolidation of UI/UX,

Starting point is 00:43:14 but also brands still wanting to be brands. Yeah. You know, I really doubt that you're going to let your AI agent go shop for a brand that you love. Like, I think that shopping experience is still gonna be very human, and you're gonna do it on your phone or on a website. I don't think those can't coexist. If I need to go buy laundry detergent, I don't think I care about the brand experience

Starting point is 00:43:33 of my laundry detergent. I just wanna buy the one that's the best bang for the buck. So it's super natural to kind of think about the future of AI plus people as being like these two outcomes, but it's going to be in the messy middle, you know, and I think it's going to take time to figure it out. And I'm actually, maybe I'm too optimistic, but that's how you're a founder, right? You got to be optimistic about the future. But all of the signs I'm

Starting point is 00:43:57 seeing is that we're all open to doing less tedious work online and we want to do the work that matters. If you take that shopping analogy, I want to do the shopping that's fun for me. I don't want to do the shopping that's not fun for me. I want to book my trip and like look at all the flights and have fun doing that. I don't want to book my business trip where I just need to get out there by a certain time, right? It's not eliminating all of the work we're doing. It's just leaving the fun stuff that we care about that people will really cater to. So I think that's just like a really exciting future. The future is going to be super cool and it's really great to be building a small part of that at Browserbase. Awesome.

Starting point is 00:44:32 Well, we have so much more we could ask, but for the sake of time, and I think we have so much good stuff we already got from you, tell us where people can find out more about Browserbase or you. Like, should they go to Browserbase.com? What are the places to get started using this awesome product? Absolutely. So if you want to learn more about Browserbase, it's just Browserbase.com. And if you want to learn about Stagehand, our open source AI browser automation framework, it's Stagehand.dev.

Starting point is 00:45:01 And check it out. It's actually really easy to get started. You do npx create-browser-app and you can have this cool little interactive browser automation thing. I think Ian's running it right now as I'm looking at him. So, you know, it's pretty fun to try. We have a great community in our Slack. So feel free to join.

Starting point is 00:45:17 And if you have any burning questions about browser automation, my Twitter is at pk underscore ivy. Hit me up. Awesome. Thank you so much, Paul. It's been such a pleasure. Awesome guys. Yeah, happy to be here. Thanks so much.
