The Infra Pod - AI needs a browser infra! Chat with Paul from Browserbase
Episode Date: April 21, 2025
In this episode of the Infra Pod, Tim from Essence and Ian Livingston host Paul Klein, CEO of Browserbase, to discuss the intricacies and future of browser automation, the challenges of running headl...ess browsers, and the emergence of AI-driven web agents.
00:29 What's Browserbase?
01:20 Challenges with Headless Browsers
02:57 AI Agents and Headless Browsers
09:21 The Future of Browser Automation
13:26 Technical Challenges in Browser Automation
18:10 Differences between Browserbase and Other Tools
21:12 Addressing Use Cases and Developer Experience
Transcript
Welcome to the Infra Pod.
This is Tim from Essence, and let's go, Ian.
today. Paul, tell us a little about yourself. What got you started as a developer, but more importantly, why in the world did you start Browserbase? What was the insight?
Yeah guys, great to be here. I'm
a huge fan of the podcast. I think there's a big lack of great infra-focused pods, especially by
developers for developers. And you know, it's cool to kind of be here because I'm also a DevTools guy
first and foremost. And that's really what shaped Browserbase. What is Browserbase? Browserbase is an infrastructure platform
for running headless browsers.
So we're a very verticalized infrastructure platform.
We only do headless browsers.
We're kind of a jack of one trade, master of one trade,
if you'll give us that, you know?
And the reason why we do this is that headless browsers
are kind of a uniquely complex system to run.
If you haven't heard of a headless browser before,
it's basically the same browser that
you're running on your computer, you know, Chrome, but it's running on a server environment.
And if you've run Chrome on your computer, you've probably got a hundred Stack Overflow
tabs or I guess a hundred, you know, Claude tabs these days.
And you know that it's just a lot of memory consumption, it's high on CPU.
If you want to run that on a server environment, it's even harder, especially if you're doing
it like in a serverless way where you're spinning up and down many browsers.
It's a stateful distributed system.
Each browser holds state.
It's distributed.
You have to talk to it via WebSocket.
So a lot of challenges in running many headless browsers in prod at scale.
A headless browser is also different from a regular browser that we use because it doesn't
have a GUI. The way that you interact with a headless browser is with code.
A headless browser is like a code for a browser, you know?
So there's a lot of these kind of browser automation frameworks,
Puppeteer, Playwright, Selenium, and our new framework, Stagehand,
that are built for controlling a browser running in production.
And those frameworks kind of send these commands to the browser's kind of DevTools port and say,
move the mouse here, click the mouse here. And just kind of putting all that together,
it's a hard problem. And it's a problem that's often not core to what a business is doing. It's
not going to help them find PMF to invent browser infrastructure. And that's where we kind of come
in. We are kind of like your browser team. If you want to run a thousand headless browsers
in production, we have not only the infrastructure
to do that and the right price points
and cost effectiveness to make that great,
but also all these great features around like,
we record what happens in the browser for you.
We capture all these nice logs.
We can know the resource utilization of the browser
so you can kind of manage that appropriately.
So everything you need to run headless browsers
in production, that's what BrowserBase does.
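To make the "commands to the browser's DevTools port" idea concrete, here is a minimal sketch of the kind of JSON frames an automation framework might send over the WebSocket for "move the mouse here, click the mouse here". Input.dispatchMouseEvent is a method in the Chrome DevTools Protocol; the CdpSession class and its click_at helper are hypothetical names for illustration, and the sketch only builds the messages rather than opening a connection.

```python
import itertools
import json


class CdpSession:
    """Builds Chrome DevTools Protocol frames (hypothetical helper, no networking)."""

    def __init__(self):
        # Each CDP command carries a unique id so replies can be matched to it.
        self._ids = itertools.count(1)

    def command(self, method, params):
        return json.dumps({"id": next(self._ids), "method": method, "params": params})

    def click_at(self, x, y):
        """Return the three frames for a left click: move, press, release."""
        point = {"x": x, "y": y}
        press = {"button": "left", "clickCount": 1, **point}
        return [
            self.command("Input.dispatchMouseEvent", {"type": "mouseMoved", **point}),
            self.command("Input.dispatchMouseEvent", {"type": "mousePressed", **press}),
            self.command("Input.dispatchMouseEvent", {"type": "mouseReleased", **press}),
        ]


if __name__ == "__main__":
    for frame in CdpSession().click_at(120, 48):
        print(frame)
```

Frameworks like Puppeteer and Playwright hide this layer behind calls such as a page-level click, which is part of why the transport and its statefulness become an infrastructure concern rather than an application one.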
So we got so much stuff I'm going to ask you about. This is going to be super exciting, man.
We're going to talk a lot about browsers, obviously,
but we want to also mention how you got to even thinking about starting a company.
Because there obviously has to be a belief, right?
That there's got to be a lot of agents that need browsers.
Unless BrowserBase is not just for AI agents, right? So
let me talk about sort of like the motivation or the belief system here. Are you believing
that in the near future, there's going to be some percentage or large percentage of AI agents all
accessing the browser? Do you have sort of like almost like what you're seeing in the market when
you started and what you're seeing has grown now.
Like, do you see a delta already happening?
I'm just curious, like, what are you seeing has been evolving
in this AI agent space that leads you to believe like,
oh, this browser agent is going to be everywhere
and be used predominantly in production in all different places.
Because it's hard to tell, you know, from the outside.
So, we'd love to get a little bit of your insights here.
Yeah, I mean, the two questions in there is like,
why start BrowserBase?
And then, what is my view on how BrowserBase relates
to AI agents and the future of AI agents browsing the web?
So maybe I'll cover the first one for context
and go a little bit more in depth on the second one.
So why start BrowserBase?
I wish I had a really cool answer
that I was like this AI researcher, or I just was like early at DeepMind, but it actually is a
completely opposite approach. I started my career at Twilio. I was actually an
intern during the IPO, which is just a crazy time to start a career. You're at
the IPO party and everyone's getting rich and you're just having a great time,
right? But I got to see a category-defining infrastructure company
really kind of grow from a hundred million in revenue to a billion plus, and 10x in people and all this stuff.
I also, kind of like after that, started my own company called Stream Club. Stream Club was building this browser-based live streaming product.
And to do that we had to use a headless browser as part of the video composition engine. Basically, instead of doing the heavy lifting on your computer,
we did all the heavy lifting
on a headless browser in the cloud.
We did all the video encoding there.
We sold that company to Mux.
Mux is a video API company,
another great infrastructure company.
And I got to kind of see them operationalize
headless browsers at scale for video encoding
and video streaming and really approached this problem
not from an AI first perspective.
I had seen that headless browsers are hard to run
because I spent like two years of my life as CTO of my startup
trying to make this stuff work.
And I kind of joked that I would never start another company after Stream Club
unless it was a headless browser company.
Because I just got super in the weeds on that, you know, topic.
And if you look on YouTube, I gave like a talk in 2021 about how to use headless browsers with GPUs.
And like I just spent so much time banging my head against this kind of like
painful, unloved stack. Headless browsers were basically, like, really
designed for testing or like sketchy web automation.
It's never been like the direct path.
And that really kind of dovetails into like, why now are we starting Browserbase
and what is the opportunity that we see?
I think more and more it's possible
to automate many websites in a dynamic way.
You know, the rise of LLM code gen makes it so that you can go to a website and actually
generate the code that controls that website.
And that kind of expands this market dramatically because previously people were writing one
to one scripts.
If you wanted to automate a website, you had to write a single script for that single website.
And if the website changes, your script breaks, right? If you wanted to automate 10 websites,
that's 10 scripts. But now with LLMs, you can kind of feed in the context of the website and say,
hey, click this specific button. And maybe that's a DOM approach or using a vision model.
There's a lot more possibilities for browser automation that just weren't possible until,
you know, the rise of LLMs. And I think that really took off in 2024.
And that's actually when I started BrowserBase.
I had kind of come up on my two years at Mux,
and I was like, you know, I'm probably
one of the people who knows a lot about this
and have seen it at scale at Mux
and at my startup Stream Club.
And I have a lot of opinions about how it should be done.
I felt like I tried everything out there,
and nobody was building like a Vercel
or Stripe or Clerk level experience for browser infrastructure.
And I think a lot of people in the early days kind of just said, oh, this is just like a
small piece of the project.
It's not actually the biggest problem to solve.
But I actually was like, no, this thing is super hard and no one does it really well.
And I wanted to build something that was a great infrastructure company within, you know, browser infrastructure.
And the biggest giveaway, and it was in our pre-seed deck, is if you look at the number of Playwright installs per month
over the last two years, you can see it go from like a million or two million to ten million.
Now, there has been some consolidation around Playwright being the best framework,
but if you aggregate that with all the frameworks, the usage of all of those has gone up quite
a bit.
And you have to wonder why.
I don't think people are writing a ton more tests.
Maybe they are doing that too with CodeGen, but more and more people, more and more developers
are using browser automation to do meaningful tasks online, and they need a headless browser
to do that.
I think back to my days as a developer,
the first thing I wanted to build
was something that would automate work for me.
Like that was the appeal of code.
I can actually have the computer do stuff for me.
And the first thing you do is like,
I'm gonna try and scrape a website,
or I'm gonna try and click a button on this website.
I'm gonna try and figure out what my football practice is
or something, right?
So there's always been a need
for this primitive of browser automation.
The tech has kind of always been ignored
because it's been really hard to operationalize and get many different automations
out there. But now with the rise of LLMs, it's much, much more possible. And I just didn't see
anybody in the market who was building the product that I wanted as a developer, which really is like
inspired by these great infrastructure companies, but for a very specific vertical, which is browser
infrastructure. Maybe kind of to kind of end that,
you asked like, what am I seeing that people aren't seeing?
I think in 2024, we really served a lot
of like bleeding edge companies.
Like, you know, when you're an early stage startup,
a lot of your other customers were early stage startups.
What they were doing was really innovative.
Like they're coming up with new ways to kind of use LLMs
to control a browser.
And some of those things were really starting
to work in 2024.
I think now with OpenAI's Operator being released and Anthropic's computer use, people are now
seeing, oh, the models are going to get better, cheaper and faster.
That means that they can control things like a web browser.
And the possibilities for automation of human work just goes up quite a lot.
And you still need to run a headless browser to kind of do that.
And we hope that we can solve that specific problem for many, many successful AI companies out there.
AI is like one use case.
You kind of have Stagehand, which is what you built, and you have Operator, which is the OpenAI version of it.
Can you help us understand, if you think about the use cases for a headless browser?
they wouldn't know that this is actually a core component of most people's path to production.
But I'm curious, what do you think is,
what are the biggest use cases and how does this all break down?
Yeah, for sure.
And just to note, you kind of talked about
Stagehand versus Operator, you know, two very different things.
Operator is a model that's driving a browser.
Stagehand is a framework for building web agents.
So think of Operator as a web agent,
an agent that can control a web browser. Stagehand is a framework for building web agents with a little bit more
determinism. We can go into that more later, but you're asking about use cases. It's hard because
it's so horizontal. We have customers doing the weirdest little thing. Like they are automating
the compliance check of their oil field in Texas. Or, hey, they're helping you get rebates on
your food stamps by submitting the form for you,
so you don't have to go through this complex rebate process.
Some of those people are not using AI at all.
People always wanted to automate
interaction with legacy software and put a beautiful UI on top of it.
I think that what people think about when they think about browser automation,
they immediately go to like scraping, they immediately go to solving CAPTCHAs or buying
tickets or booking restaurants. And those are obviously like, you know, big categories,
but ones that we don't serve very much because it's going to be way more efficient to go
get first party API access to the flight booker or first party API access to OpenTable. But
what's not going to happen is my barbershop down the street,
Joe's barbershop, been going there for 10 years. Love those guys.
They're never going to add an API for getting on the wait list.
I've asked. It's not going to happen.
They had the same form that they built five years ago,
maybe 10 years ago now,
that you got to go fill out to get on this wait list.
If we want AI agents to be an extension of
ourselves and do work for us, they're going to have to use a
blend of APIs that are first party and web browsers for
things that don't support these integrations to really kind of
meet us where we are and meet the internet where it is. So
it's really horizontal in terms of use cases, I would say the
primitives that we see are often form filling, page data
extraction, button clicking, and maybe like screenshotting
and file download management
are kind of like the big ones.
So like any combination of like,
when is a human going to a page,
reading some data on that page,
potentially filling in a form,
and potentially processing files,
downloaded and uploaded to those forms,
those are like the building blocks that we can see,
implications in procurement, go to market, insurance, legal tech, gov tech. Like, every vertical AI company
in the market map has a use case for browser automation.
Some people are a little more coy to talk about it because they feel like it's a secret
sauce, but everybody is doing some sort of browser automation because that's really what the future of software is.
The future of software is software doing work for you.
And if software is doing work for you and we all work in a web browser on websites,
future software is going to need access to a web browser just like we do.
So that's kind of my vision for Browserbase: this is just a really necessary primitive
for, you know, this agentic software,
this AI software that's kind of coming to happen right now. And in the early days,
you know, we have hundreds of customers now.
It's pretty exciting to see how vastly different a lot of their use cases are.
It's really interesting. So I'm curious, kind of like, what is the hardest?
I have my own experience here. The first company I worked at was called GoInstant, which was bought by Salesforce, and it was like this ridiculous, man.
I think there's really two angles on this. It was a hotness,
and that was the only way to do it to where we are today,
there's been pretty big changes.
So I'm curious to learn what those changes have been and how that simplified the developer process.
And then second is what have we learned in terms of making things snappy and repeatable and deterministic? wasn't really there.
In terms of from the developer perspective,
the CSS render hadn't completed or some Ajax hadn't completed in the background
and so then you had to build all these waits, and it ended up being slow and terrible and the worst thing in the world.
So I'm curious, Paul, educate me on the latest, greatest in browser.
Well, can I put that quote on our website first from Ian? It's the slowest, terrible, worst thing in the world.
Yeah, you can verbatim take that as a clip.
I want you to clip that.
Yeah.
Hey, chat, clip it.
The hardest part about browser automation, at least pre-AI and even a little bit now
with AI, is that you're writing deterministic code for a non-deterministic website.
Websites change all the time, they behave differently,
networks can be different.
The website's having a slow day.
One of our customers, they do some automation in the healthcare space
and the website they're automating,
it just goes down all the time.
It's just like a normal thing; they have maintenance windows, and
like their script will fail and it doesn't know how to handle that.
It doesn't know how to know that it's down
because it changes the error message.
So you really are kind of dealing with a lot
of non-determinism, which is the worst thing to be doing
when you're writing code, right?
Code wants regulated inputs and outputs.
We want types, right?
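One common way to cope with a page that is ready at unpredictable times is to poll for a condition with a deadline instead of sleeping for a fixed duration. The sketch below is a minimal, framework-free version of that idea; wait_until and WaitTimeout are hypothetical names, and real frameworks such as Playwright ship their own built-in waiting mechanisms.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy before the deadline."""


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, so callers can wait for and fetch an element in one step.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    started = time.monotonic()
    # Stand-in for "the Ajax call has finished": becomes true after ~0.2 seconds.
    wait_until(lambda: time.monotonic() - started > 0.2, timeout=2.0)
    print("page ready")
```

A fixed sleep has to be sized for the slowest day the website ever has, while a polling wait returns as soon as the condition holds and fails loudly when it never does, which also gives the script a place to detect a maintenance page.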
What's challenging beyond just handling the non-determinism,
which AI does help with because now we can add dynamic code
generation.
If we see an element we're not familiar with,
we can try and understand that and basically
have a little bit more flow control that's
a little bit more non-deterministic.
But actually, operationalizing that is really challenging too.
I think a lot of developers can get something working locally.
But when they want to go to prod,
they just run into so many footguns.
And I think this is a pretty technical audience,
so I'm going to nerd out for a second, so stay with me.
Let's just walk through the
challenges of deploying a browser. So you have this amazing Playwright script
working locally, or maybe you build something with like Anthropic's computer
use and it's controlling your browser. Let's go put that in prod. Okay, it's
maybe just a Node script. So I'll just use a Lambda, you know, I'm an Amazon
fanboy, git push, deploy, set up the Lambda. Oh shoot, this browser is actually too big for a Lambda
because it's 250 something megabytes.
That's the limit on Lambdas.
Okay, I'll use a Lambda layer.
I'll sneak it in.
Okay, I get my browser running on there.
But Lambdas are very much performance constrained.
You get like what?
One vCPU, you can really run like one process at a time.
You're not getting a lot of performance.
That's the thing, it's running really slowly.
Okay, I need a bigger instance.
We'll go get an EC2 instance.
Okay, I have one browser running on my EC2 instance,
but I wanna run more browsers.
Okay, how do I do this?
We'll make a Docker image.
We'll do Kubernetes.
Now I'm like running a Kubernetes cluster of these images
and they're running out of memory and they're crashing.
I don't know why.
So I have to go add observability in.
I have to like capture all this logging.
And then at this point, you're going to go to a website, it's working, you have the observability
and you get blocked.
You know, you run into a CAPTCHA or like you just can't, like Cloudflare blocks you so
you have to go buy proxies because if you're coming from an Amazon IP, you're going to
get blocked.
So you have to go through a residential proxy network, you have to figure out if you want
to go buy this sketchy CAPTCHA solver with Bitcoin online, you know, it's a lot of like pain. And then at the final point, you finally have something
working. And you've just gone through so much effort. And I think like developers definitely
underestimate the complexity of running headless browsers in production. As you scale that out
even more, you know, you have to start thinking about like, well, how do I secure this? Like,
this browser is going to any website,
maybe it's customer input.
What if it downloads a bad file?
What if someone uses an exploit?
Chromium is open source, right?
So every time there's a security patch,
people reverse engineer the security patch
against former versions.
So you have to constantly be updating your Chromium binary.
And this is exposed to the open internet,
so if it goes to a bad website,
someone can try and get into your Amazon cloud.
So I think there's a lot of hard like,
kind of like faults that people run into
as they try and operationalize.
A simple script that ran on their shiny MacBook Pro
on their home Wi-Fi connection,
it's like, it works fine on my machine.
That doesn't really connect with running this in production
and that's where we see a lot of the frustration.
It's becoming easier and easier to build these scripts, but actually running them in production
at scale is like a major problem for our customers.
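The "running out of memory and crashing" step in the walkthrough above is partly a packing problem, and a rough back-of-the-envelope calculation can be sketched as follows. Every number here (node size, reserved memory, per-browser footprint, headroom) is an illustrative assumption, not a figure from the episode or a measured value, and the function names are hypothetical.

```python
def browsers_per_node(node_memory_mb, reserved_mb, per_browser_mb, headroom=0.2):
    """How many browsers fit on one node while keeping `headroom` of usable memory free.

    reserved_mb covers the OS, kubelet, and sidecars (assumed value).
    """
    usable = (node_memory_mb - reserved_mb) * (1 - headroom)
    return max(0, int(usable // per_browser_mb))


def nodes_needed(target_browsers, per_node):
    """Ceiling division: nodes required to host `target_browsers` concurrently."""
    if per_node <= 0:
        raise ValueError("a node must fit at least one browser")
    return -(-target_browsers // per_node)


if __name__ == "__main__":
    # Assumed: 16 GiB nodes, 2 GiB reserved, 1 GiB per browser, 20% headroom.
    per_node = browsers_per_node(16384, 2048, 1024)
    print(per_node, nodes_needed(1000, per_node))
```

Under those assumed numbers each node holds 11 browsers, so a thousand concurrent browsers would need 91 nodes. The point is that per-browser memory, which varies a lot by website, drives the cluster size.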
I'm curious, can you help us understand the delta of like what, you know, okay, so
back in 2014, I used Sauce Labs a lot, which was backed by Selenium, right?
And like Sauce Labs is still, I was just looking at their website, they still exist. Right?
Sauce Labs is the big monkey in the room, for lack of a better word, or the big elephant in the room.
What was your view of, hey, there actually is need
in this space and they're not fulfilling it?
I'm curious because it's both, I'm sure the answer here
is actually highly technical and infrastructure-related one,
but there's also an interesting go-to-market thing,
which I think is just useful to talk about
when you're dealing with infra in general.
The truth is the world of infra is massive
and there's always these massive untapped wedge
niche markets that no one thinks about.
So I'm kind of curious to get your perspective
on that before we dive into another area.
Yeah, I think, like, the
simplest answer, and I'll preface this by saying I have so much respect for the Selenium team and
Sauce Labs and what they've done for the industry. Without a doubt, they pushed the thing forward and
testing is better off with Sauce Labs. And we're not a testing company at Browserbase.
We're a browser automation company.
But when I use Sauce Labs and I use Vercel or Stripe,
it is not even in the same league
of developer experience, right?
And I've grown up working at these companies
that have such a high bar for developer experience
that I knew that that's something that has to be
within your DNA as a company. From day one, you have to care about the DX of
the product, the quality of your dashboard. If you're doing a PLG motion,
like Browserbase is, you know, we sell to individual developers and then they
come use us a ton and we say, hey, let's get you on a better deal contract at
volume. Sauce Labs is more sales led. It's like, hey, we're gonna go hit up everybody
and go sell them directly. Sales led can lead to a more fragmented product
because you're incentivized to do the things that
help close deals.
With Browserbase, what we really want to do
is power as many of the builders as possible.
So we get a huge index.
And we have hundreds of customers after one year.
And what that means is we get to see a bunch of different stuff
that people are doing and really figure out,
how do we build the right features for everybody
and have this really cohesive product experience that's
centered around developer experience.
And our KPI is like number of signups
that are going to production deploys.
And it kind of centers around how do we help developers
get zero to one much faster.
In terms of like the two technical differences
between companies, like Sauce Labs built for testing,
support for many different browsers, support for mobile,
very anchored in the testing world.
Browserbase, people have built testing companies
on Browserbase, but really oriented more
towards browser automation.
Single browser, we're only Chrome.
We don't do mobile.
You can have a mobile viewport and fingerprint,
but it's not mobile first.
It's really like, we'll give you a browser
and give you everything you need to operationalize and build
production applications on top of headless browsers.
And you're not going to have to think about using different browsers. You don't really need that if you're just trying to automate a website,
you just want a browser that works everywhere every time.
So I think you provided a really good
view of both the previous browser-based companies, which I would say
were definitely more focused on testing than any other space,
and sort of the infrastructure required to even run a browser in production.
I guess I'm really curious today, in 2025, seems like a lot of the automations people
are building, or in the past I feel like a lot of browser stuff was more scraping, you
know?
I think probably the most common is I want to get some information out.
I just want to go figure out how to get XPath selectors
to run in a very reliable way and repeatably.
But today, I guess with Stagehand and these frameworks,
even going to your website, the very first thing
is write a prompt.
It'll do something for you.
It's all really like AI is the center of everything now. So maybe can you talk about what are the things
beyond just getting your browser to run reliably and beyond just able to like debug things from a
session interceptor and all this stuff. Is there anything particular that, hey, we need to add
these primitives because AI is doing things a little bit more differently now. Is there anything
particular that these things are doing?
Or do you think it's more like an onion?
Like, hey, we have basically the same foundations,
but stagehand is our AI framework or something like that.
Maybe helping understand, is there something that AI is changing
how browsers are interacted with at all?
Yeah, there's a ton of differences.
And, you know, frankly, we've only done the low-hanging fruit.
Like there's so much on our roadmap still that's really oriented around supporting, you know,
our browser automation framework, Stagehand, as well as like other popular AI-powered browser automation frameworks.
That is yet to come. But kind of directionally,
we had to go build this fundamental infrastructure because that's the building blocks on which you can actually make
reliable scalable production applications. Like we are so confident that
browser automation is going to be necessary for a lot of customers, a lot of people who are building new companies and a lot of
established companies, that one of the first things we did is like how the hell are we gonna scale this thing to many regions?
You know, we're pushing the limits of Kubernetes clusters. You have to start sharding clusters at one point
because you have so many individual pods running. Those are hard engineering problems to solve.
And doing that with, you know, Firecracker and Kata Containers, you know, all these things around,
like, performant VMs. It's pretty still early days for some of those runtimes. And kind of,
you're going to run into some sharp edges as you productionize them at scale like BrowserBase has. And that's really
year one of BrowserBase is like, let's make this the best browser infrastructure out there. And we
still have a lot of work to do. Now in parallel, we built Stagehand. And Stagehand is kind of our
response to seeing so many of our customers have to reproduce the same thing where they're looking
at the DOM, the HTML on the page,
they're putting it into an LLM,
and then they're trying to generate some code to control that DOM.
Now, there's a lot of interesting techniques people have done here.
Some people will take a screenshot of
a modified DOM with a bunch of labels on it.
This is called set-of-mark prompting, and they put that into
a vision language model or LLM and then have it say,
this is the box you want to click on,
and then generate the code that way.
Okay. We saw a bunch of customers doing that.
We saw a bunch of customers taking the DOM,
and then they may turn it into Markdown,
and then they prompt engineer the DOM so it's more accurate,
and then they generate
Playwright code to go click on that button.
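As a rough illustration of the DOM-side approach described above, the sketch below reduces a page's HTML to a short, numbered list of interactive elements that could be placed in a prompt. It borrows the numeric-label idea from set-of-mark prompting but works on text rather than a screenshot. The function and class names are hypothetical, the choice of tags is an assumption, and this is not how Stagehand or any specific customer implements it.

```python
from html.parser import HTMLParser

# Assumed set of tags worth showing to a model; real systems use richer heuristics.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements that never have a closing tag or inner text


class InteractiveElementLister(HTMLParser):
    """Collects interactive elements and their visible text in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = None  # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr = dict(attrs)
        hint = attr.get("placeholder") or attr.get("name") or attr.get("href") or ""
        self.elements.append({"tag": tag, "text": "", "hint": hint})
        self._open = None if tag in VOID_TAGS else len(self.elements) - 1

    def handle_endtag(self, tag):
        if self._open is not None and self.elements[self._open]["tag"] == tag:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self.elements[self._open]["text"] += data


def label_page(html):
    """Return one '[n] <tag> description' line per interactive element."""
    parser = InteractiveElementLister()
    parser.feed(html)
    parser.close()
    lines = []
    for number, element in enumerate(parser.elements, start=1):
        description = " ".join(element["text"].split()) or element["hint"]
        lines.append(f"[{number}] <{element['tag']}> {description}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(label_page('<a href="/pricing">Pricing</a><button>Book a demo</button>'))
```

A model can then answer with a label such as "2", and the calling code maps that label back to a concrete element to click, which keeps the prompt far smaller than the raw DOM.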
So for us, like Stagehand is really taking
all of those best practices that we've seen in
papers, from our customers, like in the WhatsApp group chats about, you know, web agents, and
putting it into an open source framework that allows you to kind of use these three primitives,
act, extract, and observe, to take actions on a page, click the button, extract structured data
from a page, extract the things that Browserbase supports into an array of strings.
Then observe, which will actually give you a list of
possibilities that you can prompt for this page to
help ground any agent approach where it's like
maybe there's a high-level goal: buy the shoes.
Observe: what's the next action I should take from this list to go buy the shoes?
Oh, you want to go to the search bar and search for shoes.
Okay, do that, observe again.
So you can build this kind of agent loop.
Act, observe, and extract.
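The loop described above can be sketched in a few lines. This is a toy model under stated assumptions: FakeShopPage is an in-memory stand-in for a real browser page, the observe, act, and extract method names mirror the primitives mentioned in the conversation but are not Stagehand's actual API or signatures, and the choose callback stands in for whatever model picks the next action.

```python
# Toy site: each state maps an available action to the state it leads to.
TRANSITIONS = {
    "home": {"search for shoes": "results"},
    "results": {"open first result": "product"},
    "product": {"add to cart": "cart"},
    "cart": {},
}


class FakeShopPage:
    """In-memory stand-in for a browser page (purely illustrative)."""

    def __init__(self):
        self.state = "home"

    def observe(self):
        """List the actions that are possible on the current page."""
        return list(TRANSITIONS[self.state])

    def act(self, action):
        if action not in TRANSITIONS[self.state]:
            raise ValueError(f"{action!r} is not possible on page {self.state!r}")
        self.state = TRANSITIONS[self.state][action]

    def extract(self):
        """Return structured data about the current page."""
        return {"state": self.state}


def run_agent(page, choose, done, max_steps=10):
    """Observe, pick an action, act, and repeat until `done` accepts the extracted data."""
    history = []
    for _ in range(max_steps):
        if done(page.extract()):
            return history
        candidates = page.observe()
        if not candidates:
            break
        action = choose(candidates, history)
        page.act(action)
        history.append(action)
    if done(page.extract()):
        return history
    raise RuntimeError(f"goal not reached after {len(history)} actions")


if __name__ == "__main__":
    page = FakeShopPage()
    steps = run_agent(page, lambda options, _: options[0], lambda data: data["state"] == "cart")
    print(steps)
```

Grounding the choice in the list returned by observe means the model can only pick actions that actually exist on the page, and max_steps bounds how long a confused agent can wander.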
And we made this open source because, one, it just
felt like it was super important.
We've taken a bunch of inspiration from people.
We also come up with some interesting stuff.
I think we've really pushed the boundaries on some stuff
with turning the DOM into an accessibility tree,
using Chrome's kind of native a11y tree functionality.
And that's been really cool kind of stuff we've started doing.
But making it open source because we really
worked with the community, but also we just
know that developers don't want to tie themselves
to a closed source framework.
They want to be able to change it.
And I think the overwhelming feedback I heard from developers
about trying AI software out there is that, like,
you're at the constraints of the developer who's
tuning the prompts behind the scenes.
And instead, what if we gave a framework to developers where they can actually fork it,
change it, they can write their own prompts to get more reliability.
And most importantly, this is the key principle of what Stagehand is, we're trying to add
more determinism and reliability to browser automation with AI.
Right now with things like Operator or other frameworks out there,
it's generally you're giving a single prompt and just letting it do its thing.
It can go down four different paths to go buy that set of shoes.
If you're building production AI applications,
you really want to have repeatability.
That's why we have all these observability features,
so you can actually see what's happening and be able
to consistently check that your agent is doing the right thing.
But also in your framework, you probably want to have more guardrails around how it goes
out and actually automate something on the web.
You may want to say, you know, you're going to be given a website that's like any startup
website and your goal is to book a demo call.
So what are the steps?
Observe, find the demo call button.
Act, click on the demo call button.
Act, go book the next available demo call.
Here's my email, my name, my phone number. And giving it those guardrails actually tends to improve reliability.
Now, that tends to cater to developers who are actually trying to automate something more like an SOP, a standard operating procedure.
They have like this workflow they want to run on every website in the world or any website
they're given by a user.
And they kind of know the steps they want to take
so they can get more reliability out of it.
They don't want to give it some open-ended thing
because they can't trust that.
That's where Stagehand comes in.
It's one of the only frameworks that kind of like
self-imposes more constraints to give itself
more reliability when building production applications.
We actually just crossed 200,000 NPM downloads in a month,
which is awesome because we just launched it in like November.
And we've seen really cool results,
like someone in our community channel at the Uniswap team,
and they just switched all their end-to-end tests over to Stagehand
because it was proving to be more reliable for handling some of these fuzzy changes that were happening
when they were kind of building these tests.
So I think it's been really cool to see that developers who are building
AI web automation in production are using Stagehand. It certainly has a little bit of a higher learning
curve than some of the stuff out there. It's not going to take a single prompt. You kind of have to
lay stuff out. But we think that learning curve is necessary for actually building the applications
that work. And I think going back to your question you had a long time ago, you were like, well,
what's changed this year? With the model quality continuing to get better,
we're seeing more and more stuff work reliably.
The scope of these agents can really increase,
and the possibilities for automation are really just rapidly increasing.
I mean, that's pretty compelling.
So I have like a final question in terms of how you think about the future of agents.
I mean, you kind of started at the beginning when I asked the question,
you kind of framed Operator as a generalized agent, right?
And at the top level,
like the magic of Operator is the planning component,
which is like, okay, you give me one prompt
and I build you this whole plan,
hey, and then I'm just gonna go and do all this stuff.
And, you know, OpenAI's Operator-like motion is like,
let's just focus on making the planning incredible
because the better plan will result in, you know,
better outcomes is, you know, broadly the idea.
It sounds like what you're focused on is,
ignore the top level planning thing,
that's not what we're going to do.
What we're going to do is we're going to make that
interaction between whatever the plan is,
which is what you build as the developer,
we're going to make the interaction between the plan
and the actual browser doing the stuff
the best possible layer, right?
Like we're just going to make that incredibly deterministic,
which means that like your top level,
whether you're using a planning agent to create a plan to tell the browser to do a thing or you're handwriting the stuff in code,
like these are separate problems and we're gonna focus on this bottom one that's closest to the browser. Is that basically correct?
I'm putting that quote on the website, too. I don't know what you're doing today, Ian, but you're writing our next pitch.
Product marketing. I should have been a product marketer. I don't know what I was. You're killing it.
You're totally right.
For us, we actually believe that the planning step has become somewhat fungible.
People are using different models to do their planning step.
There's different models who are better at planning and reasoning.
And there's trade-offs like, do you want to spend money on a plan for this?
Developers are, from what I've seen, building a lot of their agent loops in-house.
They're doing a lot of custom prompting there.
They have different tolerances for, you know,
stickiness of the plan, temperature on the prompt.
And I think the, you know, when you're building an agent,
the likelihood of your agent succeeding is directly correlated
to the reliability of the low-level tools and actions.
So our goal is like, if we can make act, extract, observe,
like really, really accurate,
you're gonna have more accurate agents
because the reasoning step can then handle
when things don't go well.
But if you have a tool that has like low probability
of success, that's multiplicative across all your tool calls.
If each tool is at 50% chance of success,
your agent's gonna be pretty bad.
So you have to make the tools super reliable.
And I think that we've made a lot of progress around Stagehand by really focusing on that
kind of low-level atomic prompt that will actually convert to being a high-level success
in the browser automation.
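The "multiplicative" point can be made concrete with a little arithmetic. The sketch below assumes every tool call succeeds independently with the same probability, which is a simplification and not something claimed in the conversation; `chain_success` is a hypothetical helper name.

```python
def chain_success(per_call: float, calls: int) -> float:
    """Probability that every call in a chain succeeds.

    Assumption: calls are independent and share one success probability,
    so the chain succeeds with probability per_call ** calls.
    """
    if not 0.0 <= per_call <= 1.0:
        raise ValueError("per_call must be a probability between 0 and 1")
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return per_call ** calls


if __name__ == "__main__":
    # Compare an unreliable tool with increasingly reliable ones over 10 calls.
    for p in (0.5, 0.9, 0.99):
        print(f"per-call success {p:.2f} -> 10-call chain {chain_success(p, 10):.4f}")
```

Under that assumption, a tool at 50% per call completes a five-call chain about 3% of the time (0.5 to the fifth power is 0.03125), which is why the low-level actions have to be very reliable before the agent as a whole can be.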
I'm really curious to get your feedback.
I have this fun little question I keep asking myself about this sort of long-tail problem.
Because you position Browserbase, specifically in AI, as, look,
there's a long tail list of millions, trillions,
whatever number of websites that don't have APIs
and never will.
But I am curious to hear from your perspective,
you know, as more of the web is automated
by things like operator or things that people built
using Browserbase or headless browsers,
like what do you think the response
from the website builders is?
Like what do you think happens to front end?
Does front end die?
Do we have smaller websites?
What's the future of the way that we actually build applications as a result of the fact
that the browser, love it or hate it, is being increasingly abstracted from the human?
And the UI specifically is being increasingly abstracted away from the human.
How much time do we have for this question? I can go on.
Oh, we have as much time as we want. Yeah, yeah.
Great. Because I can go on for hours.
Let me answer with a few analogies first, and then I'll kind of go into my take.
I've been lucky to be able to spend some time with Jeff Lawson, the former CEO and founder of Twilio.
He's just been a great advisor to BrowserBase.
And when he was pitching Twilio, people would tell him,
well, okay, you're building this texting API, but SMS is going to go away.
It's going to be RCS or WhatsApp or something.
And so why build this?
Why build on a legacy protocol?
And he's like, sure, it's going to go away,
but it's going to be a long time.
And they built a multi-billion dollar revenue company out
of that, right?
So I think in terms of like the success of
Browserbase as a company, I have the same viewpoint that it's gonna take a long time
for there to be a new internet that's AI first. I think Neuralink might come sooner than us
rewriting the whole internet. And then at that point, you know, we're just all gonna
be dialing to each other's brains, right? I also think about this like kind of quote
from Elon Musk about, you know, humanoid robots, like why make robots look like people? Well,
the world was built for people, right?
And I think for a long time,
we're gonna have to build websites for people
and maybe agents.
And I think if computers are just as smart as we are,
if not smarter,
why would we have to build a different interface for them
when they can just use the same interface we do?
Now, as engineers, we think about efficiency.
Like, oh, we'll be more efficient through the protocol,
which I think is true, but I
think they're going to get really, really fast and really, really good at doing these
things.
And I think the people building the websites or the people using the websites are going
to be the constraint.
I imagine a future where there's probably going to be a requirement that every website
has a people version too, so people who don't have agents can use it.
So I don't think the web is gonna change as fast
as we think it will.
I do think it's gonna be a blend of first party APIs
for high volume use cases.
It doesn't make sense for everyone to book a flight
via a website.
If every agent is doing that,
we should have direct connections there.
It's gonna be way more efficient.
But people are still gonna be building websites
for a long time.
If anything, the rate of building websites
has increased with LLMs.
People can use v0, they can use Bolt. They're like churning out new websites day and night. So I often get pushback
on Browserbase. People say, we'll just build a new API layer for the internet. We'll have agents.txt.
We'll have all this stuff. And I think those are all helpful and they grease the wheels of this problem.
But it's going to be a long time before we have a new framework or anything coming out. I come from
the world of identity. That was my job at Twilio. I was like doing the login team, we were doing SSO.
We're still using SAML.
It's been 10 years, you know?
Like we're still using all these identity protocols.
And even like authentication for AI agents
is a big category people are talking about a lot.
I think it's gonna be a really hard problem to solve
because it's hard to get new frameworks adopted
and it's hard to rewrite legacy code.
I think it'll be a problem we face
when we're a billion dollar company, if not 10
billion dollar company.
And at that point, we can come back on the pod and see if these priors were right.
And hopefully the world's going the way we think it is.
Yeah, that's super interesting.
Maybe just directionally, you know, in the same realms of questions, LLMs are going to
generate a lot of websites and web apps.
We have a lot of these app builders out there
and we can see like 10X, 100X,
even thousands of them out there.
But I think what we've seen is,
for AI to be able to consume this content,
for agents to be able to answer questions and stuff like that,
we've also seen the introduction of llms.txt, llms-full.txt.
Like things that are just able to actually help the agents
to consume information better than from just pure websites.
If you just go to documentation, it's, hey, LLM agents,
Go look at this instead, right?
Do you see maybe there's like an intermediate thing
for websites in the future?
Is it all just going to be llms-full.txt?
Do you see some other variations?
Because I feel like if you're read-only and you just want some low, very dense documentation
or text, you can do that.
But for visually guiding agents to do something better, like if I want agents to do something better on my website
and maybe being an API, it's not worth it to do a production separation here. Do you
see there's some kind of intermediate thing that would be helpful? Or do you see people
already start to think about that? Or do you think that's, hey, this is probably too messy
in the middle. It's not worth investing in this at all.
Well, Browserbase is definitely not a web scraping company for this reason.
I think the read-only is going to become just way more efficient.
You know, generally the way people do web scraping, headless browser is the last choice.
They're trying to curl a website, get the HTML first, and then maybe they're trying
to do maybe a more advanced type of curl with different headers if they're getting blocked.
And then they fall back to a fully hydrated version
of the page using a headless browser
if there's a lot of client-side scripts being loaded,
like on Airbnb.com or something.
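The escalation described here (plain fetch first, then a fetch with different headers, then a headless browser as the last resort) amounts to a cheapest-first fallback chain. Below is a minimal sketch of that pattern; `fetch_with_fallback` and the three stub fetchers are hypothetical, and real tiers would wrap an HTTP client and a headless browser instead of returning canned values.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher returns the page HTML, or None if it was blocked or got an empty page.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(url: str, tiers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each tier in order (cheapest first) and return (tier_name, html)."""
    for name, fetch in tiers:
        html = fetch(url)
        if html:  # treat None or an empty string as a miss and escalate
            return name, html
    raise RuntimeError(f"all {len(tiers)} tiers failed for {url}")


# Stub tiers (assumptions for illustration only).
def plain_fetch(url: str) -> Optional[str]:
    return None  # pretend the simple request was blocked


def fetch_with_headers(url: str) -> Optional[str]:
    return ""  # pretend we got an empty shell that still needs client-side scripts


def headless_browser_fetch(url: str) -> Optional[str]:
    return "<html><body>fully hydrated page</body></html>"


TIERS = [
    ("plain fetch", plain_fetch),
    ("fetch with custom headers", fetch_with_headers),
    ("headless browser", headless_browser_fetch),
]

if __name__ == "__main__":
    tier, html = fetch_with_fallback("https://example.com", TIERS)
    print(f"served by: {tier} ({len(html)} bytes)")
```

Ordering the tiers by cost is what makes the headless browser "the last choice": it only runs when the cheaper attempts return nothing usable.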
So I think web scraping is really a race to zero.
You know, it's a commodity business.
It's like you're going to try and get the data
as cheap as possible.
And tools to make it easier for LLMs
to ingest web data from websites
are going to be really valuable
because people
do a ton of web scraping.
BrowserBase is like, we're not betting on web scraping
being the category that helps us.
If anything, as an entrepreneur, I
want to sell something for pennies
that people make dollars off of.
And automation is really more aligned with that
in a margins perspective.
We charge like $0.10 per browser hour.
And hopefully an hour of someone's time
is worth more than $0.10.
I think it's worth like $10 at least, right?
If you're getting an hour's worth of people's work
out of this browser automation, I
think that's just much better from a margins perspective,
but also just a value creation perspective.
So for us, I think that you're right.
I agree that there's probably stuff like llms.txt.
And I think the team at Mintlify has
been doing a fantastic job of like,
let's just incorporate this
into every docs out there.
And those platforms, like, I think the people best suited
to solve the auth problem are also like the auth platforms,
like Clerk, Stytch, and Okta.
People who are publishing things, like Vercel,
could easily publish an llms.txt for every website
on the Vercel platform, right?
I think they're very well suited to do that
because read-only is like non-destructive.
But for write or actions on the web, it's really hard for you to come up with like a generic
solution for that. And I'm excited to see what happens. And if something happens,
then we'll react to that appropriately. But I haven't seen a way where you can have a generic
like llms.txt for controlling a website yet. Maybe Browserbase can be the first to offer that.
We'll have to see.
Well, it's time for our favorite section here called the spicy future.
Spicy futures.
So Paul, tell us what you believe that most people don't believe in yet.
It's funny because I feel like I got all my hot takes out throughout the pod.
We weren't holding back on some of this stuff, right?
We talked about, you know, the future of the internet.
Maybe my hottest take is probably going to be around the future of authentication
and really the future of CAPTCHAs, right?
I think everyone thinks about this with browser automation.
They're like, well, what about the CAPTCHAs?
And Browserbase does do CAPTCHA solving as a feature.
You know, CAPTCHAs are kind of a short-term problem because one, like, it's
going to be really hard to keep up with AI and invent new little mini games for it to
play. But two, they're mostly like fingerprint checking right now. The CAPTCHAs may be more
of a facade behind more sophisticated algorithms for checking if you're a bot or not. And I
think CAPTCHAs right now, they block good and bad bots, you know, they just block everybody.
And I think there's gonna be a world
where good bots will be able to be authenticated
and act on behalf of a person.
I actually truly believe that the internet's not gonna say,
we don't want bots anywhere.
They're gonna realize that agents and bots,
you can call them bots or agents,
they're the same thing, right?
They're gonna realize that agents actually add value
by using their product.
Maybe they're actually using Workday more because the HR IT person doesn't want to do
this.
He uses a web agent to go automate Workday.
So I think these authentication providers and these CAPTCHA companies are going to have
to figure out a way to identify proof of delegated personhood.
And I think a lot of the systems we have set up, like passkeys, which are very much tied
to your hardware, are going to need to be shared with your agent in some way.
And I don't think we're set up for that yet.
So my hot take is that there's gonna be a reckoning
with anti-bot versus agents,
where they both have to agree on some sort
of KYC intermediary, where I can say,
hey, I'm Browserbase, I'm representing Tim at Essence,
here's his proof of personhood, will you let us through?
And some kind of KYC layer may be necessary
to allow good bots to roam the web
without being impacted by the bad bots.
And that may give up some privacy
because you have to say,
this person is using this browser for this thing.
And I think that's always been a hard part
with like proof of personhood online.
So challenging problem.
I'm optimistic it'll be solved.
I don't think there has to be an infinite cat and mouse game.
I think there is right now because that's just the way the web is.
AI has grown faster than the internet can react to it.
But you know, my hot spicy take in the future is like,
CAPTCHAs aren't going to be blocking bots for too much longer.
They may be letting some bots through.
And I've spent some time with, you know, some of the leading
anti-bot providers and I think they're open to that. I think there's going to be some exciting partnerships
happening soon. That's a great take. It's something I think about on a pretty regular
basis. You know, like what's the future of identity for an agent, and how does that work,
who's the intermediary, do we have the protocols? And we definitely don't. But I think like one of
the things we talked about that kind of resonated, we talked about so far
was this idea of like,
look, there's this massive long tail of websites and they're always going to need, like, no
one's gonna go back and fix this thing because there just isn't like a reason to.
But you know, the, the best websites, you know, the ones we spend all of our time, the
ones that drive lots of revenue.
So they're, you know, the top 1000 or maybe 10,000 websites are going to, you know, want
to build agent-first experiences.
Because they'll just eventually decide,
hey, look, the distribution channel has moved from Google to, let's say, OpenAI Operator.
Or maybe Google just says, okay, we're still the front door of the internet,
but we've abstracted it away. You're not going to the website; it's Google going to the website on behalf of Paul, or it's Google going on the website on behalf of Ian, and like it's a completely different
interaction mechanism, and more importantly the broadness of the UI and the UX is lost, because
what they're actually more focused on, just like we have SEO, they're focused on like, how do
I optimize for the fact that like the front door of the internet has changed, and I want to have the
best possible experience in the front door of the internet.
We're very far from that, in the sense that we have to start seeing some patterns and there have to be changes in the way
that people work.
And so things like Operator or whatever have to become more
sophisticated.
But I definitely agree with your hot spicy take, that's the
future.
I think the economic reason that will drive the best in class
website to do those things is because they don't care about
the eyeballs, they care about the interaction with the user
and the money at the end of the day. And as long as they don't lose their brand value,
right? Like I think the one thing you can take away from the fact that Shopify is like a hundred
billion dollar company is the fact that brands actually care about their brand and the experience
that their buyers have. And that's the net value. That's why Nike's Nike, right? Like it's you're
buying a brand. It's why Lululemon works. It's why all these things work. You're buying a brand
and experience and you're sending a message.
The brand value is very important.
So I'm really interested to see like what happens
between that interplay between a like deep consolidation of UI UX,
but also brands still wanting to be brands.
Yeah. You know, I really doubt that you're going to let your AI agent
go shop for a brand that you love.
Like, I think that shopping experience is still gonna be very human,
and you're gonna do it on your phone or on a website.
I don't think those can't coexist.
If I need to go buy laundry detergent,
I don't think I care about the brand experience
of my laundry detergent.
I just wanna buy the one that's the best bang for the buck.
So it's super natural to kinda think about the future
of AI plus people as being like these two outcomes,
but it's gonna be in the messy middle, you know?
And I think it's going to take time
for it to figure out. And I'm actually, maybe I'm too optimistic, but that's how you're
a founder, right? You got to be optimistic about the future. But all of the signs I'm
seeing is that we're all open to doing less tedious work online and we want to do the
work that matters. I want to shop, if you do that analogy, I want to do the shopping that's fun for me. I don't want to do the shopping that's not fun
for me. I want to book my trip and like look at all the flights and have fun doing that. I don't
want to book my business trip where I just need to get out there by a certain time, right? It's not
eliminating all of the work we're doing. It's just leaving the fun stuff that we care about that
people will really cater to. So I think that's just like a really exciting future. The future is going to be super cool and it's really great
to be building a small part of that at BrowserBase.
Awesome.
Well, we have so much more we could ask,
but for the sake of time,
and I think we have so much good stuff we already got from you,
tell us where can people find out more about BrowserBase or you?
Like, should they go to Browserbase.com?
What are the places to get using this awesome product?
Absolutely. So if you want to learn more about Browserbase, it's just Browserbase.com.
And if you want to learn about Stagehand, our open source AI browser automation framework, it's Stagehand.dev.
And check it out. It's actually really easy to get started.
You do npx create-browser-app
and you can have this cool little interactive
browser automation thing.
I think Ian's running it right now as I'm looking at him.
So, you know, it's pretty fun to try.
We've a great community in our Slack.
So feel free to join.
And if you have any burning questions
about browser automation, my Twitter is @pk_iv.
Hit me up.
Awesome. Thank you so much, Paul.
It's been such a pleasure.
Awesome guys.
Yeah, happy to be here.
Thanks so much.