No Priors: Artificial Intelligence | Technology | Startups - Launching AI products with Braintrust’s CEO Ankur Goyal
Episode Date: October 8, 2024
Today on No Priors, Elad is joined by Ankur Goyal, founder and CEO of Braintrust. Braintrust enables companies like Notion, Airtable, Instacart, Zapier, and Vercel to deploy AI solutions at scale by efficiently evaluating and managing complex, non-deterministic AI applications. Ankur shares his insights into emerging trends in the use of AI tooling and coding languages, the rise of open-source, and the future of data infrastructure. Ankur also reflects on building resilient AI products, his philosophy on coding as a CEO, and the importance of a startup’s initial customer base. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Ankrgyl Show Notes: (0:00) Introduction (0:38) Ankur’s path to Braintrust (3:05) Braintrust’s solution (5:46) AI tooling trends (7:58) Instruction tuning vs. fine-tuning (8:57) Open-source AI adoption (10:42) Future of data infrastructure and synthetic data (14:45) Designing technical interviews (18:04) Rethinking agent-based approaches (19:34) Building out an AI team (23:35) Typescript as the language of AI (25:12) The shift away from using frameworks (26:02) Vendor consolidation among enterprises (27:16) Coding as a CEO (30:16) Collaborating with customers (33:00) Future of Braintrust and evals
Transcript
So today on No Priors, we have Ankur Goyal, the co-founder and CEO of Braintrust.
Ankur was previously vice president of engineering at SingleStore and was the founder and CEO of Impira, an AI company acquired by Figma.
Braintrust is an end-to-end enterprise platform for building AI applications.
They help companies like Notion, Airtable, Instacart, Zapier, Vercel, and many more with evals, observability, and prompt development for their AI products.
And Braintrust just raised $36 million from Andreessen Horowitz and others.
Ankur, thank you so much for joining us today on No Priors.
Very excited to be here.
Can you tell us a little bit more about Braintrust, what the product does?
And, you know, we could talk a little bit about how you got started in this area and AI more generally.
Yeah, for sure.
So I have been working on AI since what one might now think of as ancient history.
Back in 2017, when we started working on Impira, you know, things were totally different. But still, it was really hard to ship products that work. And so we built tooling internally as we developed our AI products to help us evaluate things, collect real user data, use it to do better evals, and so on. Fast forward a few years, Figma acquired us, and we actually ended up having exactly the same problems and building pretty much the same tooling.
And I thought that was interesting for a few reasons, some of which you pointed out, by the way,
when we were hanging out and chatting about stuff.
But one, Impira was kind of pre-LLM.
My time at Figma was post-LLM,
but these problems were the same.
And I think there's some longevity that's implied by that.
You know, problems that existed pre-LLM
probably are going to exist in LLM land for a while.
And the second thing is that, you know,
having built the same tooling essentially twice,
it was clear that there was a pretty consistent need.
And so, you know, I have very fond memories
of the two of us hanging out and talking to a bunch of folks like, you know,
Brian and Mike at Zapier and Simon at Notion and, you know, many others.
And, you know, I've been in a lot of user interviews over time.
I've never seen anything resonate like the early ideas around Braintrust
and really everyone's desire to have a good solution to the eval problem.
So we got to work and built, honestly, a pretty crappy initial prototype.
But people started using it.
And, you know, Braintrust, just over a year later, has now kind of iterated from people's feedback and, you know, complaints and ideas into something that I think is really powerful.
And yeah, that's how we kind of got started.
Yeah, I remember in the early conversations we had around the company or the idea, I should
say, it was meant to even potentially be open source.
And it was the first time that I was involved with some sort of customer call and people
would say, we don't want you to open source it, which I found really surprising.
People really pushed on, we want this to exist for a long time. We want to be able to pay for it. And so there was that kind of really
interesting market pull. Why do you think there was so much interest or need for this or demand
for it? Or, you know, what does Braintrust do? And how does that really impact your customers? You know, many of our early customers had actually built internal versions of Braintrust before we engaged with them. And there's a couple of things
that sort of came out of that. One is it helped them gain an appreciation for how hard the
problem is. Evals sound really easy. Oh, it's just a for loop, you know, and then I console.log in the for loop as I go and look at the results. But the reality is, like, you know, the faster you can eval, the faster you can look at eval results, which start to get really complicated as you start doing things with agents and so on, the faster you can actually iterate and build stuff. It is actually a pretty hard problem to do evals well. And many of our early customers who were kind of like the pioneers in AI engineering had learned that the hard way.
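To make the "it's just a for loop" framing concrete, here is a minimal TypeScript sketch of that naive harness. Everything in it (EvalCase, runModel, exactMatch) is hypothetical, not Braintrust's SDK; the point is what the loop lacks: diffing across runs, multi-step traces, fuzzy scoring, and fast iteration on results.

```typescript
// A naive eval harness: literally a for loop with console.log.
// All names here (EvalCase, runModel, exactMatch) are hypothetical.
type EvalCase = { input: string; expected: string };

async function runModel(input: string): Promise<string> {
  // Placeholder for a real LLM call.
  return `echo: ${input}`;
}

// Exact-match scoring; real evals quickly need fuzzier, often LLM-based scorers.
function exactMatch(output: string, expected: string): number {
  return output.trim() === expected.trim() ? 1 : 0;
}

async function evalLoop(cases: EvalCase[]): Promise<void> {
  let total = 0;
  for (const c of cases) {
    const output = await runModel(c.input);
    const score = exactMatch(output, c.expected);
    total += score;
    console.log({ input: c.input, output, expected: c.expected, score });
  }
  console.log(`accuracy: ${total / cases.length}`);
}
```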
And I think the other problem is that, you know, folks, especially folks like Brian, for example, saw that AI would be a pervasive technology throughout the whole org, not just a project that, you know, Brian might babysit and work on with one team. And having a really
consistent and standardized, you know, way of doing things was really important. I remember early
on, Brian pointed me to the Vercel docs, and he said, one of the things I love about this is
that when new engineers are building UI now, they read these docs and they kind of learn the
right way to build web applications. And you have that opportunity with AI. And I found that
actually really motivating and, you know, really influenced how we think about things.
It makes a lot of sense. I guess, like, if you're swapping out, you know, GPT for Claude, or you're making a change in model, or you're changing a prompt, and it just helps you
really understand how that propagates and what sets of outcomes for users are better, what sets are
worse and kind of troubleshoot them. And then it feels like you've built a whole other
series of products around that that really helps support that. One of the biggest things when
you're building AI products is this uncertainty about quality. So you might, for example,
get really excited about a feature, build a prototype. It works on a few examples. You ship it to some
users and you realize it actually doesn't work very well. And it's just really hard to go from
that prototype into something that systematically works in an excellent way. And I think what we
have helped companies do is basically like demystify that process. So instead of having a bunch of
anxiety about, hey, I shipped something, I don't know if I'm ever going to get it to work well, you can implement some evals in Braintrust and then sort of turn the crank and get really,
really good outputs.
You know, you work with a lot of the companies that I feel are the earliest adopters of AI into their own products. In other words, they've actually shipped products with AI in them, and they're sort of that first wave. It's Notion, Airtable, you know, Zapier, Vercel, people like that. What proportion of your customers do you think are adopting
some of the things that people are talking about a lot? And so that would be things like fine-tuning or RAG or building agents. Like, do you think that's a very common set of things? Or do you think that's just kind of hype? Because I think you have a very clear picture of at least one segment of the enterprise market in terms of what people are actually doing.
Unambiguously, people are doing RAG. So that one is, like, simple and obvious.
Probably around 50% of the use cases that we see in production involve RAG of some sort.
Fine-tuning is interesting. I think, you know, a lot of people think of fine-tuning as an outcome, but it's actually really a technique. And the outcome that people are looking for is automatic optimization of their workloads. Fine-tuning is one way of doing that, and it is a very, very difficult way of automatically optimizing your use case.
I think we, with our customers,
have re-benchmarked fine-tuning on their workloads, I would say, every two to three months. And there was a period of time when GPT-3.5 fine-tuning came out before GPT-4 was easy to access. Now it's extremely cheap, actually, to run GPT-4o. But there was this kind of period where it was really hard to get GPT-4 access, and GPT-3.5 fine-tuning was, like, the only lever, you know, for some use cases to improve quality. But since then, you know, honestly, I think almost, if not all, of our customers have moved off of fine-tuned models onto instruction-tuned models and are seeing really good performance.
We even talked about that early on.
I remember when we were thinking about Braintrust, we thought, like, oh boy, you know, everyone's going to need to use this to fine-tune models.
And that was one of the first features we were thinking about building.
And, you know, no one's really doing it.
Could you explain, just for the listeners, the difference between instruction tuning and fine-tuning?
Yeah, I mean, I think it's kind of like the difference between writing Python code
and creating an FPGA or something.
So with instruction tuning,
all you do is modify the prompt to include examples of how it should behave.
You know, in some ways, it's actually very similar to fine-tuning.
You're collecting data that guides how the model should behave,
and then you're feeding it into a process that kind of nudges the model towards behaving that way.
Fine-tuning is a much lower-level thing where you're actually, like, modifying or supplementing
the weights in a model so that it, you know, learns from those examples.
And because it's so much lower level, it tends to be a lot slower, more expensive.
You know, there's a lot of ways you can injure the model while you're fine-tuning and actually
make it worse on, you know, real-world use cases. And so it's just a lot tougher to get right.
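As a rough illustration of the difference being described, here is a hedged TypeScript sketch. The message shapes follow the common chat-completions convention and are illustrative only, not any particular vendor's schema.

```typescript
// Instruction tuning, in the sense used here: steer the model by putting
// examples of desired behavior directly in the prompt. No weights change.
const fewShotMessages = [
  { role: "system", content: "Classify each support ticket as 'billing' or 'bug'." },
  // Behavioral examples, editable in seconds:
  { role: "user", content: "I was charged twice this month." },
  { role: "assistant", content: "billing" },
  { role: "user", content: "The export button crashes the app." },
  { role: "assistant", content: "bug" },
  // The actual input to classify:
  { role: "user", content: "My invoice shows the wrong plan." },
];

// Fine-tuning bakes the same kind of example into the weights instead.
// Training data is typically serialized as JSONL records like this, then fed
// through a slower, lower-level training job:
const fineTuneRecord = {
  messages: [
    { role: "user", content: "I was charged twice this month." },
    { role: "assistant", content: "billing" },
  ],
};
```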
And then do you see a lot of open-source adoption, or mainly people using proprietary models? And what other early technologies do you see people adopting right now?
We are very close to a watershed moment for open-source models. Like, we saw a watershed moment for Anthropic when Claude 3 came out, and especially Claude 3.5 Sonnet has really taken off. We are very close to that, I think, with Llama 3.1, but we're not there yet.
So we see very limited practical adoption of open-source models, but I think more interest
than ever.
And I think a lot of what you're seeing is also just things that are in production, right?
And so to some extent, there's a lot of discussion in the developer communities around
what people are using and adopting and playing with.
And then I think you're really focused on the market of enterprises that are shipping AI products.
And, you know, obviously it can be used by hackers and developers as well, but a lot of your usage is from people who have things in production.
And so it kind of reflects the state of the world for live systems at scale.
I am a developer and I love open source software.
And I have a very difficult time with the fact that every time I use an OpenAI model, I'm paying a fee per token.
But then I actually look at the numbers.
And, of course, I've looked at them with our customers, too.
And, you know, in some cases, it's just negligibly cheap.
And in the cases where it's pretty expensive, the ROI is actually really high.
And so most of our customers are really, really focused on providing the best possible user experience for their customers
and the fastest iteration speed for their developers.
And everything else is secondary.
So I think until open source can really move the needle on one of those two axes,
it's going to be tough for it to be adopted broadly.
The other place you've spent a lot of your career is on sort of databases and data infrastructure and things like that. You were the VP of engineering at SingleStore, which I think was renowned for really having an exceptional, database-centric team.
How do you think about the data infrastructure that exists for the AI world today?
What's needed?
What's lacking?
What works well?
What doesn't work?
The shift is that people have hoarded lots and lots of semi-useful data in data warehouses. Prior to LLMs, there was actually this whole industry around AI where companies like DataRobot, for example, would come in and help you train models based on the proprietary structured data that you've collected in your super proprietary data warehouse. And I think the big insight, or the crazy, you know, non-intuitive thing about LLMs, is that something trained on the internet outperforms what an enterprise can produce with its own data trained in a data warehouse.
And I think not only is the nature of like the data processing problem different,
but the value of data is actually, you know, and how we think about the value of data is very,
very different. Like just hoarding data about your, you know, claims history or transaction history,
it might not actually be that useful. The real question is like, how do you, you know,
construct a model, which is really good at reasoning about the problems that you're working on.
And I think the way that enterprises will collect data and leverage it into these AI processes
does not look like doing ETL on a data warehouse that's running in Amazon or something
like that. I think it's going to totally change. And I've seen, you know, like, a lot of the data that gets stored in Braintrust through people's logs actually never makes it to a data warehouse. And, you know, people just don't really care about that because, you know, if they put it in a data warehouse, what are they going to do with
it? What do you think is missing from a data infrastructure perspective? So I think, you know,
to your point, there's a couple different steps. There's some sort of data cleaning step.
There's some storage layer. There's, you know, there's different forms of labeling, et cetera.
How do you think all these pieces kind of evolve over the next couple years? And then I guess related
to that, the other topic people have been talking a lot about is synthetic data and how important
that will be in the future. I'm sort of curious your views on these different areas.
Purely from a data standpoint, it's important to think about what you're going to do with the data and then how the infrastructure enables that. So, you know, a data warehouse is really designed for ad hoc exploration of structured data, and neither of those two things is relevant in AI land. You're dealing with lots and lots of text, and you're not exploring it ad hoc using SQL queries. What we see actually as kind of what the most advanced companies are
doing is actually using embeddings and models themselves to help them sift through tons and
tons of data and find, for example, customer support tickets, which are not well represented
in the data that they're using for their e-vals or not well represented in their fine-tuning
datasets and trying to find those examples and use them. So I think the workload is going to shift.
And I actually think, like, LLMs and, you know, specifically embeddings are going to be core to how people actually query data, not, you know, traditional algebraic relational indexes.
That's going to be a huge shift.
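A hedged sketch of that embeddings-as-query idea, applied to the earlier example of finding support tickets that an eval set doesn't yet cover. embed() is a toy stand-in for a real embedding-model call, and the similarity threshold is arbitrary:

```typescript
// Find logs (e.g., support tickets) that are far from everything already in
// an eval dataset. embed() is a toy stand-in for an embedding-model call.
async function embed(text: string): Promise<number[]> {
  // Real systems would call an embedding model; this just returns a toy vector.
  return Array.from({ length: 8 }, (_, i) => text.charCodeAt(i % text.length) / 255);
}

function cosineSim(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Keep logs whose nearest neighbor in the dataset is below the similarity
// threshold -- i.e., cases the eval set doesn't represent yet.
async function findUnderrepresented(
  logs: string[],
  dataset: string[],
  threshold = 0.9
): Promise<string[]> {
  const datasetVecs = await Promise.all(dataset.map(embed));
  const gaps: string[] = [];
  for (const log of logs) {
    const v = await embed(log);
    const nearest = Math.max(...datasetVecs.map((d) => cosineSim(v, d)));
    if (nearest < threshold) gaps.push(log);
  }
  return gaps;
}
```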
And, you know, there's this huge debate about vector databases and will traditional databases do vector database things.
I think that debate's kind of silly.
I think, you know, relational databases are perfectly capable of adding HNSW indices to them. What will really be disrupted is the OLAP workload. You can't just slap, you know, semantic search and stuff into the architecture of a traditional data warehouse. I think that is actually a much deeper set of things that will need to change than the OLTP workload.
This is, in some sense, your third startup experience, right? You joined MemSQL, slash SingleStore, quite early. You started Impira, which Figma acquired. You're now doing Braintrust. What are the common things that you've taken with you as you've
done this new startup. What are the things that you've implemented early? What are the things that
you've avoided? One of the things that I honestly took for granted at MemSQL, but we've kind of re-implemented at Braintrust, is having a really hard technical interview. You know, at MemSQL, maybe we pushed it a little bit too far, but it was really known for really strong technical excellence. And I think our interview reflected that. So that was actually one of the first things that we did. Manu and I spent probably, like, two or three days working through a bunch of really, really hard interview questions. And I think it's just important that you hold the technical bar really high and try to find people that are attracted to it. Actually, for example, if you do a front-end interview at Braintrust, one of the questions involves writing some C++, and we lose a lot of candidates because of that question. But it's a good signal that maybe Braintrust isn't the right place for you to work, because we do like to hire people who are willing to, you know, jump around in areas of the stack that they're unfamiliar with. So, you know, I think that's one of the biggest things that
we've carried over. Another thing that I think we did really well at both Impira and MemSQL is have
an obsessive relationship with our customers and just really, really focus on making them
successful. It's sometimes really hard to prioritize customer feedback and think about, you know,
10 customers are asking for 10 different things. What do I do?
So what we've done at Braintrust is actually be very deliberate about which customers we prioritize, especially early on, and sort of hypothesize that the Zapiers and Notions and so on of the world
would have pretty similar use cases.
And so if you focus on these kinds of customers, then when they ask for stuff, you can pretty
readily assume that other similar customers are going to have the same problem.
And that's allowed us to be very, very customer-centric while building a product that
repeats itself for more customers. And now what we're seeing is that, you know, the next wave
of companies that are building with AI, both startups and more traditional enterprises, they actually want to be engineering things like the products that they admire, most of which use Braintrust. And so a lot of those best practices are now built into the product, and kind of the
next batch of companies is able to consume them right out of the box. Yeah, it's kind of interesting.
I feel like even early on, as companies were first adopting LLMs for actual live products, they would all follow kind of the same startup journey, or I should say, technical journey, right? Initially, at least back then, they'd look into fine-tuning or some open-source model or something else. They'd eventually realize they should just be using GPT-4, which was the primary model at the time. And then they'd go through this big loop of
starting to build internal tools and then realize that really their focus should be on product.
And, you know, it was the exact same journey. And I remember, in the early Braintrust customer conversations, you'd talk to them and they'd say, oh, we don't need this. And then three months
later, they'd call and say, okay, we really need this. And it was always roughly the same
time frame. Are you seeing any common patterns today in terms of, okay, companies that are now a
year or 18 months into their journey using LLMs, like they always have the same thing come up?
There's a couple things. So one is companies that are fairly deep into their journey, they have like
one or two North Star products that are pretty mature and they're trying to figure out how to get
those products to the next stage. Probably the most consistent thing I've seen is companies
kind of walking back from the illusion that totally free-form agents will solve all of their
problems. So I think maybe like two or three months ago, many of the pioneering companies
went way down the agent rabbit hole. And they kind of realized like, wow, this is actually not,
this is not the right approach. It's so hard to control performance. The error rates are really high, and they compound really quickly.
And so, you know, most of those companies have kind of walked back and tried to build a
different architecture where the control flow is actually managed deterministically by their
code, but they make LLM calls kind of like throughout the entire architecture of the product.
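A minimal sketch of that architecture, assuming a generic callLLM() helper (a stand-in for a real chat-completion call, not any specific product's API): the deterministic code owns the control flow, and each LLM call is a narrow, single-purpose step rather than an open-ended loop.

```typescript
// Control flow stays in deterministic code; each LLM call is a narrow step.
// callLLM() is a stand-in for a real chat-completion call.
async function callLLM(prompt: string): Promise<string> {
  return `model output for: ${prompt}`; // placeholder
}

async function handleTicket(ticket: string) {
  // Narrow LLM call with a constrained output space.
  const category = await callLLM(`Classify as 'billing' or 'bug': ${ticket}`);

  // The code, not the model, decides what happens next.
  if (category.includes("billing")) {
    const summary = await callLLM(`Summarize this billing issue: ${ticket}`);
    return { route: "billing-team", summary };
  }
  const repro = await callLLM(`Extract reproduction steps from: ${ticket}`);
  return { route: "engineering", repro };
}

// Contrast with the free-form agent loop being walked back from, where errors
// compound because the model also controls the loop:
//   while (!done) { const action = await callLLM(history); done = execute(action); }
```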
And so that's probably the biggest thing that we're seeing now. I don't know if there's a good term for it yet, but maybe this kind of pervasive AI engineering throughout a product, rather than trying to shove everything into the, you know, while loop of an agent.
Yeah, the other thing that I've heard you talk about in the past is
the evolving role of what an AI team does at a company. And so I think if you go back a couple
years, people were doing machine learning and they'd hire a big ML ops team. And then the types of
things that they'd be doing day to day were very different from what they do today in the context
of adopting AI and even how you think about the role and who to hire maybe has shifted a bit.
Could you talk a little bit about what you've seen as the evolution of the role of the data science team, the data team, the ML or AI team, etc.?
Yeah, I think what's really interesting is many of the early adopters of LLMs didn't have any ML staff when ChatGPT came out, you know, what is it now, almost two years ago.
And those companies were able to move really quickly because they kind of started with a fresh slate.
Many of the smart folks that I know that are classical machine learning people or data scientists have now come around.
But actually there was this big sort of resistance among them early on that LLMs are, they're not good at the things that we're trying to solve or maybe it's a scam or something like that.
Do you think that was just, like, a different problem set, in terms of traditional ML and its applications being different from what gen AI can do? Or do you think it was something else?
Well, I went through this myself, watching the technology that we built to do document extraction at Impira become, you know, totally irrelevant. And I personally, I think it's an emotional thing. Like, you try GPT-3 for the first time. And first of all, you know, back then at least, it was kind of snarky. And so that was a little bit irritating. And it was also just way better
at everything than anything you could possibly train. And I think that is so fundamentally disruptive
to, you know, a lot of companies, a lot of people's individual identity, it just is not easy
to wrap your head around if you've been doing AI and ML for a while. So I think it was largely
an emotional thing. You could argue that there's a cost, security, privacy, whatever element of
it, but the companies that were sort of on the leading edge, they're able to figure that out
pretty quickly. You know, now I think more companies have come along the journey, and I've seen a lot of
really smart ML and data science people embrace LLMs and bring a lot of the sort of rigor that is still relevant around evals and measurement and, you know, prototyping and so on, and become these, like, AI platform teams. Usually it's a combination of people with product
engineering backgrounds and, you know, a few folks with statistics or data science backgrounds.
And they start by building kind of like a marquee product for the company and then they evolve
into a platform team that enables, like, the N-plus-first project to be really successful.
We see a lot of these teams forming, you know, as AI becomes more pervasive.
So if you were at an enterprise company right now and you were trying to adopt AI or LLMs, like, who would you have to hire, or what sort of capabilities would you move over into sort of this platform team?
I would start with a group of really smart product engineers, because the first thing you need to ask yourself is what parts of my product, or whatever I'm offering, can be cannibalized
or completely changed by modern AI. Product engineers are generally the best people to think about
that. You can get really far with a really good UI and very basic AI engineering that sort
of proves out a concept. I think we've seen a number of good examples of that. I know, for example,
V0 is a truly incredible piece of engineering at this point, both from an AI standpoint and also from a UI standpoint. But early on, you know, it was pretty simple,
and that's the right way to start. And then I think as you find product market fit, it's sort of
the right time to think about, you know, more rigor, think about fine tuning. Maybe you should
use open source models for cost or whatever, although I think not many people are far along that
journey. I think you said something like TypeScript is the language of AI and Python is the language of machine learning.
Yeah.
Could you extrapolate more on that?
First of all, a vast majority of our customers use TypeScript.
And, you know, early on, some of our customers were dealing with, like, should we use
TypeScript or Python?
And some teams are using TypeScript.
Some teams are using Python.
Now, almost everyone, including people that used to write primarily Python, is using TypeScript.
And I think that's going to continue forward.
There's a few reasons for that.
One is TypeScript is the language of product engineering, and product engineers are the ones who are driving most of the AI innovation,
at least in the world that we participate in.
And so they're just literally pulling the AI ecosystem into their world,
and that is driving a lot of TypeScript stuff.
Another thing is that TypeScript as a language is inherently better suited
for AI workloads because of the type system.
So the type system basically allows you to launder the crazy stuff that comes out of an AI model into a well-defined structure that the rest of your software system can use. Python has a pretty immature type system. You know, they're improving,
and I always get trolled on Twitter when I post about this by people who make, you know, somewhat
valid arguments. But TypeScript is just a much, much better language for writing software that
deals with uncertain shapes of data. I think that's actually kind of its whole point.
So I think it is actually literally a better suited language for working with AI.
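A small sketch of that "laundering" idea: untrusted model output crosses a type guard exactly once, and everything downstream works with a well-defined structure. The Invoice shape is hypothetical; the pattern is the point.

```typescript
// "Laundering" untrusted model output through the type system.
// The Invoice shape is hypothetical; the pattern is the point.
type Invoice = { vendor: string; total: number; currency: string };

function isInvoice(value: unknown): value is Invoice {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.vendor === "string" &&
    typeof v.total === "number" &&
    typeof v.currency === "string"
  );
}

// Model output arrives as an untrusted string; everything downstream of this
// function sees a well-defined Invoice or a thrown error, never `any`.
function parseInvoice(modelOutput: string): Invoice {
  const parsed: unknown = JSON.parse(modelOutput);
  if (!isInvoice(parsed)) {
    throw new Error("Model output did not match the Invoice shape");
  }
  return parsed;
}
```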
Have you seen any other shifts in terms of usage of specific languages or tooling or other things that's happened with this wave of AI?
Yeah, I think the biggest thing I've seen over the past six months is people dropping the use of frameworks.
So early on, I think people thought that AI is this really unique thing.
And just like, you know, Ruby on Rails or whatever, we're going to, you know,
need to build new kinds of applications with new kinds of frameworks to be able to build AI
software. And really, I think people have walked back from that and they now think of AI as
kind of like a core part of their software engineering as a whole. And so AI is now kind of like
pervasively spreading throughout people's code base. And it's not constrained to what you can
create with, you know, a single framework. Outside of the areas that Braintrust touches from a tooling perspective, what do you think are other interesting emerging platforms or approaches or products or infrastructure that people are starting to use? I think what we've seen
from a lot of our customers is a consolidation of vendors. And this is very, very, very much
driven by AWS. So AWS has its mojo back now that they have Anthropic on Bedrock. And Anthropic, you know, especially Claude 3 and 3.5, is really, really good. And because, you know, many companies were consolidating their vendors prior to AI, and AWS is so dominant, now you can actually consolidate a lot of your AI stuff on AWS as well.
We're seeing pretty dramatic vendor consolidation.
There's some companies that we talk to, and their AI vendors are literally OpenAI, AWS, and Braintrust. And pretty much everything else has consolidated away. So, you know, it'll be interesting to see what happens. I certainly wouldn't underestimate, you know, AWS and the hyperscalers,
especially on the infrastructure side. One of the things that I think is striking is how much time you still spend coding as a CEO. And there's a number of CEOs of different companies who continue to write code over the course of their careers to varying degrees, you know, like Tobias at Shopify would be an interesting example of that. How do you think about time spent coding versus marketing
versus doing other things for the company and why focus there? My perspective on this has changed
a lot over time. When I was much younger, I started leading the engineering team at SingleStore
and then became a CEO. And people give you the conventional advice about what you should do with your
time and who you should hire and stuff like that. And first, I think, you know, the profile of CEOs is
changing. And second, I think the market is changing. So in the world that we are in, which is enterprise
software, people really, really care about the polish of the UI that they're using. I think
companies like Notion, for example, have really driven people's taste on those products. But when
many VCs were having their formative experiences and observing the patterns that they would
eventually mandate among their portfolio companies, things were very different. You know, IT bought enterprise software, and they bought it based on, you know, checklists that product managers came up with. So I think a lot of this has changed, and for me it just feels very natural to, you know, participate in that change by being very, very deep in the product. And as hard as I've tried over the past, you know, decade plus, I just can't. I think I'm just literally addicted to writing code. It is the fastest, most efficient, and most pleasurable way for me to participate in what we're doing as a company. And so instead of trying to change that, which I've done, at Braintrust we've kind of engineered the company to support me spending a lot of time writing code.
For example, one of the first people we hired was Albert, who was formerly an investor and, before that, an investment banker. He's incredibly good at everything from, you know, selling, marketing, dealing with ops, and helping with recruiting, and, you know, working with him has kind of freed me up to spend a lot more time doing that kind of thing. Whereas at Impira, I spent probably, like, half or more of my day doing those things.
Yeah, we had Jensen
Huang from Nvidia on No Priors previously. And I thought one perspective that he shared that you don't hear very much, which you're now echoing, is you should really architect the company around the CEO, versus just following the same pattern every time of what the right thing for the company is. And obviously there's an urge to just do the same thing every time, like, you know, sales comp. It really doesn't make sense to try and reinvent that. And everybody always tries for their first startup. And by the second startup, they're like, why did I even try that? You know, it just kind of works. But the flip side of it is there are certain things to delegate or not. There are certain things to micromanage versus not. And it really varies by the person and what they love doing and, you know, all the rest of it. Are there other big differences between how you've approached Braintrust and Impira, for example, your prior startup?
Another thing that we're really bullish on at Braintrust is people being in the office and being really comfortable being interrupt-driven. These are two battles that were very difficult for us at Impira, because we weren't very firm about it. I think the second one is actually a little bit more interesting. At Braintrust, if a customer complains
about something or they find something about our UI annoying or they have an idea, we almost
always fix it immediately. And that is something that for a lot of engineers is very uncomfortable. But for the right engineers, they've been craving that experience, you know, their entire career. And so we handpick those people that want to be in that environment. And then again,
we engineer our roadmap and think about how we allocate our time and so on to actually be able
to support that. And I think it's one of the key sort of things that that has made the product
really good and also creates a lot of love with our customers. Not everyone has to have the same
edge, but I think you have to have some edge. And so we identified that as something we really
cared about early on. And again, you know, kind of like recruited a team of people who really
want to do that. Yeah. And I guess like that's translated into sort of customer adoption and
some of the logos you've landed. Are there other things that have helped drive customer
acquisition? And, you know, have there been unique ways that you've approached go-to-market?
Yeah. I mean, I think I went to the, you know, Elad School of Hard Knocks and learned a bunch of
stuff early on from you. But, you know, really the thing that we did was we made that list of like
50 people who we thought were leading the way in AI and said, you know, let's try to figure out a way
to get to these people and either recruit them as investors or as customers. And I think that was
probably one of the most important, if not the most important things that we did. Some people, for
example, were excited about Braintrust. We had known them for a while. They invested and they said,
you know what, we've already built our own version of this internally or we don't care about this,
but we think other people will need it. So we'd love to invest. And actually, many of those people
have now come around and started using Braintrust too. So just being very deliberate about
who our target market was. I mean, 50 companies is not a huge TAM in some ways, but those
companies are very influential and they've led to many more customers now. So I think that was the most
important thing. Yeah, it feels like people really misdefine their initial customer envelope or people
that they want to target. And so they either go too broad, you know, or do everything from Fortune 500
to, you know, small startups. And then they're not really building for any specific user or they go
way too specific, maybe even in a segment that just isn't worth pursuing. And so it's really
interesting to see how people think about that. Yeah. Could you tell me a little bit more about how you view the future of Braintrust? How does it evolve as, like, a product and platform? And then how does it change as AI changes? Is all eval eventually done by machines? Or, you know, what does the future hold for us? Yeah, I ask myself that question, you know, every month or so, and surprisingly little
changes. But, you know, with Braintrust, we started out by solving the eval problem, and I think we did that really well. And what we realized is that there's actually this whole platform that people want.
One of our customers early on, actually Airtable, used our evals product to do observability. So they literally would create experiments every day as if they were evals and just dump their logs into those experiments. It's pretty obvious, when someone starts doing that, that they're trying to do observability in your product. And we dug into why. And it turns out that in AI, the whole point of observability is to collect data into datasets that you can use to do evals. And then again, eventually fine-tune models or do more advanced things. But still, evals is the most important element.
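A minimal sketch of that loop, with an illustrative log shape (not any particular product's schema): logged production interactions get promoted into eval datasets, so observability feeds evals directly.

```typescript
// Promote logged production interactions into eval cases, so observability
// feeds evals directly. The LogEntry shape is illustrative.
type LogEntry = { input: string; output: string; userRating?: number };
type EvalCase = { input: string; expected: string };

function logsToDataset(logs: LogEntry[]): EvalCase[] {
  return logs
    .filter((l) => (l.userRating ?? 0) >= 4) // keep outputs users endorsed
    .map((l) => ({ input: l.input, expected: l.output }));
}
```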
And the next thing that happened is that, you know, some of our customers said, hey, actually, I'm already doing, you know, observability and evals and stuff in Braintrust. I'm spending so much time in this product. Why do I have to go back to my IDE, which, by the way, knows nothing about my evals and nothing about my logs? Can I work on prompts in Braintrust? Can I repro what I'm seeing live? Can I save the prompts and then auto-deploy them to my, you know, production environment? That actually scared the crap out of me, thinking, you know, just from my, you know, traditional, now old-school engineering perspective. But it's what people wanted. And, you know, I was talking to Martin, who just became a Braintrust daily active user quite recently. And, you know, he spends like half his day now tinkering with prompts in AI Town in Braintrust. And so even for old-school engineers, you know, like us, it's definitely the right way to do things. And I sort of see Braintrust
just evolving into this kind of hybrid. In some ways it's kind of like GitHub, you know, you create prompts, and now you can create more advanced functionality with Python code and TypeScript code and stitch it together with your prompts in the product, all the way through to, you know, evals and observability. And I think we're really excited about building a universal
developer platform for AI. In terms of quality, having lived through the
pre-LLM era, I actually think a lot of the anxieties and predictions about quality are exactly
the same as they were pre-LLM. Even, you know, when we were doing document processing stuff at Impira, people were like, oh, hey, all documents will be perfectly extracted within six months from now.
And LLMs, by the way, are amazing, but document processing is still not a totally solved problem.
And I think it's because people will take whatever technology they have and push it to its extreme.
There are things that people are trying to do today that are past the extreme.
Like, AutoGPT is a great example of something that is, I think, a really productive experiment in pushing AI past what it can reasonably do.
But, you know, people are always going to push things to their extreme.
AI is an inherently non-deterministic thing.
And so I think evals are still going to be there.
We might just be evaluating, you know, more and more complex and interesting problems.
And then what role do you think AI will play in evaling itself? I mean, AI already evals itself. So, very similar to traditional math, I think, you know, if you're doing, like, a math homework assignment, it's way easier, if someone gives you a proof, to validate the proof than it is to actually generate a proof in the first place. And sort of the same principle works for LLMs. It's way easier for an LLM, especially a frontier model, to look at the work of, you know, itself or another LLM and accurately assess it.
And so that's already the case. I think probably more than half of the evals that people do in Braintrust are LLM-based. I think some of the interesting things that are happening, as LLMs are getting better and as GPT-4-level quality is getting cheaper, is that people are actually starting to do LLM-based evals on their logs.
So one of the really cool things
that you can now do in Braintrust
is you can write LLM and code-based evaluators
and then run them automatically on some fraction of your logs.
Sometimes that actually even allows you to evaluate things
that you're not allowed to look at.
And so the, you know, the LLM is allowed to read PII
and, you know, crunch through something
and tell you whether, you know, your use case is working or not,
but maybe no developer or person at the company is.
And so I think that is a really interesting unlock
and probably represents what people will be doing
over at least the next year.
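A hedged sketch of that pattern: an LLM-based evaluator run over a sampled fraction of logs, where judgeLLM() stands in for a frontier-model call (all names here are illustrative) and humans only ever see the aggregate score, not the raw, possibly PII-laden content.

```typescript
// An LLM-based evaluator run over a sampled fraction of logs.
// judgeLLM() stands in for a frontier-model call; names are illustrative.
type LoggedCall = { input: string; output: string };

async function judgeLLM(prompt: string): Promise<string> {
  return "5"; // placeholder for a real model call returning a 1-5 rating
}

// Grading an answer is easier than generating one, which is why a model can
// reliably assess work it might not produce cold.
async function scoreEntry(entry: LoggedCall): Promise<number> {
  const verdict = await judgeLLM(
    `Rate 1-5 how well this answer addresses the question.\n` +
      `Question: ${entry.input}\nAnswer: ${entry.output}\nRating:`
  );
  return Number.parseInt(verdict, 10) / 5;
}

// Humans only see the aggregate score, never the raw (possibly PII-laden) logs.
async function evalLogs(logs: LoggedCall[], sampleRate = 0.1): Promise<number> {
  const sample = logs.filter(() => Math.random() < sampleRate);
  const scores = await Promise.all(sample.map(scoreEntry));
  return scores.reduce((a, b) => a + b, 0) / Math.max(scores.length, 1);
}
```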
Super interesting.
Hey, Ankur, thank you so much for joining us today.
Thanks for having me.
Find us on Twitter at @NoPriorsPod.
Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.