No Priors: Artificial Intelligence | Technology | Startups - Launching AI products with Braintrust’s CEO Ankur Goyal

Episode Date: October 8, 2024

Today on No Priors, Elad is joined by Ankur Goyal, founder and CEO of Braintrust. Braintrust enables companies like Notion, Airtable, Instacart, Zapier, and Vercel to deploy AI solutions at scale by efficiently evaluating and managing complex, non-deterministic AI applications. Ankur shares his insights into emerging trends in the use of AI tooling and coding languages, the rise of open-source, and the future of data infrastructure. Ankur also reflects on building resilient AI products, his philosophy on coding as a CEO, and the importance of a startup’s initial customer base.  Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Ankrgyl Show Notes:  (0:00) Introduction (0:38) Ankur’s path to Braintrust (3:05) Braintrust’s solution (5:46) AI tooling trends  (7:58) Instruction tuning vs. fine-tuning (8:57) Open-source AI adoption  (10:42) Future of data infrastructure and synthetic data (14:45) Designing technical interviews (18:04) Rethinking agent-based approaches (19:34) Building out an AI team (23:35) Typescript as the language of AI (25:12) The shift away from using frameworks (26:02) Vendor consolidation among enterprises  (27:16) Coding as a CEO  (30:16) Collaborating with customers (33:00) Future of Braintrust and evals

Transcript
Discussion (0)
Starting point is 00:00:00 So today on No Priors, we have Ankur Goyal, the co-founder and CEO of Braintrust. Ankur was previously vice president of engineering at SingleStore and was the founder and CEO of Impira, an AI company acquired by Figma. Braintrust is an end-to-end enterprise platform for building AI applications. They help companies like Notion, Airtable, Instacart, Zapier, Vercel, and many more with evals, observability, and prompt development for their AI products. And Braintrust just raised $36 million from Andreessen Horowitz and others. Ankur, thank you so much for joining us today on No Priors. Very excited to be here.
Starting point is 00:00:38 Can you tell us a little bit more about Braintrust, what the product does? And, you know, we could talk a little bit about how you got started in this area and AI more generally. Yeah, for sure. So I have been working on AI since what one might now think of as ancient history. Back in 2017, when we started working on Impira, you know, the things we were talking about were totally different. But still, it was really hard to ship products that worked. And so we built tooling internally as we developed our AI products to help us evaluate things, collect real user data, use it to do better evals, and so on. Fast forward a few years, Figma acquired us,
Starting point is 00:01:19 and we actually ended up having exactly the same problems and building pretty much the same tooling. And I thought that was interesting for a few reasons, some of which you pointed out, by the way, when we were hanging out and chatting about stuff. But one, Impero was kind of pre-LLM. My time at Figma was post-LLM, but these problems were the same. And I think there's some longevity that's implied by that. You know, problems that existed pre-LLM
Starting point is 00:01:44 probably are going to exist in LLM land for a while. And the second thing is that, you know, having built the same tooling essentially twice, it was clear that there was a pretty consistent need. And so, you know, I have very fond memories of the two of us hanging out and talking to a bunch of folks like, you know, Brian and Mike at Zapier and Simon at Notion and, you know, many others. And, you know, I've been in a lot of user interviews over time.
Starting point is 00:02:10 I've never seen anything resonate like the early ideas around brain trust and really everyone's desire to have a good solution to the eval problem. So we got to work and built, honestly, a pretty crappy initial prototype. But people started using it. And, you know, brain trust just over a year later has now kind of iterated from people's feedback and, you know, complaints and ideas into something. I think that's really powerful. And yeah, that's how we kind of got started.
Starting point is 00:02:43 Yeah, I remember in the early conversations we had around the company or the idea, I should say, it was meant to even potentially be open source. And it was the first time that I was involved with some sort of customer call and people would say, we don't want you to open source it, which I found really surprisingly. People really pushed on, we want this to exist. for a long time. We want to be able to pay for it. And so there was that kind of really interesting market pull. Why do you think there was so much interest or need for this or demand for it? Or, you know, what does Brain Trust do? And how does that really impact your
Starting point is 00:03:11 customers? You know, many of our customers had actually built, early customers had built like internal versions of Brain Trust before we engaged with them. And there's a couple of things that sort of came out of that. One is it helped them gain an appreciation for how hard the problem is. Evils sound really easy. Oh, it's just a for loop, you know, and then I look at, I console.org the, you know, for loop as I go and I look at the results. But the reality is, like, you know, the faster you can eval, the faster you can look at eval results, which start to get really complicated as you start doing things with agents and so on, the faster you can actually iterate and build stuff. It is actually a pretty hard problem to do evils well. And many of our
Starting point is 00:03:51 early customers who were kind of like the pioneers in AI engineering had learned that the hard way. And I think the other problem is that, you know, folks, especially folks, you know, like Brian, for example, they saw that AI would be a pervasive technology throughout the whole org, not just a project that, you know, Brian might babysit and work on with one team. And having a really consistent and standardized, you know, way of doing things was really important. I remember early on, Brian pointed me to the Vercell docs, and he said, one of the things I love about this is that when new engineers are building UI now, they read these docs and they kind of learn the right way to build web applications. And you have that opportunity with AI. And I found that
Starting point is 00:04:37 actually really motivating and, you know, really influenced how we think about things. It makes a lot of sense. I guess like if you're swapping out, you know, GPT for, for Claude or you're making a change in model or you're changing a prompt and it just helps you really understand how that propagates and what sets of outcomes for users are better, what sets are worse and kind of troubleshoot them. And then it feels like you've built a whole other series of products around that that really helps support that. One of the biggest things when you're building AI products is this uncertainty about quality. So you might, for example, get really excited about a feature, build a prototype. It works on a few examples. You ship it to some
Starting point is 00:05:16 users and you realize it actually doesn't work very well. And it's just really hard to go from that prototype into something that systematically works in an excellent way. And I think what we have helped companies do is basically like demystify that process. So instead of having a bunch of anxiety about, hey, I ship something, I don't know if I'm ever going to get it to be able to work well. You can implement some e-vals and brain trust and then sort of turn the crank and get really, really good outputs. You know, you work with a lot of the companies that I feel are the earliest adopters of AI into their own products. In other words, they've actually shipped products with AI in them, and they're sort of that first wave. It's Notion, Airtable, you know,
Starting point is 00:05:56 Zapier, people like that for sell. What proportion of your customers do you think are adopting some of the things that people are talking about a lot? And so that would be things like fine-tuning or rag or building agents. Like, do you think that's a very common set of things that be? Or do you think that's just kind of hype? Because I think you have a very clear picture of at one segment of the enterprise market in terms of what people are actually doing. Unambiguously, people are doing RAG. So that one is, it's like simple and obvious. Probably around 50% of the use cases that we see in production involve RAG of some sort. Fine-tuning is interesting. I think, you know, a lot of people think of fine-tuning as an outcome,
Starting point is 00:06:36 but it's actually really a technique. And the outcome that people are looking for is automatic optimization. of their workloads. Fine-tuning is one way of doing that, and it is a very, very difficult way of automatically optimizing your use case. I think we, with our customers, have re-benchmarked fine-tuning on their workloads. I would say every two to three months,
Starting point is 00:07:04 and there was a period of time when GBT 3.5 fine-tuning came out before GBT4 was easy to, execute. Now it's extremely cheap, actually, to run GPT-40. But there's this kind of period where it's really hard to have GPT4 access. And GPT3.5 fine-tuning was a way of, it's like the only lever, you know, for some use cases to improve quality. But since then, you know, honestly, I think almost, if not all of our customers have moved off of fine-tune models onto instruction tune models and are seeing really good performance.
Starting point is 00:07:43 We even talked about that early on. I remember when we were thinking about brain trusts, we thought like, oh, boy, you know, everyone's going to need to use this to fine-tune models. And that was one of the first features we were thinking about building. And, you know, no one is, no one's really doing it. Could you explain just for that the listeners, like the difference between instruction tuning and fine-tuning?
Starting point is 00:08:04 Yeah, I mean, I think it's kind of like the difference between writing Python code and creating an FPGA or something. So with instruction tuning, all you do is modify the prompt to include examples of how it should behave. You know, in some ways, it's actually very similar to fine-tuning. You're collecting data that guides how the model should behave, and then you're feeding it into a process that kind of nudges the model towards behaving that way. Fine-tuning is a much lower-level thing where you're actually, like, modifying or supplementing
Starting point is 00:08:36 the weights in a model so that it, you know, learns from those examples. And because it's so much lower level, it tends to be a lot slower, more expensive. You know, there's a lot of ways you can injure the model while you're fine-tuning and actually make it worse on, you know, real-world use cases. And so it's just a lot tougher to get right. And then do you see a lot of open source adoption or mainly people using proprietary models and other other early technologies that you see people adopting right now? We are very close to a watershed moment for open source models. Like we saw at the watershed moment for anthropic when Claude 3 came out, especially Cloud 3-5 Sonnet has really taken off.
Starting point is 00:09:19 We are very close to that, I think, with Lama 3-1, but we're not there yet. So we see very limited practical adoption of open-source models, but I think more interest than ever. And I think a lot of what you're seeing is also just things that are in production, right? And so to some extent, there's a lot of discussion in the developer communities around what people are using and adopting and playing with. And then I think you're really focused on the market of enterprises that are shipping AI products. And, you know, obviously it can be used by hackers and developers as well, but a lot of your usage as well as people who have things in production.
Starting point is 00:09:52 And so it kind of reflects the state of the world for live systems at scale. I am a developer and I love open source software. And I have a very difficult time with the fact that every time I use an open AI model, I'm paying a fee per token. But then I actually look at the numbers. And, of course, I've looked at them with our customers, too. And, you know, in some cases, it's just negligibly cheap. And in the cases where it's pretty expensive, the ROI is actually really high. And so most of our customers are really, really focused on providing the best possible user experience for their customers
Starting point is 00:10:26 and the fastest iteration speed for their developers. And everything else is secondary. So I think until open source can really move the needle on one of those two axes, it's going to be tough for it to be adopted broadly. The other place you spend a lot of your career is on sort of databases and data infrastructure and things like that. So the BP engineering at single store, which I think was renowned for really having an exceptional database-centric team.
Starting point is 00:10:53 How do you think about the data infrastructure that exists for the AI world today? What's needed? What's lacking? What works well? What doesn't work? The shift is that people have hoarded lots and lots of semi-useful data in data warehouses. prior to LLMs, there was actually this whole industry around AI where companies like Data Robot, for example, would come in and help you train models based on these proprietary
Starting point is 00:11:19 structured data that you've collected in your super proprietary data warehouse. And I think the big insight or the crazy, you know, non-intuitive thing about LLMs is that something trained on the internet outperforms. what an enterprise can produce with their own data trained on data in a data warehouse. And I think not only is the nature of like the data processing problem different, but the value of data is actually, you know, and how we think about the value of data is very, very different. Like just hoarding data about your, you know, claims history or transaction history, it might not actually be that useful. The real question is like, how do you, you know,
Starting point is 00:12:05 construct a model, which is really good at reasoning about the problems that you're working on. And I think the way that enterprises will collect data and leverage it into these AI processes does not look like doing ETL on a data warehouse that's running in Amazon or something like that. I think it's going to totally change. And I've seen, you know, like a lot of the data that gets stored in brain trust through people's logs. actually never makes it to a data warehouse. And, you know, people, they just don't really care about that because, you know, if they put it in a data warehouse, what are they going to do with it? What do you think is missing from a data infrastructure perspective? So I think, you know,
Starting point is 00:12:46 to your point, there's a couple different steps. There's some sort of data cleaning step. There's some storage layer. There's, you know, there's different forms of labeling, et cetera. How do you think all these pieces kind of evolve over the next couple years? And then I guess related to that, the other topic people have been talking a lot about is synthetic data and how important that will be in the future. I'm sort of curious your views on these different areas. purely from a data standpoint, it's important to think about what you're going to do with the data and then how the infrastructure enables that. So, you know, data warehouse is really designed for ad hoc exploration on structured data, which is, it's just neither of those two things
Starting point is 00:13:23 is relevant in AI land. You're dealing with lots and lots of text, and you're not exploring it ad hoc using SQL queries. What we see actually as kind of what the most advanced companies are doing is actually using embeddings and models themselves to help them sift through tons and tons of data and find, for example, customer support tickets, which are not well represented in the data that they're using for their e-vals or not well represented in their fine-tuning datasets and trying to find those examples and use them. So I think the workload is going to shift. And I actually think, like, LLMs and, you know, specifically embeddings are going to be core to how people actually query data, not, you know, traditional algebraic relational indexes. That's going to be a huge shift.
Starting point is 00:14:13 And, you know, there's this huge debate about vector databases and will traditional databases do vector database things. I think that debate's kind of silly. I think, you know, relational databases are perfectly capable of adding HNSW indices to them. What will really be disrupted is the OLAP. workload. So relational, you can't just slap, you know, semantic search and stuff into the architecture of a traditional data warehouse. I think that is actually a much deeper set of things that will need to change than the OLTP workload. This is your, in some sense, third startup experience, right? You joined MemSQL slash single store quite early. You started in PIRO, which
Starting point is 00:14:51 Figma acquired. You're not doing brain trust. What are the common things that you've taken with you as you've done this new startup. What are the things that you've implemented early? What are the things that you've avoided? One of the things that I honestly took for granted at MemSQL, but we've kind of re-implemented at Brain Trust, is having a really hard technical interview. You know, MemSQL, maybe we pushed it a little bit too far, but it was really known for really strong technical excellence. And I think our interview reflected that. So that was actually one of the first things that we did. Manu and I spent probably like two or three days. working through a bunch of really, really hard interview questions. And I think it's just important
Starting point is 00:15:31 that you hold the technical bar really high and try to find people that are attracted to it. Actually, for example, if you do a front-end interview at BrainTrust, one of the questions involves writing some C+++, and we lose a lot of candidates because of that question. But it's a good signal that maybe Brain Trust isn't the right place for you to work because we do like to hire people who are willing to, you know, jump around in areas of the stack that they're unfamiliar with. So, you know, I think that's one of the, that's one of the biggest things that we've carried over. Another thing that I think we did really well at both Empira and MSQL is have an obsessive relationship with our customers and just really, really focus on making them
Starting point is 00:16:18 successful. It's sometimes really hard to prioritize customer feedback and think about, you know, 10 customers are asking for 10 different things. What do I do? So what we've done at Brain Trust is actually be very deliberate about which customers we prioritize, especially early on, and sort of hypothesized that the Zapiers and notions and so on of the world would have pretty similar use cases. And so if you focus on these kinds of customers, then when they ask for stuff, you can pretty readily assume that other similar customers are going to have the same problem. And that's allowed us to be very, very customer-centric while building a product that
Starting point is 00:16:56 repeats itself for more customers. And now what we're seeing is that, you know, the next wave of companies that are building with AI, both startups and more traditional enterprises, they actually want to be engineering things like the products that they admire, most of which use brain trust. And so a lot of those best practices are now built into the product and kind of the next batch of companies is able to consume them right out of the box. Yeah, it's kind of interesting. I feel like even early on as companies were first adopting LLMs for actual live products, they would all follow kind of the same startup journey, or I should say, technical journey, right? Initially, they'd look into, at least back then, they'd look into fine-tuning
Starting point is 00:17:35 or some open-sverse model or something else. They'd eventually realize they should just be using GPT4, which was the primary model at the time. And then they'd go through this big loop of starting to build internal tools and then realize that really their focus should be on product. And, you know, it was the exact same journey. And I remember in their early brain trust customer conversations, you talk to them and they'd say, oh, we don't need this. And then three months later, they'd call and say, okay, we really need this. And it was always roughly the same time frame. Are you seeing any common patterns today in terms of, okay, companies that are now a year or 18 months into their journey using LLMs, like they always have the same thing come up?
Starting point is 00:18:10 There's a couple things. So one is companies that are fairly deep into their journey, they have like one or two North Star products that are pretty mature and they're trying to figure out how to get those products to the next stage. Probably the most consistent thing I've seen is companies kind of walking back from the illusion that totally free-form agents will solve all of their problems. So I think maybe like two or three months ago, many of the pioneering companies went way down the agent rabbit hole. And they kind of realized like, wow, this is actually not, this is not the right approach. It's so hard to control performance. The error rate are really high and they compound really quickly.
Starting point is 00:18:54 And so, you know, most of those companies have kind of walked back and tried to build a different architecture where the control flow is actually managed deterministically by their code, but they make LLM calls kind of like throughout the entire architecture of the product. And so that's probably the biggest thing that we're seeing now is I don't know if there's a good term for it yet, but maybe this kind of pervasive. AI engineering throughout a product rather than trying to shove everything into the, you know, wild loop of an agent. Yeah, the other thing that I've heard you talk about in the past is the evolving role of what an AI team does at a company. And so I think if you go back a couple
Starting point is 00:19:37 years, people were doing machine learning and they'd hire a big ML ops team. And then the types of things that they'd be doing day to day were very different from what they do today in the context of adopting AI and even how you think about the role and who to hire maybe has shifted a bit. Could you talk a little bit about what you've used the evolution of the role of the data science team, the data team, the ML or EI team, etc? Yeah, I think what's really interesting is many of the early adopters of LLMs didn't have any ML staff when ChatGPT came out, you know, what is it now, almost two years ago. And those companies were able to move really quickly because they kind of started with a fresh slate. Many of the smart folks that I know that are classical machine learning people or data scientists have now come around. But actually there was this big sort of resistance among them early on that LLMs are, they're not good at the things that we're trying to solve or maybe it's a scam or something like that.
Starting point is 00:20:33 Do you think that was just like a different problem set in terms of traditional ML and the applications of it are different from what Jenny I can do? Or do you think it was something else? Well, I went through this myself watching the technology that we built to do document extraction. at Empira become, you know, totally irrelevant. And I personally, I think it's an emotional thing. Like you try GBT3 for the first time. And first of all, you know, back then at least, it was kind of snarky. And so that was a little bit irritating. And it was also just way better at everything than anything you could possibly train. And I think that is so fundamentally disruptive to, you know, a lot of companies, a lot of people's individual identity, it just is not easy
Starting point is 00:21:20 to wrap your head around if you've been doing AI and ML for a while. So I think it was largely an emotional thing. You could argue that there's a cost, security, privacy, whatever element of it, but the companies that were sort of on the leading edge, they're able to figure that out pretty quickly. You know, now I think more companies have come along the journey, and I've seen a lot of really smart ML and data science people embrace LLMs and bring a lot of the sort of rigor that is still relevant around evils and measurement and, you know, prototyping and so on and become these like AI platform teams. Usually it's a combination of people with product engineering backgrounds and, you know, a few folks with statistics or data science backgrounds.
Starting point is 00:22:04 And they start by building kind of like a marquee product for the company and then they evolve into a platform team that enables, like, the N-plus-first project to be really successful. We see a lot of these teams forming, you know, as AI becomes more pervasive. So if you were to enterprise company right now and you were to try and adopt AI or LLMs, like, who would you have to hire or what sort of capabilities would you move over into sort of this platform team? I would start with a group of really smart product engineers because the first thing you need, to ask yourself is what parts of my product or whatever I'm offering can be cannibalized
Starting point is 00:22:48 or completely changed by modern AI. Product engineers are generally the best people to think about that. You can get really far with a really good UI and very basic AI engineering that sort of proves out a concept. I think we've seen a number of good examples of that. I know, for example, V0 is a truly incredible piece of engineering at this point. Both, from an AI standpoint and also from a UI standpoint. But early on, you know, it was pretty simple and that's the right way to start. And then I think as you find product market fit, it's sort of the right time to think about, you know, more rigor, think about fine tuning. Maybe you should use open source models for cost or whatever, although I think not many people are far along that
Starting point is 00:23:34 journey. I think you said something like TypeScript is a language of AI and Python is a language of machine learning. Yeah. Could you extrapolate more on that? First of all, a vast majority of our customers use TypeScript. And, you know, early on, some of our customers were dealing with, like, should we use TypeScript or Python? And some teams are using TypeScript.
Starting point is 00:23:53 Some teams are using Python. Now, almost everyone, including people that used to write Python primarily is using TypeScript. And I think that's going to continue forward. There's a few reasons for that. One is TypeScript is the language of product engineering. and product engineers are the ones who are driving most of the AI innovation, at least in the world that we participate in. And so they're just literally pulling the AI ecosystem into their world,
Starting point is 00:24:20 and that is driving a lot of TypeScript stuff. Another thing is that TypeScript as a language is inherently better suited for AI workloads because of the type system. So the type system basically allows you to launder the crazy stuff. you know, the crazy stuff that comes out of an AI model into a well-defined structure that the rest of your software system can use. Python has a pretty immature type system. You know, they're improving, and I always get trolled on Twitter when I post about this by people who make, you know, somewhat valid arguments. But TypeScript is just a much, much better language for writing software that
Starting point is 00:25:01 deals with uncertain shapes of data. I think that's actually kind of its whole point. So I think it is actually literally a better suited language for working with AI. Have you seen any other shifts in terms of usage of specific languages or tooling or other things that's happened with this wave of AI? Yeah, I think the biggest thing I've seen over the past six months is people dropping the use of frameworks. So early on, I think people thought that AI is this really unique thing. And just like, you know, Ruby on Rails or whatever, we're going to, you know, need to build new kinds of applications with new kinds of frameworks to be able to build AI software. And really, I think people have walked back from that and they now think of AI as
Starting point is 00:25:47 kind of like a core part of their software engineering as a whole. And so AI is now kind of like pervasively spreading throughout people's code base. And it's not constrained to what you can create with, you know, a single framework. Outside of the areas that brain trust touches from a tooling perspective. What do you think are other interesting emerging either platforms or approaches or products or infrastructure that people are starting to use? I think what we've seen from a lot of our customers is a consolidation of vendors. And this is very, very, very much driven by AWS. So AWS has its mojo back now that they have Anthropic on bedrock. And Anthropic is, you know, especially Cloud 3 and 35, are really, really good.
Starting point is 00:26:34 And so because, you know, many companies were consolidating their vendors prior to AI. AWS is so dominant. And now you can actually consolidate a lot of your AI stuff on AWS as well. We're seeing pretty dramatic vendor consolidation. There's some companies that we talk to and their AI vendors are, it's literally open AI, AWS, and brain trust. And pretty much everything else has consolidated away. So, you know, it'll be interesting to.
Starting point is 00:27:04 see what happens. I certainly wouldn't underestimate, you know, AWS and the hyperscalers, especially on the infrastructure side. One of the things that I think is striking is how much time you still spend coding a CEO. And there's a number of CEOs of different companies who continue to write code over the course of their careers of varying degrees, you know, like Tobias Shopify would be an interesting example of that. How do you think about time spent coding versus marketing versus doing other things for the company and why focus there? My perspective on this has changed a lot over time. When I was much younger, I started leading the engineering team at single store and then became a CEO. And people give you the conventional advice about what you should do with your
Starting point is 00:27:47 time and who you should hire and stuff like that. And first, I think, you know, the profile of CEOs is changing. And second, I think the market is changing. So in the world that we are in, which is enterprise software, people really, really care about the polish of the UI that they're using. I think companies like Notion, for example, have really driven people's taste on those products. But when many VCs were having their formative experiences and observing the patterns that they would eventually mandate among their portfolio companies, things were very different. You know, IT bought enterprise software and they bought it based on you know checklist that product managers came up with um so i think a lot of this has changed and for me it just feels very natural to um you know participate in
Starting point is 00:28:39 that change by being very, very deep in the product. And as hard as I've tried over the past, you know, decade plus, I just can't. I think I'm just literally addicted to writing code. It is the fastest, most efficient, and most pleasurable way for me to participate in what we're doing as a company. And so instead of trying to change that, which I've done, at Braintrust we've kind of engineered the company to support me spending a lot of time writing code. For example, one of the first people we hired was Albert, who was formerly an investor and investment banker. He's incredibly good at everything from, you know, selling, marketing, dealing with ops, and helping with recruiting. And, you know, working
Starting point is 00:29:27 with him has kind of freed me up to spend a lot more time doing that kind of thing. Whereas at Impira, I spent probably like half or more of my day doing those things. Yeah, we had Jensen Huang from Nvidia on No Priors previously. And I thought one perspective that he shared that you don't hear very much, which you're now echoing, is you should really architect the company around the CEO versus just following the same pattern every time of what the right thing for the company is. And obviously there are certain things where you just have to do the same thing every time, like, you know, sales comp. It really doesn't make sense to try and reinvent that. And everybody always tries for their first startup. And by the second startup, they're like, why did I even try that?
Starting point is 00:30:03 You know, it just kind of works. But the flip side of it is there are certain things to delegate or not, certain things to micromanage versus not. And it really varies by the person and what they love doing and, you know, all the rest of it. Are there other big differences between how you've approached Braintrust and Impira, for example, your prior startup? Another thing that we're really bullish on at Braintrust is people being in the office and being really comfortable being interrupt-driven. These are two battles that were very difficult for us at Impira, because we weren't very firm about it. I think the second one is actually a little bit more interesting. At Braintrust, if a customer complains
Starting point is 00:30:44 about something or they find something about our UI annoying or they have an idea, we almost always fix it immediately. And that is something that for a lot of engineers is very uncomfortable. But for the right engineers, they've been craving that experience, you know, their entire career. And so we handpick those people that want to be in that environment. And then again, we engineer our roadmap and think about how we allocate our time and so on to actually be able to support that. And I think it's one of the key sort of things that has made the product really good and also creates a lot of love with our customers. Not everyone has to have the same edge, but I think you have to have some edge. And so we identified that as something we really
Starting point is 00:31:26 cared about early on. And again, you know, kind of like recruited a team of people who really want to do that. Yeah. And I guess that's translated into sort of customer adoption and some of the logos you've landed. Are there other things that have helped drive customer acquisition? And, you know, have there been unique ways that you've approached go-to-market? Yeah. I mean, I think I went to the, you know, Elad School of Hard Knocks and learned a bunch of stuff early on from you. But, you know, really the thing that we did was we made that list of like 50 people who we thought were leading the way in AI and said, you know, let's try to figure out a way to get to these people and either recruit them as investors or as customers. And I think that was
Starting point is 00:32:08 probably one of the most important, if not the most important, things that we did. Some people, for example, were excited about Braintrust. We had known them for a while. They invested and they said, you know what, we've already built our own version of this internally, or we don't care about this, but we think other people will need it. So we'd love to invest. And actually, many of those people have now come around and started using Braintrust too. So just being very deliberate about who our target market was. I mean, 50 companies is not a huge TAM in some ways, but those companies are very influential and they've led to many more customers now. So I think that was the most important thing. Yeah, it feels like people really misdefine their initial customer envelope or the people
Starting point is 00:32:50 that they want to target. And so they either go too broad, you know, and do everything from Fortune 500 to, you know, small startups, and then they're not really building for any specific user, or they go way too specific, maybe even in a segment that just isn't worth pursuing. And so it's really interesting to see how people think about that. Yeah. Could you tell me a little bit more about what you view as the future of Braintrust? How does it evolve as a product and platform? And then how does it change as AI changes? Is all eval eventually done by machines? Or, you know, what does the future hold for us? Yeah, I ask myself that question, you know, every month or so, and surprisingly little changes. But, you know, at Braintrust, we started out by solving the evals problem, and I think we did that
Starting point is 00:33:31 really well. And what we realized is that there's actually this whole platform that people want. One of our customers early on, actually Airtable, used our evals product to do observability. So they literally would create experiments every day as if they were evals and just dump their logs into those experiments. It's pretty obvious when someone starts doing that that they're trying to do observability in your product. And we dug into why. And it turns out that in AI, the whole point of observability is to collect data into datasets that you can use to do evals. And then again, eventually fine-tune models or do more advanced things. But still, evals is the most important element.
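The loop Ankur describes — production logs flow into datasets, and datasets feed evals — can be sketched in a few lines. This is a minimal, hypothetical illustration of the pattern, not Braintrust's actual API; every name here (`Dataset`, `run_eval`, `exact_match`) is invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """An eval dataset accumulated from production logs."""
    examples: list = field(default_factory=list)

    def add_from_log(self, log_entry: dict) -> None:
        # Keep only what an eval needs: the input, plus the observed output,
        # which a reviewer can later promote to a vetted expected output.
        self.examples.append({
            "input": log_entry["input"],
            "expected": log_entry.get("output"),
        })

def exact_match(output, expected) -> float:
    """Simplest possible scorer: 1.0 on an exact match, else 0.0."""
    return 1.0 if output == expected else 0.0

def run_eval(dataset: Dataset, task) -> float:
    """Run `task` over every example and average the scores."""
    scores = [exact_match(task(ex["input"]), ex["expected"])
              for ex in dataset.examples]
    return sum(scores) / len(scores) if scores else 0.0

# Usage: logs captured by observability become the eval dataset.
logs = [{"input": "2+2", "output": "4"}, {"input": "3+3", "output": "6"}]
ds = Dataset()
for entry in logs:
    ds.add_from_log(entry)

# A stand-in "model" for the sketch: evaluate the arithmetic string.
score = run_eval(ds, task=lambda q: str(eval(q)))
print(score)
```

The point of the sketch is the direction of data flow: observability is not an end in itself, it is the intake for the datasets that evals (and eventually fine-tuning) run against.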
Starting point is 00:34:11 And the next thing that happened is that, you know, some of our customers said, hey, actually, I'm already doing, you know, observability and evals and stuff in Braintrust. I'm spending so much time in this product. Why do I have to go back to my IDE, which, by the way, knows nothing about my evals, knows nothing about my logs? Can I work on prompts in Braintrust? Can I repro what I'm seeing live? Can I save the prompts and then auto-deploy them to my, you know, production environment? That actually scared the crap out of me, you know, just from my, you know, traditional, now old-school engineering perspective. But it's what people wanted. And, you know, I was talking to Martin, who just became
Starting point is 00:34:50 a Braintrust daily active user quite recently. And, you know, he spends like half his day now tinkering with prompts in AI Town in Braintrust. And so even for, like, old-school engineers, you know, like us, it's definitely the right way to do things. And I sort of see Braintrust just evolving into this kind of hybrid where, you know, in some ways it's kind of like GitHub: you create prompts, and now you can create more advanced functionality with Python code and TypeScript code and stitch it together with your prompts in the product, all the way through to, you know, evals and observability. And I think we're really excited about building a universal developer platform for AI. In terms of quality, having lived through the
Starting point is 00:35:39 pre-LLM era, I actually think a lot of the anxieties and predictions about quality are exactly the same as they were pre-LLM. Even, you know, when we were doing document processing stuff at Impira, people were like, oh, hey, all documents will be perfectly extracted within six months from now. And LLMs, by the way, are amazing, but document processing is still not a totally solved problem. And I think it's because people will take whatever technology they have and push it to its extreme. There are things that people are trying to do today that are past the extreme. Like AutoGPT is a great example of something that is, I think, a really productive experiment in pushing AI past what it can reasonably do. But, you know, people are always going to push things to their extreme.
Starting point is 00:36:23 AI is an inherently non-deterministic thing. And so I think evals are still going to be there. We might just be evaluating, you know, more and more complex and interesting problems. And then in what world do you think AI will play a role in evaling itself? I mean, AI already evals itself. So very similar to traditional math, I think, you know, if you're doing like a math homework assignment, it's way easier, if someone gives you a proof, to validate the proof than it is to actually generate a proof in the first place. And sort of the same principle works for LLMs. It's way easier for an LLM, especially a
Starting point is 00:37:00 frontier model, to look at the work of, you know, itself or another LLM and accurately assess it. And so that's already the case. I think probably more than half of the evals that people do in Braintrust are LLM-based. I think some of the interesting things that are happening as LLMs are getting better and as GPT-4-quality inference is getting cheaper
Starting point is 00:37:22 is that people are actually starting to do LLM-based evals on their logs. So one of the really cool things that you can now do in Braintrust is you can write LLM- and code-based evaluators and then run them automatically on some fraction of your logs. Sometimes that actually even allows you to evaluate things that you're not allowed to look at.
Starting point is 00:37:44 And so, you know, the LLM is allowed to read PII and, you know, crunch through something and tell you whether, you know, your use case is working or not, but maybe no developer or person at the company is. And so I think that is a really interesting unlock and probably represents what people will be doing over at least the next year. Super interesting.
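The idea of running evaluators automatically on "some fraction of your logs" can be sketched as deterministic sampling plus a scoring function. This is a hypothetical illustration of the technique, not Braintrust's implementation; the hash-bucket sampling and the `llm_judge` placeholder (which in practice would be an LLM call that can see data humans are not allowed to) are both assumptions made for the example.

```python
import hashlib

def should_sample(log_id: str, fraction: float) -> bool:
    """Deterministically select a stable `fraction` of logs by hashing the id,
    so the same log is always in or out no matter when the scorer runs."""
    digest = hashlib.sha256(log_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < fraction

def llm_judge(entry: dict) -> float:
    """Placeholder for an LLM-as-judge call. A real scorer would send the
    input/output pair to a frontier model and parse out a score; here we
    just flag empty outputs so the sketch runs offline."""
    return 1.0 if entry.get("output") else 0.0

def score_logs(logs: list, fraction: float = 0.1) -> list:
    """Score only the sampled fraction of production logs."""
    return [(e["id"], llm_judge(e))
            for e in logs if should_sample(e["id"], fraction)]

# Usage: roughly 10% of 1,000 logged requests get judged.
logs = [{"id": f"req-{i}", "output": "ok"} for i in range(1000)]
scored = score_logs(logs, fraction=0.1)
print(len(scored))
```

Hashing the id rather than calling a random number generator is what makes the sample auditable: rerunning the scorer scores exactly the same subset, and raising the fraction later only adds logs, never reshuffles them.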
Starting point is 00:38:04 Hey, Ankur, thank you so much for joining us today. Thanks for having me. Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
