No Priors: Artificial Intelligence | Technology | Startups - Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud

Episode Date: May 1, 2026

Baseten CEO and co-founder Tuhin Srivastava sits down with Sarah Guo and Elad Gil to discuss the rapid growth of AI inference demand, Baseten’s 30x growth, and why inference is becoming the strategic “last market.” Tuhin Srivastava argues the application layer will persist because companies with unique user signals can encode value into workflows and post-train specialized models, citing examples like Abridge and support workflows. The conversation covers GPU capacity constraints, Baseten’s multi-cloud fabric across 18 clouds and 90 clusters, long-term contracting dynamics, the importance of the software layer for stickiness, evolving workloads, multichip possibilities, and operational lessons at scale. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Tuhinone   Chapters: 00:31 Baseten Growth 01:55 Why the App Layer Wins 05:57 Serving Frontier Customers 07:55 Open Source Model Mix 09:21 Chinese Models and Geopolitics 13:07 Custom Inference Dominates 14:22 Post-Training Acquisition 17:10 When to Invest in Custom Models 18:35 Supply Crunch and Data Centers 22:25 Longer GPU Contracts 24:09 What Makes a Winner 26:07 Multi-Chip Future 28:19 Runtime Roadmap 31:08 Scaling Edge Cases 33:48 Hiring and Leadership 36:44 Operations Pager Culture 38:19 Efficiency Drives Demand 40:41 Concierge Everything Future 42:34 Conclusion

Transcript
Starting point is 00:00:00 Hi, listeners. Today, Elad and I are here with Tuhin Srivastava, the founder and CEO of Baseten, the AI inference cloud. We're here to talk about capacity constraints for AI compute, why inference is the last market, how the workload is changing, the open-source and perhaps multi-chip future, and what 30x scale in a year looks like. Tuhin, welcome back. Hi. Good to see you. Thanks for having me. All right. You are in one of the craziest markets, AI inference. It's very important. There's a lot going on. You guys have grown 30x over the last year. And I think I can say you're expecting to do more than a billion
Starting point is 00:00:44 dollars in revenue this year. What's going on? Tell us about scale. Yeah. No, it's been nuts. I think what's happened over the last, honestly, 24 months, but it just kind of keeps getting bigger and bigger, is that everyone is realizing that you can put AI everywhere. You have all these great options available from closed-source and open-source models. The open-source models have crossed some sort of chasm in terms of their baseline capability. And then I think RL techniques and post-training for specialized models have become mainstream.
Starting point is 00:01:25 And there's enough examples of it working. The customers are realizing they can kind of own their inference more and more. And what that's meant for us is more of the long-tail models coming through, customers in-housing a lot of that intelligence themselves. And as the application layer just gets bigger and bigger and bigger, and that's where we are, we're just an index on that, and we've been around to be able to collect the demand. There's an existential question in here that I think everybody is continually asking, which is: does the independent application layer get to exist at all versus the labs?
Starting point is 00:02:04 Like, what do you have to believe? Why do you believe it? Yeah, look, I think it'd be a sad thing if it didn't exist in general. And I think that's like my... but, you know, sadness is fine. Sadness is fine. But that's not the reason why I think the application layer will exist. I think the application layer will exist for a number of reasons. One is because of this idea
Starting point is 00:02:30 that what is valuable to a company is, you know, the user signal that they can gather, that only they can gather. And to the extent that that is encoded in a model, I think a lot of their business will be at risk, but to the extent that it is encoded in workflows, that is where they will be able to develop moats. So a good example of that is a company like Abridge, where the clinicians' edits of the notes, and what they do with those notes after the fact, and the thing that happens inside the EMR three steps down, that becomes a workflow that only...
Starting point is 00:03:11 Can you explain what Abridge does? Sorry, Abridge is an ambient scribe that is used by physicians in almost all hospitals in the U.S. Elad's an investor, I think. Great, Shiv's amazing, great company, great team, great product. And they've basically got this very, very deep integration into hospitals and into clinician workflows. And my argument here would be that it's actually very, very hard for a frontier model company to go eat away at that, because they just don't have access to that user
Starting point is 00:03:50 signal. And what will happen over time is the folks who have access to that user signal can start to post-train models on that reward signal and start to get long-horizon agentic models running on that. And I think to the extent that that is possible, and that signal is differentiated and unique and is somewhat rare to get access to, there will be an application layer. And I think, you know, a support company is another example of that, where a support task isn't one-shotted. Usually at a company like Baseten, when a ticket comes in, there's like, what, one, two, 10, 20 actions that get taken,
Starting point is 00:04:23 and that is where, you know, someone can develop a specialized model. So there's almost two versions of this, then. There's new companies like Abridge or Decagon or some of these other things that you mentioned that are doing these new types of applications
Starting point is 00:04:42 that are using AI and they sell it to customers. The other is enterprises building things in-house or building their own models. What proportion of the market today do you think is these new application companies versus enterprises just adopting AI? Yeah. And how do you think that looks in a couple years?
Starting point is 00:04:57 Yeah, I think you asked me the same question two years ago. I hate to be repetitive, but it's crazy. The answer is just, it's crazy that the answer is still the same, I think. I think if you look by inference count, it'd be 99% the former. Yeah.
Starting point is 00:05:16 And that kind of represents the scope of the opportunity here, is that the majority of the market hasn't come online and added AI into the world. Yeah. Most of enterprise adoption is still ahead of us. And I think that's one of the very exciting things about AI. Yeah. Because there's just so much still to come, and people are underestimating that, I think. 100%. But what's cool is that we're seeing the transition happen, right?
Starting point is 00:05:37 Before it was like, hey, are they using AI tools? I don't think that was immediately obvious two years ago. I think it's obvious now that yes, they are. Are they using closed-source model APIs? I think they're starting to get there. And then once you do that and then you kind of see what is possible, then comes the whole custom model adoption. I think that is all that is ahead of us today.
Starting point is 00:05:57 So if the majority of your customer base today is, as you described, the former, like, application companies, AI natives, the fast-growing, I mean, some of them are at considerable scale now, like the Abridges, the Cursors. Yeah. Open Evidence. The Open Evidences of the world. What do they teach you? What does that push the company to do? How do you think about serving them versus evolving for the enterprise? Yeah. I think, firstly, you just learn a lot by building for the companies at the greatest scale, doing the most interesting things. We think of it two ways. I think there's the most obvious way, which is just build for the highest scale, you know, the customers that will push you the
Starting point is 00:06:43 most technologically, and everything kind of falls into place. I think the Stripe evolution as a company showed that, which is, like, Stripe now serves so many enterprises, but 12 years ago, that wasn't the case. But they just built for the frontier and kind of went with them. And the second way we think about this is to just think about building for companies that are serving enterprises. So yes, we don't serve the enterprise, but our customers serve enterprises. Abridge, Open Evidence, Decagon, Writer, Gamma, all these companies serve enterprises en masse. And what we actually get is like a translation of the requirements from them, which is like, you know, they're like, hey, we need this sort of data retention.
Starting point is 00:07:25 We need this for where models need to be deployed. These are the types of GPUs or the latencies they're okay with. There are the model requirements from, like, a transparency perspective that they care about. And so I think that is actually the more nuanced answer. It's that if you listen to what their needs are, we actually get a full translation of what the enterprise requires. I would say that by serving companies like Abridge and Open Evidence, we're probably pretty well suited to go serve the healthcare system
Starting point is 00:07:52 given that they, and Latent Health, are selling to them. How much of a shift are you seeing in terms of the types of open source models that are being used? I think we've seen an evolution where two, three years ago, the main thing was kind of Mistral and then a few other things. And then Meta kind of came along with Llama. And then it kind of really shifted, in terms of the best-performing models now being of Chinese origin in different ways. Do you see that sort of mix reflected in terms of what's being used by your customers?
Starting point is 00:08:16 Yeah, I think customers, at least the customers we are serving, and these are like the fastest-growing AI companies in the world that are very forward-thinking, they want to use the best models. And they are optimizing. I think there's a subset of tasks, which I think is small today, where people really start with cost. But everyone comes from capability first, because that's really where the economic growth is being unlocked, where the value is being delivered, and then they optimize. And so with that in mind, you can name everything from GPT-OSS all the way to Moonshot's models to DeepSeek to Canopy or Orpheus, which are like
Starting point is 00:09:03 really good text-to-speech models. Customers generally want to use whatever's at the frontier. And I think the difference is just, one, that we have a lot more visibility into how to run these and how to run these really well, and secondly, that they can now. There have been a number of different concerns raised about the use of Chinese models, in particular security: is there something embedded in the models, you know, Trojan horses or other things? A, do you think there's any real concern there?
Starting point is 00:09:34 And, B, you know, people often talk about how there should be, like, U.S. counterweights to this. From a geopolitical perspective, do you think that's something that's legitimate or something we should be worried about, or how do you think about the sort of origins of these models versus their uses? Yeah, look, I think these models, firstly, are fantastic. They're amazing. We work with these teams.
Starting point is 00:09:53 They're truly awesome. I'd say, look, it is hard for me to see, and I could be wrong, but, like, you know, if I network-bound these models, they're not magically going to be able to cross those network boundaries. And so on the data side, you know, I've never seen any real evidence, except from some very early models that I think people picked up on very quickly,
Starting point is 00:10:20 that there is some agenda or bias built into these. I do think, to some extent, there is importance to the US that we develop our own models. I think it would be a massive loss if there are five different labs in China that are creating open source models
Starting point is 00:11:04 and we're struggling to get one set up. So it's necessary. I also think it's inevitable. And, you know, around the DeepSeek moment a year ago, I remember someone saying something to me that I thought was very well said, and the world's changed a lot, but they said, hey, you know, we should kind of just forget
Starting point is 00:11:20 that this is a Chinese model. We should just act like this came from Meta and build with that in mind. Otherwise, you know, I think you kind of miss the forest for the trees. There's two scenarios, right? Either America does not ever come up
Starting point is 00:11:40 with good open source models, in which case I think there's probably a fundamental problem there. Or we will get there, and we need to be ready for that world. Yeah, that makes sense. It's interesting because, you know, like you, I think it's very important for the U.S. to have a strong open source footprint here. At least for now, it looks like effectively the Chinese government is subsidizing
Starting point is 00:11:50 at least a large subset of these models, and that subsidy or surplus is effectively just being passed on to the US enterprises who are adopting these models. In other words, it's a way for the Chinese government to effectively subsidize US enterprise
Starting point is 00:11:59 in an indirect manner. And I think that's a little bit lost right now. But, you know, it's always interesting to weigh that against some of the other concerns you raised.
Starting point is 00:12:12 I appreciate your comments on this. And I think the concern also just becomes, what happens if we aren't able to? Like, I think if you think of the economics here:
Starting point is 00:12:33 DeepSeek is, by most accounts, a very good model. You know, you can argue whether it's at the absolute frontier or not, but let's go back three months and it's there. And so think about everything we were doing three months ago. Yeah. And so let's just think about that. Well, you know, you could run DeepSeek at probably 20% of the cost of running
Starting point is 00:13:04 OpenAI and Anthropic models in production, with comparable or better latency, probably better reliability. If we don't have access to that intelligence in that form, I think that is just a massive loss. And as a country, we won't be able to innovate as fast, because the cost of intelligence going down, and control of intelligence, what we have seen, just means more intelligence. Intelligence being embedded in more places. Yeah, an important note here that we didn't mention explicitly is that the state-of-the-art models, the ones that are most far ahead on the frontier, are actually still the closed-source
Starting point is 00:13:24 Anthropic, OpenAI, Google, etc. Yeah. What has been, actually, maybe you can just characterize the workload a little bit. Of the tokens being served on Baseten, how many of them are from custom models of some kind versus, like, vanilla open source today? It is all custom. It's basically, okay.
Starting point is 00:13:24 So like 95% plus. 95% and I think that's really cool, to be honest. Look, we have two businesses. We have three businesses. We have three businesses right now. Should we help you count? No, no. So we have like dedicated, dedicated inference,
Starting point is 00:13:54 which is basically custom model inference, where your SLA is your SLA. Then we have shared inference, which is shared inference endpoints, shared SLAs. And then we have a training business. I'd say 95% of the tokens today
Starting point is 00:14:09 are on the first business, and for almost all of them, the customer is making some modifications to the model with their own data, specialized for the use case,
Starting point is 00:14:32 and I think what's even more important is they might be compiling it in different ways. No one is just running the vanilla open source weights. You might be customizing it for quality, but you also might be customizing it for performance. You made an acquisition of a research team a few months ago. You mentioned post-training customization. What was the rationale behind the acquisition? What is that team doing today? Yeah. So the rationale around the acquisition was, you know, we are infrastructure and product people: we have product people and really good infrastructure people, and we didn't have much of a research capability ourselves. And what we saw was the market moving heavily, and that we could accelerate the market itself with post-training resources, whether product-tied or even just as resources for that market. So Parsed
Starting point is 00:15:10 was a company that was a Baseten customer. They were post-training models and running them on Baseten. And I think what they realized was that they would eventually need to become an inference company. And what we realized was, hey, we really needed that expertise, because it represents a way for us to get closer to the customer earlier and be able to support them more,
Starting point is 00:15:39 and it just made sense as a pairing. And as I said in the opening here, as more and more post-trained models have come up, we've realized that the demand for people, for software loops to do post-training, or for post-training expertise, is very high, and we're really, really investing in that. They're also a bunch of Australians, you know; I like to think that we had a bit of alpha there. But, yeah, that's been fantastic. They're working with all sorts of customers. And it's also very interesting: we were doing a lot of research
Starting point is 00:16:24 on the performance side and less so on the post-training side. It's interesting, as we've started to do a lot more research on the post-training side, you start to see how linked inference and post-training are. Like, you know, even when you think about stuff like quantization and when you should do that, how you train the model affects how you need to quantize for inference, and how paired these problems are has become very apparent. And more and more, we think post-training and inference are kind of both sides of the same problem, because inference ideally will beget more post-training: inference creates data, you do evals, and you can now post-train on the reward function that you found with those evals, and hopefully just set up the entire loop.
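That loop is worth making concrete. Below is a toy, self-contained sketch of the cycle described here: production inference generates traces, an eval turns them into a reward signal, and a post-training step folds the best traces back into the model. Every function and number is a hypothetical stand-in for illustration, not Baseten's actual API.

```python
import random

def run_inference(skill: float, prompt: str) -> dict:
    """Stand-in for serving a request; output quality loosely tracks model skill."""
    return {"prompt": prompt, "quality": min(1.0, max(0.0, random.gauss(skill, 0.15)))}

def run_evals(trace: dict) -> float:
    """Stand-in eval suite: score a trace, producing the reward signal."""
    return trace["quality"]

def post_train(skill: float, good_traces: list) -> float:
    """Stand-in post-training step: nudge the model toward what scored well."""
    return min(1.0, skill + 0.01 * len(good_traces))

skill = 0.5
for step in range(5):
    traces = [run_inference(skill, f"ticket-{i}") for i in range(200)]  # 1. inference creates data
    rewards = [(t, run_evals(t)) for t in traces]                       # 2. evals score it
    good = [t for t, r in rewards if r > 0.7]                           # 3. filter by reward
    skill = post_train(skill, good)                                     # 4. post-train, and loop
    print(f"step {step}: {len(good)} high-reward traces, skill={skill:.2f}")
```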
Starting point is 00:17:05 Plenty of folks from Anthropic and OpenAI, Sam, Greg, et cetera, have said in recent months that inference is super strategic, inference talent is strategic, capacity is strategic. So between that and post-training, these are very difficult capabilities to gather. I imagine that lots of your customers come to you guys for advice on how to do this progression of moving to custom models. What do you tell people about the lifecycle and when they should invest in that? Yeah, I think it's, hey, go prove to yourself with the best-in-class model that you have something worth optimizing.
Starting point is 00:17:50 And, you know, if a customer comes to us... there was that meme from, like, two years ago: no GPUs pre-product-market fit. It's like, no post-training pre-product-market fit, is what I'd say. Yeah, yeah, yeah. So the people that you're working with here are very much at scale first. Yeah, they have a user signal that they know how to optimize. And they've shown that they can serve customer value, and that they have something special around that value.
Starting point is 00:18:20 And once you have that value, it's like, okay, now how can I do that better, faster, and cheaper? With the idea being that, hey, if you need to be very good at customer support, you maybe don't need to be that good at coding, and a specialized model might be a better fit for that problem, and you can do it better, faster, cheaper. What about the capacity side? You started with unifying capacity across all the clouds and new clouds. How do you think about this when everybody keeps talking about a supply crunch and a multi-year supply crunch? I think there's so much narrative around the supply crunch.
Starting point is 00:18:54 And no matter, like, as much as we hear about it, I don't think people realize how bad it really is. There is very, very little slack compute available. Like, you know, we run pretty large clusters ourselves, and we run them at, like, uncomfortably high utilization. I mean, we're at mid-90s utilization most of the time. We sit in 18 different clouds now. We have 90 clusters around the world across 18 different clouds. And, you know, initially we built this technology to be able to create one runtime fabric that spans all these different clouds, and to try to abstract that away from our customers, as a way to think about reliability, latency, failover, all these things that we think would be very important for very mission-critical use cases. That same technology, just our ability to get compute wherever humanly possible, has been really, really helpful in our ability to get supply.
Starting point is 00:19:58 And what I mean by that is we can be introduced to a new provider in a different country and have it up and running with the whole Baseten inference stack. As part of the fabric. Part of the fabric in half a day, maybe less. And that gives us enormous flexibility. Even for us, it is hard for us to grow. We have a... I think that's, yeah, I'll say it. We have a 4 PM standing meeting for the company where we basically ask, how do we manage capacity for the demand right now?
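To make the fabric idea concrete, here is an illustrative sketch of the kind of placement decision a runtime fabric like the one described might make: many clusters across many clouds behind one scheduling function, so a workload lands in-region when possible and fails over when not. The names, fields, and policy are hypothetical, not Baseten's actual control plane.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    cloud: str
    region: str
    healthy: bool
    utilization: float  # 0.0 to 1.0
    free_gpus: int

def place(clusters: list, region: str, gpus_needed: int) -> Cluster:
    """Pick a healthy cluster with headroom, preferring the target region."""
    candidates = [c for c in clusters if c.healthy and c.free_gpus >= gpus_needed]
    if not candidates:
        raise RuntimeError("no capacity anywhere in the fabric")
    # Prefer in-region clusters for latency, then the least-loaded for headroom.
    candidates.sort(key=lambda c: (c.region != region, c.utilization))
    return candidates[0]

fabric = [
    Cluster("a1", "cloud-a", "us-east", True, 0.96, 0),   # full: mid-90s utilization
    Cluster("b7", "cloud-b", "us-east", True, 0.91, 16),
    Cluster("c3", "cloud-c", "eu-west", True, 0.80, 64),  # out-of-region fallback
]
print(place(fabric, "us-east", 8).name)  # -> "b7"; fails over to "c3" if b7 fills up
```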
Starting point is 00:20:40 I think the second part that people don't really understand is that there are also a lot of suppliers right now that are kind of grifty, you know? Like, they haven't run data centers before, they don't understand SLAs, especially for inference.
Starting point is 00:21:07 And so, you know, even when there is capacity available, there's a lot of diligence. We run with a lot more redundancy than we need, so it's fine. But, you know, there's probably like a dozen good clouds, and
Starting point is 00:21:20 I'd probably put three or four of them in the gold tier. And I think that just means that, like, not only are we supply crunched, we're supplier and operationally crunched on people who can run these data centers as well. How far ahead can you actually buy capacity right now? In other words, is there any slack in the market if you buy two years ahead or five years out?
Starting point is 00:21:47 You mean, like, actual contract length, or, like, hey, I want this in January '28? Yeah, either one. Yeah. I mean, it's more the, I want this in January '28, or at least I have some visibility into my future supply. Yeah. You could buy that, but you could also remember how quickly the
Starting point is 00:22:07 market is moving. And, you know, that gets balanced somewhat off the fact that the H100 is such a great chip. You know, it's crazy: it's four and a half years old, and the price is still going up. Yeah. Maybe it has a useful life of nine years. Yeah. So, you know, that's good, but at the same time, you know, yes, you can do that.
Starting point is 00:22:32 But, you know, you're making a lot of bets as part of that. And then, I think the big thing that's changed over the last six months is that the term length that people want has just gone up. So if you wanted 1,024 B200s from a good cloud, right now you're not getting that on less than a three-to-five-year contract, probably with a 20 to 30 percent TCV prepay. So actually, what becomes important when acquiring capacity is you need to have enough demand to serve against it, but then you also need a low cost of capital, which is actually changing the dynamic pretty significantly.
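For a sense of why the cost of capital matters here, a quick back-of-envelope: the contract shape (1,024 B200s, a three-year term, a 20 to 30 percent TCV prepay) comes from the conversation, but the hourly rate below is a made-up placeholder, so the dollar figures are purely illustrative.

```python
gpus = 1024
hourly_rate = 5.00            # hypothetical $/GPU-hour; not a quoted price
hours_per_year = 24 * 365
years = 3                     # low end of the three-to-five-year terms mentioned
prepay_fraction = 0.25        # midpoint of the 20-30% TCV prepay mentioned

tcv = gpus * hourly_rate * hours_per_year * years
prepay = tcv * prepay_fraction

print(f"Total contract value: ${tcv / 1e6:,.1f}M")     # ~$134.6M over the term
print(f"Cash due up front:    ${prepay / 1e6:,.1f}M")  # ~$33.6M before serving a token
```

At magnitudes like these, whoever can finance the prepay most cheaply wins the capacity, which is the dynamic being described.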
Starting point is 00:23:22 Does that impact how you think about going public as a company? Because arguably... Yeah. I think you'd go sooner. Yeah, exactly. Yeah, and I think there is demand for that. But, you know, one of the realizations that we had recently, and we're software people, so we don't think like this all the time, is that our business has, like, very interesting working capital. Mm-hmm.
Starting point is 00:23:37 Requirements. And as a result of that, it has very interesting financing. Yeah.
Starting point is 00:23:59 Requirements. And we're not, at least right now, even going down the debt route. There's also things you could do in terms of debt or other structures. Yeah. Yeah. And I've learned a lot about debt. Yeah. Recently.
Starting point is 00:24:10 Given the supply crunch, and inference being one of, you know, the top couple markets you're going after, you have plenty of people who understand this problem, and therefore, you know, competition. How do you think about, like, what are the factors that create a dominant player here, or a winning player? Is it, as you mentioned, cost of capital? Is it access to supply? Is it software? Is it demand? Yeah. Just being excellent at everything. Yeah, it's, um, look, I think what's so interesting about inference is... Is it operations? I guess it's actually cloud. Yeah, yeah. I think, like, GPUs as a service is not sticky. I think that's been seen. Like, customers generally,
Starting point is 00:24:52 we just see that as commodity. Inference with the software layer included is incredibly sticky. You know, like, none of our top 30 customers have ever churned. You know, we're talking like 400% annual NDR around our business. And so it's like very, very sticky. So I think that software layer is very important. The optimist in me is like, oh, there's so much value in the software. And we will build the best software
Starting point is 00:25:22 for inference that exists. I think, as is becoming clear now, access to inference compute is a strategic advantage. And I think that is the strategy that even the labs are going after, which is like, if we have all the compute, good luck running inference. Yeah, yeah. In a world of constrained compute, the number one thing to own is compute.
Starting point is 00:25:48 Yeah. And so, you know, just owning it in and of itself as an asset. And I think people underappreciate that. Yeah, you can't make a good hot chocolate without milk. You know, unless you're a vegan. Unless you're a vegan. Hey, no one's a vegan in inference. Well, I'm going to ask you: people might want alternative milk, right?
Starting point is 00:26:06 So, okay, the H100 is a great chip. People, you know, want a B200. They want GB200. They want, of course, tons and tons of Nvidia. When you think about making a bet, you know, several years in the future, do you believe that there is a multi-chip world? Like, what do you think happens from a compute perspective on the chip side? Yeah.
Starting point is 00:26:30 I think, you know, diversification everywhere, the same way I want a world of many models. I think, you know, we want a world of many of most things. And I think it'd be sad if it didn't happen. Yeah, and I think everyone would be sad. I will say, to some extent, I think there will be inference-specific chips. I think you'll have, like, decode-specific chips. And we're looking at...
Starting point is 00:26:55 And Nvidia said this too. Yeah, yeah. I mean, that was the whole Groq LPU thing. It's like, you know, I think that is very straightforward and it makes sense. I think people really, really, really underestimate the supply chain stuff with Nvidia, like how good they are at that. CUDA, how good CUDA is, the developer ecosystem around it. And, you know, we...
Starting point is 00:27:18 To me, like, one of the most important things as an infrastructure company in this moment is how fast you can move. And you can move fastest with Nvidia today. I mean, I think that is the reality. And, like, just given the scale that they operate at, it's hard to see, and I'm not saying it won't happen, but in the short term, like in the next couple years, how anyone's going to be able to compete with that. Especially with, you know, so much of the other players, like, what you need to be able to compete here is the ecosystem to form around you. And if you tie up all your supply with one buyer,
Starting point is 00:28:01 which, you know, a bunch of the other chip providers have done, it's actually hard for that ecosystem to form. You know, like, if you think about it, if you're a big lab and you have a proprietary deal with one chip type where you get 90% of the supply, it's actually in your best interest to make sure you get 95% of the supply, and everything gets built for you and no one else could ever use it. When you think about reacting to the market, what do you think is happening with the actual workloads that you have to go invest in, right? Like, obviously, code agents and long-horizon agents over time have become a big deal. People talk a lot more about CPU compute, video inference is different. I don't know if it's that.
Starting point is 00:28:39 Sandboxes, like, what's important for you guys to invest in now? Yeah, look, for us, all the runtime stuff is obviously very important. And what that means is, like, what chips we run on, how we run, what kind of workloads we support. Do we get very good at diffusion transformers? Yes. Coding agents need sandboxes, so we should be good at sandboxes. There's all sorts of new speculation techniques to get faster inference. We need to do that. Even stuff like KV cache-aware routing, and, you know, that stuff's a bit old now, but continuing to be very good at that, and somewhat disentangling prefill and decode and starting to treat them as separate problems. I think that's, you know, something we are very focused on, and we're seeing massive gains. That's at the runtime level.
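As a concrete illustration of one of those runtime techniques, here is a minimal sketch of KV cache-aware routing: hash a request's prompt prefix so requests that share a prefix land on the same replica, where the prefill KV cache is likely already warm. Real routers (and real prefill/decode disaggregation) are far more involved; the replica names, prefix length, and hashing policy below are assumptions for the example.

```python
import hashlib

REPLICAS = ["replica-0", "replica-1", "replica-2"]
PREFIX_CHARS = 256  # route on a fixed-length prefix (characters standing in for tokens)

def route(prompt: str) -> str:
    """Send prompts sharing a prefix to the same replica to reuse its KV cache."""
    prefix = prompt[:PREFIX_CHARS]
    digest = hashlib.sha256(prefix.encode()).digest()
    return REPLICAS[int.from_bytes(digest[:4], "big") % len(REPLICAS)]

# Two requests sharing a long system prompt hit the same replica, so the
# prefill work for that prefix is only done (and cached) once.
system_prompt = "You are a support agent for Acme. Follow policy. " * 8
print(route(system_prompt + "User: reset my password"))
print(route(system_prompt + "User: cancel my plan"))  # same replica as above
```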
Starting point is 00:29:17 I'd say, beyond that, everything we think about is how to create more of that loop between inference and post-training, because we think that just begets more inference. And so we will build or partner on almost everything there. So, you know, we're going to work with the best evals companies in the world to make sure that's very well integrated, like Braintrust, in and around Baseten. We will partner on the sandboxes side, build the best sandboxes experience that will exist.
Starting point is 00:29:57 And then we'll create the best training APIs to make it so continual learning becomes somewhat of a solved problem, not just, like, a discrete thing. That's, I think, the core Baseten product thesis: how do we build that loop? And then everything around that becomes, make sure that we can do everything we can to ensure that loop gets as big as possible. That's access to compute. That's infrastructure. Make sure we can get compute anywhere. Make sure we have access to our own compute.
Starting point is 00:30:28 And then I think it's all the primitives that come after that, that just become incredibly margin-accretive, both for us and our customers, which is stuff like, you know, sandboxes, and like the async batch inference, like how we drive utilization by having a first-class batch inference experience. To me, this is what an inference cloud looks like. It's that you are very good at inference, and then you start to do all the things tangential to, or that loop into, inference, and partner where necessary and build where necessary. But we really do want to own, like, start with that core inference story, and then go down the stack to unblock supply and accrete margin, and go up the stack to unlock value.
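The batch inference point is a utilization lever worth sketching. Below is a toy async micro-batching loop: requests queue up and get flushed to the accelerator in groups, so the hardware stays busy instead of handling one request at a time. The queue, batch size, and fake model call are all hypothetical; a production batch API adds persistence, retries, and SLAs.

```python
import asyncio

async def batch_worker(queue: asyncio.Queue, batch_size: int = 8):
    while True:
        batch = [await queue.get()]                  # block for the first request
        while len(batch) < batch_size and not queue.empty():
            batch.append(queue.get_nowait())         # greedily fill the batch
        await asyncio.sleep(0.05)                    # one fused forward pass (simulated)
        for prompt, fut in batch:
            fut.set_result(f"answer({prompt})")

async def infer(queue: asyncio.Queue, prompt: str) -> str:
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut                                 # resolves when the batch runs

async def main():
    queue = asyncio.Queue()
    worker = asyncio.create_task(batch_worker(queue))
    answers = await asyncio.gather(*(infer(queue, f"q{i}") for i in range(16)))
    print(f"served {len(answers)} requests in micro-batches")
    worker.cancel()

asyncio.run(main())
```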
Starting point is 00:31:14 What would surprise people about some of the issues you discover only at scale? I'll give you an example. I was surprised when you guys ran into scale limitations, like fundamental limitations, with some of the hyperscaler products that you were consuming. Yeah. And I think of the AWSes and GCPs of the world as supporting infinite scale. Yeah. I mean, I think, again, very, very large companies that run services at big scale probably see the same stuff.
Starting point is 00:31:39 Mm-hmm. It's that all the edge cases just become... You actually experience them. You experience them. And, you know, I'll give you a few examples here. Like, yesterday we had, for the first time ever, a kernel panic.
Starting point is 00:31:57 That only happened because some Fluent Bit worker was creating too many logs, and the scale was too big, and it was all on one node. And it was happening two times at the same time by two different workers. So you see all the systems-level and kernel-level problems. But then you start to see,
Starting point is 00:32:17 I think the craziest stuff is that you start to see with LLMs that these runtimes are pretty immature. Even how we use KV cache is, you know, probably a little less sophisticated than most people think. And we are starting to see the limitations of the current, and the next set of primitives that need to be built, from a scale, security, and performance perspective.
Starting point is 00:32:44 But I think it's really at the runtime level and the systems level. The edge cases are, I'd say, a lot more systems-level than they are LLM-specific. What are the things that keep you up at night? Capacity. I think, you know, I think capacity. I think the other one is probably just that this market's so big. And so it represents a moment when you should be as aggressive as possible. And, you know, we've grown a ton this year, the last 12 months, the last few months,
Starting point is 00:33:19 but the answer is always just, you know, go bigger, go faster, and I think that's really, really fun. It's also a little exhausting, and it's also like, we are all in somewhat uncharted territory in terms of how fast and how big you can go and how things can get. But I think the big one is compute. I think, like, there's no world in which there's enough compute to, you know, get the amount of value that we want to get out of LLMs in the next five to ten years. Or we have to invent a lot of new stuff. Yeah.
Starting point is 00:33:48 Maybe if we just talk a little bit about what you're learning scaling. You know, 30x is, like, an aggressive thing to go through as a company. You've brought in a lot of really amazing talent, like Danny and Samir and Stephen Day, folks on both the technical and the go-to-market side. What do you think is working about how you are recruiting and scaling? Or what's your philosophy on that? We were very, very flat, like, until, I don't know, 12 to 18 months ago. I remember I went on a walk with Elad, actually.
Starting point is 00:34:27 And Elad was like, you just need leaders. And it's actually so contrary to everything, you know, as engineers, you're like, oh. Yeah, it's all overhead. Everything is overhead. Everything is overhead. You once told me, I think, you're like, hey, Sarah, what about we just have engineers instead of salespeople?
Starting point is 00:34:47 Yeah, yeah, yeah, my bad. Everybody learns it. Everyone's the same. We're all the same. I remember, like, you know, you said it so clearly at the time, Elad, and I think that's what we've noticed. Just, like, actually having a leadership team that you can trust is so important.
Starting point is 00:35:07 I think the two or three things that I would say is, you want people where you can give them whole problems. And so, like, you know, if you feel like you are micromanaging, if you feel like you need, if you feel like, you know, you have to be involved in everything, I think that's a bit of a cop out as a founder because you're just like, I just need to be involved in everything. It's like, no, you probably don't have the right people.
Starting point is 00:35:32 I think the second thing is, be very, very clear what you're optimizing for. Because when you're very, very clear what you're optimizing for in people... like, if it's something generic, like, we want the smartest, hardworking people, you can't do much with that. With us, what we cared about was, hey, actually, we don't care that much about people who have done this before. We care about people who think in first principles. Work has to be a high priority, but they also have to be very kind and nice and, you know,
Starting point is 00:35:59 care about the collaborative environment. We don't have a hero culture, you know, very low ego. And, you know, if you need a manager, it's probably not the right place to be. But once you have that clear rubric, the people that will fit into it become very apparent. And the people that don't fit into it also become very apparent. And I think what's more, like, we've hired amazing people like you mentioned, but what's a lot more interesting is, I think we haven't had a ton of turnover there, unnecessarily; people tend to stay because we have a very clear idea of what we want.
Starting point is 00:36:41 It took us a while to get there. What about the idea of, like, an operations culture? We were talking to Alyssa Henry about this, and she's like, well, the hard thing about cloud is actually just operations. I slept with a pager under my pillow for a decade. I don't think I've ever seen you detached from your Slack channel. My phone's buzzing right now. I'm kidding.
Starting point is 00:37:02 I hope that's not a second one. I'm getting anxious. And you've been concerned before, like, do people get it? Like, you know, what is distinctive about that? I think, like, one, I think if you've worked at an infrastructure company... like, we were once in a meeting with a bunch of AWS execs. And this was, you know, like, very senior AWS folks.
Starting point is 00:37:24 All their pagers went off multiple times during our 45-minute meeting. You know, I think it's very much a cultural thing. But, yeah, you know, Elad, like, inference can go down. And, you know, you learn to live with it. Like, I think, me and my co-founder, when his pager goes off, his seven-year-old says, is that a P0? Oh. Oh, is that, is that a P-zero? And so, you know, I think that is, you just have to get used to it. And that's a culture you live in.
Starting point is 00:37:57 It just changes the speed. But also it becomes, like, you know, a cultural thing. I think it rejects people that don't fit into it very, very quickly. Like engineers who avoid pager duty. Yeah, you know, when we have P0s, like, everyone's on the call. Like, you know, there's been a joke that there may as well be a siren that goes off in the office whenever there's an incident. So people have been talking ad nauseum in the AI community about Jevons Paradox,
Starting point is 00:38:26 Yeah. Where, if you decrease the cost of... it's really a question around price elasticity and availability. If you decrease the cost of a good, say intelligence as a good, people actually consume more of it. Like, the personal or business ROI of it, the demand for it, goes up, not down. Do you see this? And are you working against yourself trying to make these models more efficient? Do people just use them more or less? Yeah, I think you think about this from a developer's
Starting point is 00:38:56 perspective and a consumer perspective. I think consumers just want the best answers and the best experience, and that's somewhat governed by more intelligence, to some extent. I think when you go to the developers, from the developer's perspective, they will insert more intelligence if you make it cheaper. And they will insert more intelligence anyway, but if you make it cheaper, they'll insert a hell of a lot more intelligence. And you see this with agents. It's that agents are just longer-running now. And I think that's what we have seen with the cost of inference going down, which is, you know, folks are just like, okay, we can run this for longer, we can make it do a bit more work, and we'll get to a larger end. I think that compute scales from an inference perspective as well.
Starting point is 00:39:46 And, you know, I think we are seeing that with almost all our customers, which is, you know, they either start with, this is the quality of answer I need to get to, and this is the amount of inference I need to do to get there, or, this is the baseline model that I can start with, that I can work with to get there. And I think the more we drive down the costs, what they realize is more intelligence just means better user experience. I just want a better answer. Better answers, better experiences, more dollars, more revenue. So, yeah, I think inference going down just begets more.
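The Jevons argument is ultimately arithmetic about elasticity: if a price cut grows usage by more than the cut, total spend rises. The numbers below are invented purely to illustrate the shape of the claim.

```python
# Hypothetical before/after: unit price falls 10x, usage grows 30x.
price_before, tokens_before = 10.0, 1.0   # $/M tokens, relative token volume
price_after, tokens_after = 1.0, 30.0

spend_before = price_before * tokens_before   # 10.0
spend_after = price_after * tokens_after      # 30.0

print(f"price fell {price_before / price_after:.0f}x, "
      f"total spend grew {spend_after / spend_before:.1f}x")
# With elasticity like this, cheaper inference means more total inference spend.
```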
Starting point is 00:40:19 It is truly, I think we're kind of in a world where, you know, it is the last market, right? Like, even if there's AGI, all that's left is inference. Yeah. So you do not see in your customers a, like, this answer is enough and this action is enough. No. Yeah, it's going to keep going for a long time. Yeah. How do you view all this kind of evolving as a future?
Starting point is 00:40:43 So basically, it seems like this is going to be one of the biggest markets of all time. Yeah. We have this massive shift where we're moving from software and seats and digitization into actual intelligence, selling units of cognition. Yep. Selling agentic workflows. What does this all look like in a couple years? Like, what is your view of this future world? I think for consumers, it's the best possible thing, right?
Starting point is 00:41:05 Like, everything is somewhat smarter. You know, you get better care because your doctors have access to better tools. There's all this stuff about there being less software engineers. I think we just build more software. And we just build a ton more software. And, like, you know, we're not slowing down hiring of software engineers.
Starting point is 00:41:27 We're just building more things. And for the consumers, that just means better, more software. All those good things. It's almost like everybody has their own team for everything, right? You have an agent which helps with your doctor. You have an agent that helps you learn stuff. You have an agent that helps you organize your life.
Starting point is 00:41:41 It's concierge. It's concierge. Yeah. Yeah. Concierge everything for everyone. Yeah. And I think, like, what that means, well, that's amazing.
Starting point is 00:41:48 I think that's great. And education, same thing. You have college education. You get personalized access to everything. I think then you go one step back in how it affects developers and companies. I think if you don't embrace this, it's an extinction moment for a bunch of folks, which is, like, you know, everything needs it. And I don't think that means that, you know, forward design needs Figma.
Starting point is 00:42:16 I think that's a thing. I think what's more interesting is, you know, all these workflow and software companies need to figure out what the intelligence-inserted versions are that drive all that user value for the enterprises and consumers that we talked about. Yeah, very exciting. Thank you so much for joining us today.
Starting point is 00:42:35 Thanks, guys. Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
