No Priors: Artificial Intelligence | Technology | Startups - Speed will win the AI computing battle with Tuhin Srivastava from Baseten

Episode Date: March 21, 2024

At a time when users are being asked to wait unthinkable seconds for AI products to generate art and answers, speed is what will win the battle heating up in AI computing. At least according to today's guest, Tuhin Srivastava, the CEO and co-founder of Baseten, which gives customers scalable AI infrastructure starting with inference. In this episode of No Priors, Sarah, Elad, and Tuhin discuss why efficient-code solutions are more desirable than no code, the most surprising use cases for Baseten, and why all of their jobs are very defensible from AI.

Show Links: Baseten | Benchmarking fast Mistral 7B inference

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @tuhinone

Show Notes:
(0:00) Introduction
(1:19) Capabilities of efficient code enabled development
(4:11) Difference in training and inference workloads
(6:12) AI product acceleration
(8:48) Leading on inference benchmarks at Baseten
(12:08) Optimizations for different types of models
(16:11) Internal vs open source models
(19:01) Timeline for enterprise scale
(21:53) Rethinking investment in compute spend
(27:50) Defensibility in AI industries
(31:30) Hardware and the chip shortage
(35:47) Speed is the way to win in this industry
(38:26) Wrap

Transcript
Starting point is 00:00:00 Hi, listeners. Welcome to another episode of No Priors. Today, Elad and I are catching up with Tuhin Srivastava, the CEO and co-founder of Baseten, which gives teams fast, scalable AI infrastructure, starting with inference. They're one of the players at the center of the battle heating up around AI computing. Welcome, Tuhin. Hi, thanks for having me. Good to see you guys. Let's start at the beginning. For any listeners who don't know, what is Baseten and how'd you start working on it? Baseten is an infrastructure product, so we provide fast, scalable AI infrastructure for engineering teams working with large models.
Starting point is 00:00:36 Currently, we're focused on inference, and we want to do a lot more after that. But for the past, say, four and a half years, actually, oh, that's a long time. For the last four and a half years, we've been, you know, cutting our teeth and trying to build this thing. I think it's been pretty rewarding over the last 12 months, seeing the market kind of show up and, you know, everyone get equally excited about AI infrastructure. We started this honestly because, firstly, you know, we thought ML was pretty cool in 2019. We thought it was going somewhere and we wanted to build a picks-and-shovels business
Starting point is 00:01:09 and kind of solve the problems that we were running into. I think the side note here is that I wanted to start a company with my friends. You often say that Baseten isn't no code, it's efficient code. Like, why does that difference matter? That wasn't always the case. I'd say, like, you know, there were times when we had elements which were definitely a bit no-code-y. What we've learned over the last three or four years is, you know, code is just incredibly powerful and engineers want to write code. Even in its best form, you know, you want to build
Starting point is 00:01:37 really, really tight abstractions, but I think the ability to turn the knobs under the hood is very, very important. I think no code kind of makes that a lot harder. I don't think it removes it, but it makes it a lot harder. So what we do is just build very strong, intuitive abstractions that try to make the easy thing super easy and still make the hard thing possible. So, you know, you can get a lot of value really quickly. But I'd say unlike a lot of other infrastructure products that have been built in the last 10 years, we're trying to solve against the graduation problem, which is that, you know, we're able to support teams as they grow and scale. And just to sort of make it a little bit more real for our listeners,
Starting point is 00:02:19 like what are the types of applications that run on Baseten? Like, what's the scale of the platform? Do you have a favorite application? Everything from, you know, tiny side projects on weekends all the way to companies that are pretty AI-native. We've supported foundation model companies. We work with companies like Descript, where AI is very, very core to the product experience. We power a lot of AI features that Patreon has shipped. But I'd say some of the more interesting use cases from our perspective, actually, from my perspective at least, are either the really small teams that we're giving a lot of leverage to so that they can ship things very quickly. So a really good example of that might be a company like Bland
Starting point is 00:02:58 AI, which is basically building an SDK for call centers, is how I describe it. But, you know, they're able to ship models and, you know, co-locate workloads so that they can get, you know, sub-300 millisecond or sub-200 millisecond responses without, you know, months and months of infrastructure effort. I think on the other hand, it's really exciting to see, you know, companies become AI-enabled, because that's where we see a lot of the value going to be over the next decade
Starting point is 00:03:29 is, you know, if I look at a company like Picnic Health, which has actually been around for a decade and is starting to do very, very interesting things with the corpus of data that they've gathered over the last 10 years. And, like, supporting those use cases... I think their model's called Picnic GPT, which extracts information from medical records. And to me, those are the really exciting use cases
Starting point is 00:03:47 where, you know, you're giving leverage to companies that are good at the domain that they are working in. Their model might be proprietary, the data might be proprietary, but the infrastructure doesn't necessarily need to be proprietary, and we can give them just an easy way to deploy that stuff without many, many people-months. It's become, like, in vogue to compare the size of your GPU cluster. People are spending a lot of money on GPUs. We hear about 600,000 H100 equivalents and lots of venture rounds being raised,
Starting point is 00:04:23 often to train, you know, large models in some domain or another, or even, you know, more and more expensive post-training. Are training and inference workloads different? Yeah, I think so. I think they just have very different, almost like SLAs for the customer. Like, you know, things that matter for inference are things like, you know, your cluster is somewhat co-located with where you're doing your work. Whereas for training, you know, stuff like that matters a bit less. It doesn't really matter
Starting point is 00:04:57 where that training is happening as long as all your GPUs are somewhat together. You know, even the GPU clusters themselves: for training, networking is a very, very important piece, to have networking on the racks themselves; with inference it matters a little less because you're doing a little bit more on individual GPUs and less so across GPUs. I think from a user perspective, there's a lot more workflow, I'd say, in inference that's repeated across customers, as opposed to, you know, you guys work with a bunch of companies training models, and the state of the art there really is: give me some SSH keys and let me go at it. Whereas with inference there's definitely a repeated workflow where people are trying to get similar things out of the inference infrastructure, whether that be, you know, version management, whether that be the way they deploy, hooking it up into CI/CD, you know, cold starts and so on and so forth.
Starting point is 00:05:39 So I'd say it seems a bit more repeatable today, how the problem is being solved by, you know, customers. I'd say the hardware requirements are quite different. They're probably a little lower to some degree. But I think, you know, resiliency and reliability matter a lot more. You know, downtime is unacceptable from an inference perspective; nodes get terminated all the time from a training perspective. You were quite early to this market. And I think you folks pioneered a lot of the sort of early ML infrastructure for these sorts of use cases and applications. What has been the most
Starting point is 00:06:22 surprising thing or what did you least expect relative to how things evolved? Yeah, I think I can answer that question at two different altitudes. I think you can answer that question from like a market perspective, which is that, you know, I think, Elad, you have some old writing, which is, you know, that markets are basically all that matter. I think we've felt that very viscerally in some sense, which is that, you know, you can build all this cool stuff, and then when markets show up, you feel that, and that really pushes, you know, the customer forward and the needs for your product forward. So that's one thing, which is like the acceleration we saw through the end of 2022 and in 2023, you know, definitely took us by surprise. Like, I can be really honest
Starting point is 00:07:06 and say that from 2019 to 2022, it was pretty quiet. You know, we had happy customers, but, you know, the demands weren't necessarily there. I think from like a practitioner perspective, how fast some of these teams move has really shocked me. I think what is really clear in AI, and early-stage AI in general, and I think the enterprises are waking up to realize this right now, is that speed is actually your number one advantage. Things are moving so fast that, you know, if you're not competing on speed, you're going to be left behind. And so there's actually a lot of propensity to buy versus build, I'd say.
Starting point is 00:07:50 People are happy to, you know, to buy technology, where I'd say in the past, people were pretty hesitant to buy infrastructure. We talk with companies all the time where we think, oh, you know, they probably have something built out, where, you know, they're a lot less sophisticated than you think and they're handling a lot more scale. So they need to be able to have the infrastructure to support that. So I think that's probably one of the
Starting point is 00:08:22 larger things that we'd be surprised by, which is, you know, how fast people need to move to be relevant. I think the other thing is just how GPU needs have evolved. Going from, you know, at the end of 2022, most of our customers were using T4s and A10Gs. You know, after that, it changed to A100s. Now it's gone to H100s. Like, the compute needs aren't necessarily going down. They're only going up, especially as these services scale up. You guys were just highlighted for leading on the independent Artificial Analysis benchmarks for highest throughput and lowest latency serving. Congrats. Can you give us some intuition for what is driving that? Like, you know, what makes inference hard, or hard to run fast? You know, I think there are multiple things which are quite difficult about inference. There are the workflow headaches, which we've talked about a bit; I can talk more about those. There's the scalability and reliability piece next.
Starting point is 00:09:15 I think the stuff you're talking about is performance optimization, which is really like how long does one generation take? I think, you know, there's a lot of work and research that's being done to run generations as fast as possible
Starting point is 00:09:28 to get the maximum throughput and the minimal latency. I think historically, or, I mean, it's funny when I say historically because I mean the last six months. But over the last six months,
Starting point is 00:09:42 a lot of work has been done in the research community to get basically these things to move faster. Stuff like speculative decoding, which came out, I think, I can't remember when it came out, sometime in the last six months. It started really being used. For us, what it means is, like, how well you can use the GPU, how you can scale across multiple GPUs,
Starting point is 00:10:04 and how you can honestly, like, be really, really up-to-date with the latest things that are happening in open source and research. You know, we've partnered with NVIDIA and really, like, worked really closely with their LLM engine called TRT-LLM, and that's actually driven a lot of the performance gains. We've contributed to that, we've forked that. But, you know, the hard thing there is a lot of the optimization you're doing is pretty low-level, and there's no real abstraction.
Starting point is 00:10:42 So you either have to learn how to use open source very well, or rewrite some of these kernels yourself. You know, if you look at something like OpenAI, what do people complain about? A lot of the time it's speed. And that's probably, you know, one of the core performance advantages of open source
Starting point is 00:10:56 is that you can get these smaller models to run faster. And I think, you know, that will continue to be a massive focus for us going forward as well. On the benchmarks, I think, you know, it's pretty crazy how that's evolved as well. I think, you know, we've gone from, like, state of the art being 90 tokens a second, then, you know, it got over 100, now it's over 200. Now we're talking, for some people, over 300, 400.
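(For intuition on those throughput numbers, here is a quick, illustrative calculation. The response length is an assumed value, and time-to-first-token and network overhead are ignored; none of these figures come from the conversation.)

```python
# Rough intuition for what tokens-per-second means for a user waiting on a response.
# RESPONSE_LENGTH_TOKENS is an assumption for illustration; time-to-first-token
# and network overhead are ignored here.

RESPONSE_LENGTH_TOKENS = 250  # assumed length of a typical chat answer

for tokens_per_second in (90, 200, 400):
    wait_seconds = RESPONSE_LENGTH_TOKENS / tokens_per_second
    print(f"{tokens_per_second:>3} tok/s -> ~{wait_seconds:.1f}s to stream a "
          f"{RESPONSE_LENGTH_TOKENS}-token answer")
```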
Starting point is 00:11:31 And I think that's going to continue to be a very, very important place to innovate. And we think over time it will get somewhat commoditized, the performance, especially for language models, to be honest. I think, you know, more and more of that stuff should run locally to some degree, I think. But being on top of it, making sure that we're kind of attached to the state of the art, is, you know, if we're not, an existential risk to the business. And so, you know, we have to do it. How much optimization have you been seeing for other types of models? So diffusion models, some of the language models, the text-to-speech models, you know, other areas like that. I'm just sort of curious. It seems like there's different types of optimizations happening across different foundation model types as well.
Starting point is 00:12:12 So I was curious, you know, what's state of the art there and how you're thinking about it. I don't have the metrics on hand, but we're seeing, and, you know, also pushing, limits there as well. So just yesterday, for example, you know, we were able to get Whisper running. So there's Whisper, there's Faster Whisper, and then Whisper on TRT, which again is the NVIDIA thing. I think what we are seeing is that there's more and more focus on bringing these experiences as close to real time as possible. And so, like, you know, one of our customers, a company called Gamma, which is AI-powered storytelling software.
Starting point is 00:12:47 They use Stable Diffusion image models to generate images. You know, not waiting four or five seconds there and still having high-quality images is, again, core to their business and making it very, very fast and easy to use. And I think, you know, we are definitely seeing that from a customer requirement perspective. Have we been able to juice as much there? Not yet, but I think we're getting there.
Starting point is 00:13:16 Speaking of the applications, you know, driven by these models still being generally startlingly slow, including the really amazing, capable ones, from ChatGPT to Cognition to, you know, things like Pika and Midjourney, in a way that consumers have not seen in many years, we are waiting, you know, seconds for interactions. Is your view of that, like, it'll change because the models are getting smaller? It'll change because smaller models will get more powerful? People do distillation? People just get better at running these things?
Starting point is 00:13:55 Like, we'll get better hardware. What's the path to, like, not waiting 10 seconds a generation, two minutes a generation? You know, if you look at so many of the gains of running Mistral fast, let's take that as an example, you know, the step-function gains come from, you know, a few things. Like, the first one is running on H100 and not running on A100. So I think, you know, as hardware gets better and better,
Starting point is 00:14:20 you get this almost like a leg up. And so, as, you know, hopefully those prices go down and that gets more available, that'll be one thing. Hopefully the H200 comes out next and we do that again. I think the next piece is around software optimization. So stuff like continuous batching and dynamic batching and speculative decoding, which basically makes it easier to either parallelize or batch-process a bunch of things, or makes each individual generation faster, or offloads part of it to another model.
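(As a rough sketch of the speculative decoding idea mentioned here: the functions below are toy stand-ins rather than real models, and everything about them is assumed for illustration. In a real serving stack, the large target model verifies all drafted tokens in a single batched forward pass, which is where the speedup comes from.)

```python
# Toy sketch of greedy speculative decoding. draft_next and target_next are
# hypothetical stand-ins for a small draft model and a large target model.

def draft_next(tokens):
    # Cheap draft model: quickly guesses the next token.
    return (tokens[-1] * 31 + 7) % 50_000

def target_next(tokens):
    # Expensive target model: defines the "correct" next token.
    # In this toy it happens to agree with the draft model.
    return (tokens[-1] * 31 + 7) % 50_000

def speculative_decode(prompt, new_tokens=12, draft_len=4):
    """Generate new_tokens tokens, letting the draft model propose draft_len at a time."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < new_tokens:
        # 1. Draft model cheaply proposes draft_len tokens.
        proposed, ctx = [], list(tokens)
        for _ in range(draft_len):
            t = draft_next(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2. Target model verifies; keep the longest prefix it agrees with.
        accepted, ctx = [], list(tokens)
        for t in proposed:
            if target_next(ctx) != t:
                break
            accepted.append(t)
            ctx.append(t)
        # 3. On a rejection, fall back to one target-model token so progress is always made.
        if len(accepted) < len(proposed):
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[len(prompt):len(prompt) + new_tokens]

print(speculative_decode([1, 2, 3]))
```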
Starting point is 00:14:50 There's lots of stuff in that space. I think these models are also just going to get smaller, to be honest. I think that's, like, the really, really powerful small model that does one thing, that's pretty exciting. I think stuff like Ollama is very interesting. But I think, if you saw, Sourcegraph has this, where they basically announced that Cody's now running locally for a lot of customers. I think, you know, that's actually a pretty exciting proposition. We have to figure out where we sit in a world like that.
Starting point is 00:15:23 It's not necessarily, you know, amazing for cloud providers like ourselves because, you know, we want everything to run on the cloud. But I think things do get smaller, things do get more efficient, and we have more distilled, more powerful, sharper models. Yeah, it's the unbundling of models. How and when do customers choose to deploy their own models, or open source models, or fine-tuned open source models, on their own infra versus use public model endpoints? Or, like, what guidance would you give people? This is a general trend, right? Which is that you go to OpenAI, you go to Anthropic, you have an API that works. Really quickly, you're like, that's too slow. Or that's too expensive. Or, you know,
Starting point is 00:16:01 I don't need something that powerful. So we often see, like, if customers have a lot of money, they end up paying for private deployments in Azure, but they still have the same issues. Then they go to open-source models. When you're starting out with open-source models, you're going to go to a shared endpoint. There's lots of great shared endpoint providers.
Starting point is 00:16:21 But there are things that might matter to you which shared endpoint providers can't give you. Firstly, you know, you want your own SLAs. So you don't want kind of this noisy neighbors problem where, if me and Elad have two competing apps and Elad's app gets slammed, I don't want my app to slow down; my model calls still need to be fast. So in that case you might want dedicated compute. But there's also, like, data and privacy stuff. Maybe you don't want, you know, your data running on the same infrastructure,
Starting point is 00:16:52 going through the same infrastructure that other folks' data is going through. And honestly, maybe even at some scale it's cheaper to run it yourself. And those are the three things. I think then, when you go to larger companies, it becomes a no-brainer; shared endpoints don't really work for large companies. They're definitely not going to work for enterprises. And a lot of times you might want Mistral 7B not coming off the shared endpoint provider, maybe not even on a dedicated endpoint. You might want it self-hosted within your own AWS or GCP. And that's kind of actually where we see a lot of the world going
Starting point is 00:17:26 as well, which is that especially larger customers, they're going to have actually pretty good compute deals, and they're going to have their own spend, you know, their own credit system or spend commit with the cloud marketplaces, and actually running it on their own infrastructure solves a lot of problems and has a massive cost advantage for them as well. And so I think there's like three stages to it: you start with shared inference endpoint providers, you go to dedicated in the cloud, but I think for some customers that's not enough either and you want to go into your own cloud.
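(A back-of-the-envelope version of the "at some scale it's cheaper to run it yourself" point: every number below, the shared-endpoint price, the GPU rate, and the batched throughput, is an assumption made up for the arithmetic, not a quote from the conversation.)

```python
# Illustrative shared-endpoint vs. dedicated-GPU cost comparison.
# All prices and throughputs are assumptions for the sake of the arithmetic.

SHARED_PRICE_PER_M_TOKENS = 0.50   # $ per 1M tokens on a hypothetical shared endpoint
GPU_HOURLY_COST = 2.50             # $ per hour for a hypothetical dedicated A100
GPU_THROUGHPUT_TOK_S = 2_000       # assumed aggregate tokens/sec with continuous batching

gpu_cost_per_day = GPU_HOURLY_COST * 24
capacity_per_day = GPU_THROUGHPUT_TOK_S * 86_400  # tokens/day at full load

for utilization in (0.1, 0.3, 0.6, 0.9):
    tokens = capacity_per_day * utilization
    dedicated_cost_per_m = gpu_cost_per_day / (tokens / 1e6)
    cheaper = "dedicated" if dedicated_cost_per_m < SHARED_PRICE_PER_M_TOKENS else "shared"
    print(f"{utilization:.0%} utilization: ${dedicated_cost_per_m:.2f}/M tokens "
          f"on the dedicated GPU -> {cheaper} is cheaper")
```

Under these assumed numbers, dedicated compute only wins once utilization is high and steady, which is roughly the progression of stages described above.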
Starting point is 00:18:05 One prediction on the enterprise side would be that, you know, if ChatGPT only launched 15, 16 months ago and GPT-4 just came out a year ago, then most enterprises are still in a planning cycle and they haven't really adopted AI at any real scale, which means for infrastructure providers like Baseten, it's like a huge opportunity that's about to come, right? Already you're cresting this giant wave, and the wave is about to get 10 times bigger potentially. What do you view as the timeline for really mass-scale enterprise adoption of AI?
Starting point is 00:18:33 and where do you think things will be in terms of order of magnitude usage overall a year from now, two years from now? I'm just sort of curious, like, what your view of that future is. I think it's a good question. I think we've been so wrong with time scales here. But I do think, like, what we see right now, when we go to talk to enterprises, I'd say a lot of them, you know, like, honestly, what we're seeing now
Starting point is 00:18:52 is that, like, copilots, especially codegen stuff, that actually has already made its way into the enterprise. Like, most enterprises we talk to, when you ask them how advanced are you on the AI strategy, they'll tell you, well, everyone uses Copilot. That's like the first big foray. I think the next piece then is, like, using, like, OpenAI or Anthropic in some way.
Starting point is 00:19:15 Well, I think that is going on right now. And I think people are starting to experiment with that. I think, like, the fear, I would say... Like, someone was just telling me that I think Pfizer earmarked, like, tens of millions of dollars for ML investment or AI investment over the next 12 to 18 months. That's kind of frightening to me to some degree. Like, I think it's great. It's great for us. You know, I love to hear it as a business builder here.
Starting point is 00:19:41 But it's also, like, you know, to me that means that the pressure is actually coming from above. And, you know, that's kind of, like, I'd say, the ML trap we fell into in 2018 to 2020, when CIOs were buying software. It wasn't really attached to real user value or product value. And so I would actually say that we're probably overestimating how big enterprise will get in the next 12 to 18 months, but we're underestimating where it will be in, you know, three to five years from now. And I think 10x is, like, a pretty massive underestimate.
Starting point is 00:20:13 Like, you know, yeah, I'd say what we see, like, we're working with a customer right now that is four engineers and, by the end of this year, will have, you know, mid hundreds of thousands of dollars of annual spend. You know, that's for one use case with sub-thousand users for them. And they're already cash flow positive as a business, which is insane to me in general, but, like, I can't even imagine what these workloads are going to look like when we get to the enterprise. When you start to think about, you know, take, like, customer service and chatbots, which is probably like the number one place where people think
Starting point is 00:20:53 about efficiency, the volume that some of these customers have, you know... my brother's the head of AI at Sunrun, which is like a public company that does solar panels. And I think that's like another really good example of a company where there's so much opportunity for AI just to eat away at processes. And the volume is, you know, just so much higher than what we're thinking from like a traditional business perspective. With harder requirements, which drive even more spend. Yeah, I think one, like, reset in stance that people should have on spending here is that traditionally, like, you know, people were looking at software companies, you got really concerned as a software investor if your cost of goods sold was affected
Starting point is 00:21:46 by a lot of, like, data processing, basically. And so, you know, you have this expectation that your average SaaS business at maturity might have 80% gross margins. And I think, like, you know, now people understand that the training businesses have a big upfront CAPEX investment that, you know, may or may not pay off. But I think one of the things that you're pointing out is that you can actually spend a lot on the inference and the core intelligence and actually, you know, end up with a very valuable business on the other end, with perhaps fewer people. And so I think people have talked about that shape of company, but they don't really think of it as a norm yet. And, you know, at least in my portfolio, we are seeing more
Starting point is 00:22:33 efficiency on headcount and, like, a lot more compute spend. And I know for some of the Baseten customers that compute spend for inference is actually, like, you know, one of the largest items on the P&L. We were working with this customer, and, basically, yeah, this was like a challenge for us. So we asked for an upfront payment for the year of compute. And, you know, the CTO came back to us and said, hey, like, I appreciate what you're doing, but no. Like, this is, you know, after payroll, this is our second biggest expense
Starting point is 00:23:13 for the year, like, we're not going to do that. I think that's, you know, somewhat indicative of how much spend there is here, and it's probably also somewhat indicative of how big, you know, I personally think the market can be once we start seeing mass-scale adoption. But I do think, you know, that's a good point, Sarah, which is that, you know, it's somewhat of a reset in terms of, you know, I don't know if this looks like a normal SaaS business. Actually, I know for sure it does not. I mean, I think, you know, even, like, traditional multiples, it's really, really hard to think about. And, you know, what's crazy about it, though,
Starting point is 00:23:55 is I think the most efficient businesses, through markups and through software optimization, can actually, like, drive pretty healthy margins and still have these really aggressive consumption contracts. And I think that's, I don't know, I think that's rare. I feel like you guys see more businesses than I do, so you'll be able to, like, chime in on that more. Like, is that unique to this industry? Where else have you seen that?
Starting point is 00:24:27 It's been a while since I've seen so many companies ramping so quickly. And sometimes they were fake ramps. So, like, you know, in the internet wave of the 90s, it was kind of startups selling to each other and kind of bootstrapping off of venture capital. And then there was the giant telecom build-out, so it was like a five-year cycle that caused huge revenue uplift, and then suddenly there was a glut and things dropped dramatically. Here it feels like things are ramping really fast off of products that are a couple months old, which sometimes suggests that there's not defensibility.
Starting point is 00:25:01 And so then the question just becomes, okay, how do you build defensibility and what does that mean? And how do things get commoditized, and do they? And, you know, so there's a couple different markets where suddenly you see three companies all go from zero to five or zero to ten million of revenue in a year. And then you're like, okay, there's three of these companies and they all ramped at the same time, the same amount. And so there's enormous demand, but what does that mean in terms of, do they cannibalize each other?
Starting point is 00:25:26 Can three more entrants come in and do the same thing? Like, what is the basis for competition in that market? And so I think there's a lot of that happening too, which is, at least for me, pretty unexpected. And I think it's just because we have such a big technology capability shift that suddenly you can do things you literally couldn't do a year ago, you know? It's kind of amazing. I think it's particularly exciting when you go and apply that to...
Starting point is 00:25:48 I think, Elad, you've cut your teeth on a bunch of different healthcare initiatives. Healthcare is, like, you know, a really interesting place where, like, you know, like Nuance. If you remember Nuance Communications, like, you know, they had the stranglehold over this market for years. And obviously, like, it always looked like they were kind of struggling all along the way as well. And then Whisper comes along. And, you know, you see that market now, like, you know, note-taking for the medical thing. It's insane how fast it's going.
Starting point is 00:26:19 And, like, there clearly is real value there. And then I think the question actually goes, maybe this ends up like a SaaS business again, when you're like, okay, what's the workflow, and what's the power and the defensibility of the workflow you're powering? Yeah, it's a really good point. And then the other thing I think that people often forget is that many markets are not monopolies.
Starting point is 00:26:36 Many of them are oligopolies. You know, that's payments with Stripe and Adyen and PayPal and all these things, right? And so it's also possible some of these market structures are oligopoly markets, and then it's possible it actually ends up being winner-take-all, and there's some network effect or data effect. But if you look at some of these types of companies, healthcare, to your point, is a great example where it's deal-driven, right? You have large deployments with big customers and you lock them in for multi-year deals. And if you're actually able to lock down customer bases, effectively you can fragment it in an oligopoly market more easily than if you have renewals every year, right? So I think also part of it is just, like, what's the contractual structure
Starting point is 00:27:12 of a market. And people really don't talk about that kind of stuff, but I think it's really fascinating to think about through the lens of what actually is a sustainable business in each one of these categories. You know, beyond healthcare, it's like, where are these businesses going to get disrupted? I think, like, financial services is an obvious one. And, funnily enough, I think they've been at the cusp of this stuff in the past. I don't actually think they're on the cusp of it as much as, you know, you'd think. Like, you know, if you think about a lot of the big data stuff, like 10 years ago, you know, the hedge funds were all over that, right? They're like, hey, there's alpha.
Starting point is 00:27:47 There's alpha here. And I know that, you know, some of them are starting to look at large models and language models and whatnot. But I do feel like they're actually lagging a bit in terms of their adoption of these things. And it might actually be because they were so deep in the other sphere, in, like, the old ML world, that it's hard to kind of really quickly turn things around. It's just such a different capability set; I think, like, old-school machine learning, where you're just effectively doing regressions and just pulling out patterns in data, is kind of different from some of the generative stuff in terms of what it does and what it can do for you. And one of the things I've been thinking about recently, related to what you just said, is what are the companies that just don't care about this? And that would be a very good thing, because they're defensible. In the era of AI eating everything, like, what can't be eaten, and therefore maybe those are really good things to get involved with or to work on,
Starting point is 00:28:37 because you're not threatened by a dozen different new startups. That becomes really hard when you start to think about some of the demos, though. I thought my job was safe until yesterday. Yes. One of my partners asked me how long I thought venture capital was going to last in terms of, like, agent-based automation taking over, because he was all excited that he got out of software engineering at exactly the right time, before his skills became useless.
Starting point is 00:29:07 But I tried to give him a real answer, which is, I think at the early stages, a lot of the data doesn't exist, right? Like, you'd have to capture real-world data. You increasingly have meetings over Zoom, but you would want to capture a lot of information about people. So much of it is access, and the information about who is, like, leaving, and, like, a 100x engineer and entrepreneurial and product-oriented and works with velocity. Like, a lot of that is not collected today, right? There's no digital trail for it, and so you have this big inputs problem. I think the decisioning, like, if you think about, like, what is actually structurally predictable, maybe if you have all that data, the, like, people are the most
Starting point is 00:29:54 identifiable piece. But maybe you have a model that is doing continuous learning and can learn, like, meta-structures, like Elad is talking about: like, oh, this is a market that operates as an oligopoly where these are the core drivers of, you know, differentiation and these are the dimensions of competition and such. But I think that feels quite hard when you're investing in a technology landscape that is always changing, right? And so, like, you're kind of always out of distribution, and you don't have the data on the people, and, like, I don't know how you make decisions on whether products are any good, because you'd have to have all the customer point of view, or you'd have to have taste. Maybe models would have taste. I think Sarah is saying that her job
Starting point is 00:30:40 is defensible, which I think is what everybody says. No, no, no, not my job. I'm just saying, this morning, I gave it a good think. And I was like, should we hire people to go work on this? And I was like, nah, that doesn't feel like a tenable problem this year. But I commit to, if it is feasible, like, we're going to be first. But I just feel like it'll be, like, you know, another six months or so. So that means it'll be next month. You give them what way. It's actually in about 20 minutes.
Starting point is 00:31:09 It was a company launching. Wait, can I ask you one more question? Because, like, you know, you're working closely with NVIDIA, you work with the hardware providers. People are really interested in this topic now of, like, I mean, generally, like, do you believe in hardware heterogeneity, right? There are some strong opinions on this from, you know, Databricks and others here. And do you, like, you know, do you still see the same supply-demand dynamics around the GPU shortage from your customers that you did maybe at the beginning of last year?
Starting point is 00:31:43 I think the chips have just changed. So, like, I think before, there was a shortage, and it just felt like a shortage of everything. Like, if you wanted anything except a T4, you could not get it. That was hard. I think, you know, right now, there's two things we're seeing. One is that it is now possible for us to acquire compute pretty quickly. We have big spend, though, and so everything should be kind of conditioned on that; we're making long-term commitments with providers, and that gives us negotiating power. I think customers are still struggling with availability for the most premium chips.
Starting point is 00:32:29 And I think, you know, whether that's H100 or A100, I think, even when there is availability, you're oftentimes looking at, like, three to six weeks of negotiating with cloud providers. And then your rep calling in favors in exchange for something, or the amount of times we've had escalated conversations with cloud providers just to get things moving, is unreal. And so I think customers are still running into it. I do think it's getting better. And I do think that it will, you know, it seems like it will go away. I think the heterogeneity argument around different things, I mean, it's probably a good thing, right?
Starting point is 00:33:07 Like, if there is more than one provider of chips. But that being said, I personally think that it's pretty overstated how easy it is to run something that looks like CUDA, or CUDA in some form, on an AMD chip. It seems like a challenge to me. I know there are people who believe that they've got it done. I think the amount of time we spend debugging bad nodes, where we're in a place where you have a lot of information about existing infrastructure, that's challenging as it is. I can't imagine what it'll be on these chips that are untested. So I think over time, yes, I hope so.
Starting point is 00:33:50 That would be great. I think short term, you know, it's really hard for me to see how we make investments beyond NVIDIA, especially when there's, like, a customer crunch on the other side, from customers who are like, hey, we need this now. And the other thing we don't want to do, you know, is that what Baseten doesn't give you is raw access to GPUs. It's conditioned on this inference problem today. You know, if I gave you a GPU to use on Baseten, there's not that much you could do on it except inference, just in terms of the access control that we give you.
Starting point is 00:34:26 That being said, like, you know, our customers do end up, you know, fiddling with NVIDIA drivers. They do end up, you know, like, installing new versions of PyTorch and, you know, having, like, custom Docker images. I mean, I think running those things on things that aren't NVIDIA, especially, you know, doing them in an abstracted way... Like, I think it'd be easier if I was building a service on top of these AMD chips and saying, take the service. But, you know, our customers do interface with the GPUs in some way, and I think building an abstracted service where there's that heterogeneity, that just sounds very, very challenging.
Starting point is 00:35:04 I'm sure there are people much smarter than me who could figure it out, but I think for us, that would just add a lot of complexity and really slow down how fast we can move. But I think, you know, it is a good world when there is more than one option. How do you see customers thinking about build versus buy, and how do you think that's going to be evolving over time? Speed is the only thing that matters in this market. And what this means is that, you know, if you spend time fiddling around with your infrastructure and your service goes down when you launch it, I think that actually hurts the end user experience a lot. And it's something you just don't want to mess around with. And we see this from customers, you know, even customers where AI is their core thing, is that
Starting point is 00:35:47 they are understanding that what is proprietary to them is models, data, and workflow; what is repeated for them is infrastructure. And I think, like, the amount of times that we've seen in the last 12 months, we're going to build this ourselves, which is very much like how infrastructure engineers were thinking a decade ago, only to come back three months later. It's like, I have a, you know, Docker dumpster fire somewhere. It's, like, our super qualifier. Like, have you built it yourself is how we know someone's going to be a great Baseten
Starting point is 00:36:22 customer, because they empathize with the pain and they know that this is going to allow them to move a lot faster. We had a company with a four-person AI infrastructure team that had been building this for two years migrate all their workloads over to Baseten in 36 hours.
Starting point is 00:36:40 And I think that is a pretty amazing case study for them, which is like, holy crap, we can now take these four engineers and focus on what is actually our competitive, differentiated advantage. And the way we think about our business is, you know, we don't need to scoop something off every single customer either. Like, we offer options where you can run this in your own environment and, you know, pay us a license fee. And, you know, I think it is very, very cost-effective the way that, um,
Starting point is 00:37:01 us and honestly other providers are doing this. Like, I think it's crazy, to be honest, to try to build this yourself, especially at the scale that some of these customers are operating at. I was looking at one of the customers that we chatted with this morning, who was tinkering around, and they said they were doing a billion tokens a day. This is a six-person chatbot company that has a billion tokens a day going through them.
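(For a sense of what a billion tokens a day implies, here is a quick, illustrative calculation; the peak-to-average ratio and per-GPU throughput are assumptions, not figures from the conversation.)

```python
# What "a billion tokens a day" implies, with an assumed peak factor and
# an assumed per-GPU throughput; both numbers are illustrative only.

TOKENS_PER_DAY = 1_000_000_000
PEAK_TO_AVERAGE = 3            # assumed: traffic peaks at ~3x the daily average
TOKENS_PER_GPU_PER_S = 2_000   # assumed aggregate throughput per GPU with batching

average_tok_s = TOKENS_PER_DAY / 86_400
peak_tok_s = average_tok_s * PEAK_TO_AVERAGE
gpus_at_peak = peak_tok_s / TOKENS_PER_GPU_PER_S

print(f"Average:               {average_tok_s:,.0f} tokens/sec")
print(f"Peak (assumed 3x):     {peak_tok_s:,.0f} tokens/sec")
print(f"GPUs needed at peak:   ~{gpus_at_peak:.0f} (at the assumed throughput)")
```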
Starting point is 00:37:41 To build the infrastructure that supports that, with the elasticity and the reliability and the performance, and then build the product experience around that, that's impossible for a six-person team. And I think you should try to take away things that other people can do just as well, if not better. And that's my take. And I think that's kind of what I see the market coming around to as well, which is that speed is a competitive advantage. Let's spend, like, we can spend our way to it, we can buy that competitive advantage without a long build cycle. This is an awesome conversation. Thanks for doing it, Tuhin. Thanks for joining. Thanks, Sarah. Thanks a lot. Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see
Starting point is 00:38:17 our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
