Orchestrate all the Things - SambaNova is enabling disruption in the enterprise with AI language models, computer vision, recommendations, and graphs. Featuring CEO Rodrigo Liang
Episode Date: November 15, 2021. SambaNova just added another offering under the umbrella of its AI-as-a-service portfolio for enterprises: GPT language models. As the company continues to execute on its vision, we caught up with CEO Rodrigo Liang to look both at the big picture and under the hood. Article published on ZDNet
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
SambaNova just added another offering under the umbrella of its AI-as-a-service portfolio for enterprises:
GPT language models.
As the company continues to execute on its vision,
we caught up with CEO Rodrigo Liang to look both at the big picture and under the hood.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook.
So hi, Rodrigo, good to meet you and thanks for making the time for the call today.
I appreciate your agenda is pretty full, so let's jump straight into the conversation. And I'm going to start, well, since this is the first time that we actually get to meet, I would like to start by asking you to say a few words on SambaNova itself, very briefly, the founder story. And when you get past that point, what I'd like to focus on is the whole concept of dataflow. I'm sure you've shared that story a number of times, so to give it a little bit of a twist and make it interesting both for you and my audience, who is specifically interested in graphs, I would like to ask you to highlight the graph processing aspects of dataflow.
And I'm asking you this question because I read up a little bit on dataflow and I noticed that in other mentions of dataflow that you've had, you emphasize the, let's say, the compiler aspect of it. So I think there must be a connection to the graph there.
Of course, yeah. Thanks for having me. This is a pleasure to meet you and chat about our company.
The company is co-founded with two Stanford professors, really thinking about this next generation
of computing. As we all know, this pre-AI to post-AI transition is going to affect all of us,
right? It's going to affect every company in every industry and in different ways,
we haven't even thought about yet. And it's actually, in a lot of ways, an existential question for many companies as far as making sure that they can adopt AI
in an efficient way, right?
In an efficient way with the right level of results.
Today, it takes a lot of work for you
to get AI solutions into production, right?
Not creating models, but using AI
as a production level workflow
for your business.
And so that's ultimately what SambaNova is about.
We thought about kind of the,
what are the needs that the businesses have, right?
How do you end up creating a workflow
or a solution that's good enough
or certainly even better than what humans can do?
And how can we replace those very manual workflows
or very error-prone workflows
with something that is a lot more accurate, right?
And so you look at the state-of-the-art models today,
like GPT-3 and high-resolution computer vision,
I mean, it's getting to this stage
where these automated systems can do as well,
if not better than humans on a number of tasks, right?
But what that requires is these large-scale models
with a lot of very high performance infrastructure,
and it requires expertise on how these models want to run. And it touches on the graphs that you're talking about,
right, that if you are a,
if you're one of the top 10 companies in the world,
maybe you have thousands of data scientists
that you can devote to it.
But if you are a Fortune 5000 company
and you don't have those,
how do you
build enough expertise to deploy a GPT-3 model, which, I mean, as a model, it's fantastic, right?
I mean, GPT-2 even, it's fantastic. And these language models are getting to the point where
it can do so many different things now. You just need to know how to deploy it, right?
And so what SambaNova decided to do was come in and say, look, there just are not enough
experts in the world to satisfy all the companies that need to have AI. These models are complex by
nature. A simpler version is not going to be good enough to get to production. You need these
state-of-the-art models. You need them in order to replace the workflows that exist today, right?
So why don't you let SambaNova come in and do it for you?
And your expertise focuses on collecting the right data,
collecting, you know, getting the data for your business,
making sure you understand what insights
and what questions you want to ask, right?
But the training, the inference, you know,
all of the management of the models,
we can do it for you as a subscription
You know, and so we've had pretty good success with it. You know, a lot of very, very expert organizations have signed on with SambaNova, including the US government, including, we just made an announcement today with one of the larger banks in Europe subscribing to our GPT service.
And so really excited about this model as a way to accelerate and get our customers
jumping ahead, right?
You don't need to spend two years building up your AI team, building up your infrastructure,
learning the models, doing all of that work.
We'll cut the line and within weeks, not months, weeks,
you're up and running with state-of-the-art GPT.
And so that's really what SambaNova is focusing on.
Your second question was about graphs.
Is that what it was?
Yes.
Well, specifically on the foundation, let's say, of dataflow, which I guess is the core concept around which SambaNova is based. So I would like to ask you, since, as much as I've been able to read up on it, it seems like it was sort of built from the ground up in a way that reverses, let's say, how chips were traditionally designed and built. So I would like you to elaborate a little bit on that philosophy. And again, as far as I've been able to tell, it looks like there are some parallels to how compilers, for example, work, and this is where I see the connection to graphs and graph processing. And since this is a topic I have a personal interest in, I was wondering if you could make the connection, basically.
Yeah, exactly right. And look, you know,
SambaNova, we were a software-first company. So when my co-founders started this research, they're professors at Stanford, just, you know, amazing folks that redefined the way that computing works at different layers of the hardware and software stack over many decades, right? And so they're actually much better at explaining this, but I'll do my best.
But if you really think about kind of how these neural nets work, right, what it is, is just an interconnection of all these nodes where you're doing actual computation in order for you to figure out,
to see if the output of that one cycle computation
is a better result, a higher-accuracy result, than your previous cycle.
And you just kind of continue to do those iterations
over and over again, right?
The way that computing happens for that type of computation today is what people call kernel by kernel, right?
You know, you're just looking at what's happening
right in front of you today.
You bring it into your computational engine,
most likely a GPU today, maybe a CPU,
and you actually look at that, compute that,
and you store the results somewhere,
and then you load the next kernel in to try to figure out,
okay, now what are we doing next, right?
And then you say, oh, I need the inputs
of the previous thing, let me bring that in.
Then I do some computation,
and then I store the results of that again.
And then I'll bring the next kernel, figure out,
okay, now what does this thing need?
Well, what happens in that particular mode is,
one, you have to do these computations, and the intermediate results
between these kernels have to get stored somewhere, usually off-chip. And this is why you see this
big boom in HBM, high bandwidth memory, because you're doing a lot of handshakes between the
computational engine and some intermediate memory, right? It's all scratch. All that data is not actually kept, I mean, in the big scheme of things, it's not kept in perpetuity, right? It's not something that you need for a long time. It's just a very short amount of time to store in between while your computational engine is starting to
swap out the kernels, right? And then the second thing that you don't have is you actually don't
know which kernel is coming next. As a computational engine, you did your computation and then you send it back and you let the host send you the next computational kernel.
And then you start figuring out, oh, what do I need? Oh, yes.
The previous data was stored here. Let me go get it. Right.
And so it's very hard to plan resources when you don't know what's coming, right? When you don't know what's coming and you don't know all the resources you might need.
And so one of the beauties of the way that we've done this, starting with the compiler stack, was that the first thing we wanted to do is say, look, these neural nets are very predictable, right? I mean, you know exactly what it is. You take a GPT model. I mean, all the interconnections, you know way in advance, as big as they are. You can take a fairly small model like a ResNet model or, you know, take something as big as GPT. They're all predictable. You can see all these interconnections between them way in advance, right? And so what we want to do with our technology is just say, look, why don't we let,
because the models are getting so big that the human eye and
the human mind were not made to optimize for it, right?
But compilers do a great job at that, right?
And so if you allow the tools to come in and unroll the whole graph, and
just see every layer of the graph, every interconnection that you might need,
where the section cuts are, where all the critical latency interconnections are,
where the high bandwidth connections are, right?
If you unroll the whole graph
and you get an entire map of what you need,
then you actually have a chance of figuring out
kind of how to really optimally run this particular graph.
Right? And so then you say, okay, well, now I have the software stack that does this automatically, where the human mind is, you know, really not optimal for figuring out these things, especially the dynamic nature of it, right? As graphs move, sometimes you're in high bandwidth, sometimes you're not, right? It depends on which part of your loop you're running, right? But, you know, these compilers are fantastic at that.
And that's what we have. It's something we call SambaFlow. It's tremendous in being able to do that.
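To make that concrete, here is a minimal, hypothetical sketch in Python, not SambaFlow's actual API: because every operator and every interconnection of a neural net is known statically, a single pass over the unrolled graph is enough to map out how much data has to flow across each edge before anything executes.

```python
# Hypothetical sketch only, not SambaFlow's API. A neural net is a
# statically known graph, so a compiler can walk it once, ahead of
# execution, and plan bandwidth for every interconnection.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    out_shape: tuple                      # shape of the tensor this op produces
    consumers: list = field(default_factory=list)

def connect(producer: Op, consumer: Op) -> None:
    producer.consumers.append(consumer)

def plan(ops, bytes_per_elem=2):
    """Unroll the graph: for every producer->consumer edge, compute how
    many bytes must flow per step. A real compiler would use this map to
    place operators and size on-chip buffers ahead of time."""
    edges = []
    for op in ops:
        n_elems = 1
        for dim in op.out_shape:
            n_elems *= dim
        for consumer in op.consumers:
            edges.append((op.name, consumer.name, n_elems * bytes_per_elem))
    return edges

# Toy three-stage network: the whole topology is known in advance.
embed = Op("embed", (8, 1024, 768))
attn = Op("attention", (8, 1024, 768))
mlp = Op("mlp", (8, 1024, 768))
connect(embed, attn)
connect(attn, mlp)

for src, dst, nbytes in plan([embed, attn, mlp]):
    print(f"{src} -> {dst}: {nbytes / 1e6:.1f} MB per step")
```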
Once you have that, your next question is,
well, what hardware substrate can run that the best, right?
Because everything that exists today,
CPUs, GPUs, even FPGAs,
what they know how to do is one kernel at a time, right?
One little kernel, and then let me feed it, and then store in HBM, and let me take the next one.
That's one kernel at a time.
So what we decided to do here is to say, look, actually, what you really need is a hardware substrate that can match the data path that the graph has already determined
in bandwidth and bus size.
And so what you really need is you want something
that allows you to then take this graph that you unrolled,
take all the bandwidth and latency requirements
that you figured out,
that's optimally to run this particular network.
And then you wanna just map it exactly as this
and to keep the data on chip.
So you feed the outputs of one kernel
straight into the inputs
of the next one without leaving the chip. So all of these bandwidth requirements that you need for
memory and things like that, they get reduced dramatically because you're not storing all this
intermediate data just because you're swapping kernels, right? And so that's really fundamentally
what we're doing with SambaNova. We're just keeping all of these graphs and interconnections that we already know about
in relation to each other, optimally tied together
so that you can feed the machine
as the graph is moving through
and you can make all the orchestration way in advance.
Right? And you can scale it, many graphs on one chip.
You can put one graph in hundreds of chips, right? Because the compiler
doesn't care. It's just all basically bandwidth and latency that's optimizing around, right?
So that's basically at the core of it. And what you see is some of our most sophisticated customers
in the US government, for example, saying, hey, by turning that on, they're getting 8 to 10x,
sometimes 20x advantage compared to their GPU results
that they've optimized for years.
And that's really the power of a dataflow type of architecture.
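As a rough illustration of the contrast Liang draws, here is a toy Python sketch; the kernels and sizes are invented for illustration only. The kernel-by-kernel version round-trips every intermediate result through off-chip scratch memory, while the dataflow-style version feeds each stage's output straight into the next.

```python
# Toy contrast with invented kernels: kernel-by-kernel execution stores
# every intermediate result in off-chip scratch memory, while a
# dataflow-style pipeline keeps results moving from stage to stage.
import numpy as np

def conv(x):  return np.tanh(x)                      # stand-in kernels
def norm(x):  return x / (np.abs(x).max() + 1e-6)
def dense(x): return x @ (np.ones((x.shape[-1], x.shape[-1])) * 0.01)

OFF_CHIP = {}  # plays the role of HBM scratch space

def kernel_by_kernel(x):
    """Run one kernel at a time: store its output off-chip, then load it
    back as the input of the next kernel."""
    OFF_CHIP["a"] = conv(x)
    OFF_CHIP["b"] = norm(OFF_CHIP["a"])              # extra memory round-trips
    return dense(OFF_CHIP["b"])

def dataflow(x):
    """All stages mapped at once: intermediate results flow from stage
    to stage and are never written out to scratch memory."""
    for stage in (conv, norm, dense):
        x = stage(x)
    return x

x = np.random.rand(4, 16)
assert np.allclose(kernel_by_kernel(x), dataflow(x))  # same math, different data movement
```

The math is identical in both paths; what changes is how often intermediate data leaves the compute engine, which is where the bandwidth and latency savings come from.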
Okay, so I guess that also preemptively answers a follow up question that I had.
So another core choice that you seem to have made
is the fact that, well, you don't ship chips that others can integrate into their existing servers or architectures. You basically either ship the entire box, including network connections and everything, or you make that available as a service. And I guess the reason that you're doing that is what you just described, that you have this very unique architecture that I suppose would not work by just integrating into existing servers, let's say.
Well, we could. I mean, I've been asked many times,
will you sell us the chip, right?
But I go back to my initial claim.
The large majority of the world does not have the AI expertise to take chips at this raw level, you know, the software at the lower level, and implement them into solutions, right?
And really what we're focused on is getting as many of the Fortune 5000 companies to production with AI solutions as possible, versus trying to talk to as many AI developers as possible.
And we do those as well.
The developers love creating new models.
But really, our thesis of the company is saying, look, these models are getting to a point where they're fantastic, like a GPT model, where they're just fantastic.
Really what people need is for us to productize it for them. Pre-train the model,
bring the model to their data, get them into production, allow them to actually run it,
monitor it, maintain it, checkpoint it, all the things that you have in production,
you can just sign up to some of them and we'll take care of all of that.
Right?
And so really, the form factors we offer are more a function of who we're actually trying to cater to versus the technical constraints of the device.
You know what I mean?
Like for general purpose,
we can download models from all these different depots, and because we have this compiler stack, at the push of a button you can compile and run and train it and get state-of-the-art results in terms of accuracy.
In many cases, we actually set the world record for performance, right?
So we can do all of those things. And yet for a large part of our customer base, inventing a new model is not their biggest problem.
Their biggest problem is I want to deploy in production, right?
And so they call us because then we can come in and say, okay, well,
you know, to do a document classification solution for your contracts,
it takes this many.
And so we come in and we just deploy our standard systems with GPT and you
subscribe to it. And the beauty of it is it eliminates this large
expert headcount need for data scientists that most people are having a hard time
hiring for, right? It eliminates this large upfront infrastructure cost that many of them have to take on, because you're just subscribing. So you're actually just paying a monthly fee to
infrastructure that we deploy anywhere you want,
including their own site, right?
And then ultimately, as the model evolves and changes,
you don't need to have the expertise to keep up with it
and say, hey, do I need this new model?
Or should I change that model?
We do it all day long.
This is what we do.
So we will, under the hood,
change the models as appropriate for our customers.
And so it makes it really easy for them to actually say, okay, well, I don't want to be an AI shop, right? My business is X. Let SambaNova be my AI shop, and I can get the benefits of AI without having to invest so much time, money, and energy into getting the capabilities that I think everybody's going to need.
Yeah, I think what you just described is, well, another way of framing what one of your co-founders, Chris Ré, has termed data-centric AI.
So basically, his position is like, well, OK, all these models are great and everything,
but we've reached the point where they're kind of a commodity now. So you should focus on your specific
data. And I guess you're facilitating that by giving people the infrastructure to let
them cater to their specific data for their domain and just like, let you take care of
the rest.
Exactly right. Exactly right. And again, you know, there will always be certain classes
of model where innovation will continue to be there
and allows you to do some new things.
But for large classes of models,
they're starting to get to the point
where they're just fantastic, right?
They do a lot of really good things already
and the new improvements in accuracy are helpful,
but they're now getting to the point
where they're incremental, right?
And so if you can just take the current existing models
and make it easy for people to deploy, right?
Easy for people to consume,
easy for people to get results quickly, right?
Not months, but days or weeks, you know,
you're up and running,
and you don't have to have hundreds of people managing it,
because as you know, some of these models, they're so big,
it's like, yeah, a thousand chips aggregated to get to run one model, right? You sneeze
and you get the wrong result, right? And so we try to eliminate all of those things so
that when we deploy it, we deploy a solution that we believe is right,
that we've trained it to be correct, and we maintain it for you. And so
that's really kind of our model here to make sure that you're getting
what everybody else has spent years developing.
You can jump the line and you can get the same, you know, actually can get better because we set the world record on a number of things.
You can get better without having to actually invest all the time and energy and get, you know, capabilities that, you know, other companies have spent years developing.
So one of the iconic models that you referred to and also I think one of your latest announcements
was making GPT available on SambaNova as a service. And that was new to me because, you know, last time I checked, before I read up for this discussion, actually, the only way I knew of that people could access GPT-3 specifically was via OpenAI's API in conjunction with Microsoft. So I was wondering about the details, basically, of what it is that you license. Is it like a joint project with OpenAI?
Is it the previous version of GPT?
Or how does it work exactly?
It's a GPT-3 model.
And again, we do a whole class of GPT.
Some people, as you know, GPT-3, as big as it is, as powerful as it is, not everybody needs the 175 billion parameter GPT-3 model. And so some people want a 13 billion parameter model, some people want a different parameter count, right? So we can range. But the construct of a GPT model is very similar, right? And it's one of those things that, as we talk to more and more of our customers, a lot of people really wanted to have access to this model. You know, they do believe that this is the model that gives them the maximum flexibility
for the next many, many years.
And it's going to continue to evolve.
But the construct of it is really valuable.
Like I said, the model can do so many different things these days and becomes one of those
assets that every company needs to figure out a way to get access to it because it's
the type of model that, like we said,
is getting to the point where if you have it, it's going to be really great for a large number of
people. You don't really need to invent a lot of new models, right? But the problem was exactly
what you said. The access to it was difficult. So we actually just announced this morning
our first customer on this GPT Dataflow-as-a-Service. It's a large European bank. Basically, what we do here is, you want the access to this type of model, we'll deploy our infrastructure anywhere. It's actually our systems, right, our software stack, our people that manage that model, right? We'll train it to accuracy, we'll actually fine-tune it on your data, and then we'll bring it to wherever you want. In this particular case, it's actually going to be on-prem, in their own data center, on their own site. It's their sole use of it; it's not like they're sharing it, because there are privacy questions: my data, I can't have it, you know, in different places. So we'll put it behind their firewall, completely dedicated to their use case,
and they just subscribe to it.
They pay us by month.
Right, and so it's a type of model
that now suddenly you have your own private access
to a GPT model, as big as it is, right?
A GPT model, you can, all your folks in the company
can use it for whatever it is that you need.
We maintain the model on your behalf, right?
And that's a perfect example of how, you know, banks, which are traditionally fairly sophisticated and diligent institutions when it comes to technology, are jumping the line and saying, hey, I need that.
I'm going to deploy it.
I'm going to deploy it in this way, right?
Because I don't want to go and create it myself. SambaNova, come and do it for us, and we're going to deploy it everywhere for all of our users inside the bank, right?
And so that's kind of what we're doing,
and we're doing it.
We can replicate that recipe over and over again because, again,
we've integrated all of this into a nice compact infrastructure
that allows us to deploy the service in a way that you don't even know
kind of what the hardware needs to look like, right? You know, like a lot of people talk about,
oh, it's all these chips and all these, you know, networking, all this stuff, but we hid it all away,
right? So let SambaNova take care of it. We'll deploy it. You just tell me what your SLAs are, right? How quickly you need these things. We'll size that for you and we'll deploy wherever you need it, right? And that's effectively what this particular bank has done with us.
And we're super excited about that collaboration.
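For readers curious what "fine-tune it on your data" typically involves, here is a generic sketch using the open-source Hugging Face transformers library and the small GPT-2 model. It is illustrative only, it is not SambaNova's stack or tooling, and the training file name is a placeholder for a customer's own corpus.

```python
# Illustrative only: fine-tuning a small open GPT-style model (GPT-2) on
# domain text with Hugging Face transformers. Not SambaNova's stack;
# "bank_documents.txt" is a placeholder for a customer's own corpus.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("text", data_files={"train": "bank_documents.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-domain",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # fine-tuned weights stay wherever training ran, e.g. on-prem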
Okay. So you already answered the other part I was wondering about.
So basically fine tuning to domain because, well, yes, I mean, GPT-3 in itself is super, super useful and everything, but there's a number of reasons
why people would want to fine tune it to their domain, you know, competitive differentiation,
and they want to have their own data and all of that. But I guess you also do that. So that points
me to the next set of questions really, which is about your business model, because based on what you said, it sounds like part of it, at least,
is based on services.
So I was wondering if I'm right about that.
And then, yeah, how does your,
what's the mix that you play on, basically?
Services in that, I mean,
we think of ourselves as a more flexible platform, right?
Think of services more like, if you're in that world, it's more like a Salesforce, right?
And Accenture, I mean, we actually partner with a lot of really great companies that help us with a lot of the customizations that customers need because we don't do that level of work.
We're really more about deploying our platform in a flexible way. If you look at what types of
models, obviously, if you look at our infrastructure, you've looked at data flow
architectures and we can run anything that people want, really. We're a general purpose platform.
We can train, we can infer, we can run computer vision models, recommendation models, LSTMs, we can run all sorts of different things, right? And people do, right? You know, our US government partners run some really, really sophisticated models. You know, we have one that they published where they map the universe. I mean, they run all sorts of different things, right? And so, but really, where you see us focus on the Dataflow-as-a-Service,
we focus on three classes of models, right?
Just three of the hundreds we could, right?
You know, because these are the things
that we've actually determined
that our customer base is looking for us
to deploy in production, right?
So those are natural language, computer vision,
and specifically where you have high resolution computer vision, where you feel like in the enterprise, you need high res.
And today, people don't realize that most of the existing technology is not able to break through into high resolution.
And so we'll talk about that. But then recommendation systems, which are going to power our internet economy, right? And so we decided that those are really the three classes of models that we will deploy as a service. But really, what we do is, these are pre-determined models of various sizes; you know, there's some flexibility about kind of how big you run it and how quickly, you know, we can give you that little bit of flexibility, but it's not really services in the traditional sense of, like, we're customizing it, right?
We're not, we bring others in to help us
for things that are outside the knobs
that our platform gives you,
but within the flexibility the platform gives you,
it gives you a lot, right?
You can increase parameter count,
you can change the models
and you can do a bunch of different things.
But that's why we say,
we're going to look at our models more like
how Salesforce kind of sells it for CRM.
So it's a platform for you to run these types of models
and we'll actually maintain it on your behalf, right?
We'll maintain the accuracy on your behalf.
And as you know, as you can train and then you can deploy,
you can do inference on our systems as, you know,
Chris will talk about this as well,
but as you run in production, models drift, right? And so what you sometimes need is a little bit of retraining, right? You know, as you're in production, most systems have to go through this big cycle of retraining and requalifying, whereas with SambaNova, because we're one platform where we actually train, and we distill to, you know, our own targets, you can actually every so often re-tweak it, right? So it maintains accuracy. And that's a key part of how we maintain good results for a customer: on the same platform, we can do multi-tenant inference and have lots and lots of people in production, and then suddenly pause that, collapse it to do a retrain on the model just to tweak it back up.
And then we're back to multi-tenancy.
So I think we can do a lot of very flexible things that allow us to keep what we believe are
production level facilities, production level functions that customers expect and need.
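One way to picture the pause-retrain-resume loop described above is the following hypothetical sketch; the class, thresholds, and function names are invented for illustration and are not SambaNova's software. It monitors accuracy over recent production traffic and, when drift pushes it below a target, retrains briefly before resuming serving.

```python
# Hypothetical sketch: names and thresholds are invented, not SambaNova's
# software. Track accuracy over recent production traffic; when drift
# pushes it below target, pause serving, retrain briefly, then resume.
from collections import deque

class DriftMonitor:
    """Rolling-window accuracy tracker for a model in production."""
    def __init__(self, window=1000, target_accuracy=0.92):
        self.results = deque(maxlen=window)
        self.target = target_accuracy

    def record(self, correct: bool) -> None:
        self.results.append(correct)

    def drifted(self) -> bool:
        if len(self.results) < self.results.maxlen:
            return False                  # not enough evidence yet
        return sum(self.results) / len(self.results) < self.target

def serve_loop(model, labelled_stream, monitor, retrain_fn):
    """Serve predictions until drift is detected, then retrain and resume."""
    for example, label in labelled_stream:
        prediction = model.predict(example)
        monitor.record(prediction == label)
        if monitor.drifted():
            model = retrain_fn(model)     # short fine-tune, not training from scratch
            monitor.results.clear()       # judge the refreshed model on fresh traffic
        yield prediction
```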
Okay, and final one to wrap up, we're almost out of time. So you've already gone a long way. So
you've raised a number of pretty big rounds. You have a really high valuation as well. It sounds
like you're growing your customer base. And so what's your roadmap? So where do you want to be
in like, I don't know, six months or a year from now? Well, I mean, there's a race going on, right?
And people aren't always aware, right, in their own verticals that there's an AI race
going on, right?
And you think about the banks, you think about manufacturing, you think about healthcare,
you think about, you know, all these different sectors, you know, where people are using
AI as an opportunity
to catapult their position within their sector, right?
And, I mean, you've seen this, it's not just SambaNova,
I mean, just the entire industry of AI,
there's a lot of really disruptive things going on, right?
Of which we play one part of that.
But we look at that, we look at our job
as really trying to enable this disruption, right?
That we can go into these verticals and we aren't necessarily the experts for all of those industries, right?
We partner with a lot of great customers and great partners for those sectors.
But what we can do is provide you a platform to create new ways to compete in that industry, in a way that's pretty disruptive.
I always say it feels like this pre-AI to post-AI transition is going to be as big,
if not bigger than the internet. And there are a lot of signs that tell you that, right? There
are a lot of signs that, look, it's not going to be an incremental change. Entire ways that work
is being done today will disappear, right? Just because the robots can come in and do it better,
they're more efficient and you can kind of remove
entire chunks of work.
And so if people are looking at AI as,
hey, I can tweak it and run a particular thing
10% more efficient, that's thinking too small, right?
Because we are talking with people and they're looking at,
hey, here's 30% of my workflow,
I'm going to just remove all of it.
I'm just going to take all that away
and let the machines take over.
And that's the power of AI.
And so we're super excited in the next five to 10 years
as our partners and our customers and other folks
are getting these solutions to production
and we're enabling them.
It's going to be really exciting because I think, you know,
we think that we have a critical
role to play in enabling that and enabling it in production in a way that people can rely on it.
It's no longer part of AI research or AI labs. This is in production. And that's really ultimately
kind of what we started the company for. How do we get customers into production and creating value for their mission
critical applications? Okay, great. Sounds like you have your work lined up ahead of you. So,
best of luck and I hope we'll be able to catch up sometime soon again. Thank you.
I hope you enjoyed the podcast. If you like my work,
you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.