Orchestrate all the Things - SambaNova is enabling disruption in the enterprise with AI language models, computer vision, recommendations, and graphs. Featuring CEO Rodrigo Liang
Episode Date: November 15, 2021. SambaNova just added another offering under the umbrella of its AI-as-a-service portfolio for enterprises: GPT language models. As the company continues to execute on its vision, we caught up with CEO Rodrigo Liang to look both at the big picture and under the hood. Article published on ZDNet
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
SambaNova just added another offering under the umbrella of its AI-as-a-service portfolio for enterprises:
GPT language models.
As the company continues to execute on its vision,
we caught up with CEO Rodrigo Liang to look both at the big picture and under the hood.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook.
So hi, Rodrigo, good to meet you and thanks for making the time for the call today.
I appreciate your agenda is pretty full, so let's jump straight into the conversation. And I'm going to start, well, since this is the first time that we actually get to meet, I would like to start by asking you to say a few words on SambaNova itself, very briefly, the founder story. And when you get past that point, what I'd like to focus on is the whole concept of dataflow. I'm sure you've shared that story a number of times, so to give it a little bit of a twist and make it interesting both for you and my audience, who is specifically interested in graphs, I would like to ask you to highlight the graph processing aspects of dataflow.
And I'm asking you this question because I read up a little bit on dataflow and I noticed that in other mentions of dataflow that you've had, you emphasize the, let's say, the compiler aspect of it. So I think there must be a connection to the graph there.
Of course, yeah. Thanks for having me. This is a pleasure to meet you and chat about our company.
The company is co-founded with two Stanford professors, really thinking about this next generation
of computing. As we all know, this pre-AI to post-AI transition is going to affect all of us,
right? It's going to affect every company in every industry and in different ways,
we haven't even thought about yet. And it's actually, in a lot of ways, an existential question for many companies as far as making sure that they can adopt AI
in an efficient way, right?
In an efficient way with the right level of results.
Today, it takes a lot of work for you
to get AI solutions into production, right?
Not creating models, but using AI
as a production level workflow
for your business.
And so that's ultimately what SambaNova is about.
We thought about kind of the,
what are the needs that the businesses have, right?
How do you end up creating a workflow
or a solution that's good enough
or certainly even better than what humans can do?
And how can we replace those very manual workflows
or very error-prone workflows
with something that is a lot more accurate, right?
And so you look at the state-of-the-art models today,
like GPT-3 and high-resolution computer vision,
I mean, it's getting to this stage
where these automated systems can do as well,
if not better than humans on a number of tasks, right?
But what that requires is these large-scale models
with a lot of very high performance infrastructure,
and it requires expertise on how these models want to run. And it touches on the graphs that you're talking about,
right, that if you are a,
if you're one of the top 10 companies in the world,
maybe you have thousands of data scientists
that you can devote to it.
But if you are a Fortune 5000 company
and you don't have those,
how do you
build enough expertise to deploy a GPT-3 model, which, I mean, as a model, it's fantastic, right?
I mean, GPT-2 even, it's fantastic. And these language models are getting to the point where
it can do so many different things now. You just need to know how to deploy it, right?
And so what SambaNova decided to do was come in and say, look, there just are not enough
experts in the world to satisfy all the companies that need to have AI. These models are complex by
nature. A simpler version is not going to be good enough to get to production. You need these
state-of-the-art models. You need them in order to replace the workflows that exist today, right?
So why don't you let SambaNova come in and do it for you?
And your expertise focuses on collecting the right data,
collecting, you know, getting the data for your business,
making sure you understand what insights
and what questions you want to ask, right?
But the training, the inference, you know,
all of the management of the models,
we can do it for you as a subscription
You know, and so we've had pretty good success with it. You know, a lot of very, very expert organizations have signed on with SambaNova, including the US government, including, we just made an announcement today with one of the larger banks in Europe subscribing to our GPT service.
And so really excited about this model as a way to accelerate and get our customers
jumping ahead, right?
You don't need to spend two years building up your AI team, building up your infrastructure,
learning the models, doing all of that work.
We'll cut the line and within weeks, not months, weeks,
you're up and running with state-of-the-art GPT.
And so that's really what SambaNova is focusing on.
Your second question was about graphs.
Is that what it was?
Yes.
Well, specifically on the foundation, let's say, of dataflow, which I guess is the core concept around which SambaNova is based. So I would like to ask you, since, as much as I've been able to read up on it, it seems like it was sort of built from the ground up in a way that reverses, let's say, how chips were traditionally designed and built. So I would like you to elaborate a little bit on that philosophy. And again, as far as I've been able to tell, it looks like there are some parallels to how compilers, for example, work, and this is where I see the connection to graphs and graph processing. And since this is a topic I have a personal interest in, I was wondering if you could make the connection, basically.
Yeah, exactly right. And look, you know,
SambaNova, we were a software-first company. So when my co-founders started this research, they're professors at Stanford, just, you know, amazing folks that redefined the way that computing works at different layers of the hardware and software stack over many decades, right? And so they're actually much better at explaining this, but I'll do my best.
But if you really think about kind of how these neural nets work, right, what it is, is just an interconnection of all these nodes where you're doing actual computation in order for you to figure out,
to see if the output of that one cycle computation
is a better result, a higher-accuracy result, than your previous cycle.
And you just kind of continue to do those iterations
over and over again, right?
The way that computing happens for that type of computation today is what people call kernel by kernel, right?
You know, you're just looking at what's happening
right in front of you today.
You bring it into your computational engine,
most likely a GPU today, maybe a CPU,
and you actually look at that, compute that,
and you store the results somewhere,
and then you load the next kernel in to try to figure out,
okay, now what are we doing next, right?
And then you say, oh, I need the inputs
of the previous thing, let me bring that in.
Then I do some computation,
and then I store the results of that again.
And then I'll bring the next kernel, figure out,
okay, now what does this thing need?
Well, what happens in that particular mode is,
one, you have to do these computations, and the intermediate results
between these kernels have to get stored somewhere, usually off-chip. And this is why you see this
big boom in HBM, high bandwidth memory, because you're doing a lot of handshakes between the
computational engine and some intermediate memory, right? It's all scratch. All that data is not actually kept, I mean, in the big scheme of things, it's not kept in perpetuity, right? It's not something that you need for a long time. It's just a very short amount of time to store in between while your computational engine is starting to
swap out the kernels, right? And then the second thing that you don't have is you actually don't
know which kernel is coming next. As a computational engine, you did your computation and then you send it back and you let the host send you the next computational kernel.
And then you start figuring out, oh, what do I need? Oh, yes.
The previous data was stored here. Let me go get it. Right.
And so it's very hard to plan resources when you don't know what's coming, right? When you don't know what's coming and you don't know all the resources you might need.
And so one of the beauties of the way that we've done this, starting with the compiler stack, was that the first thing we wanted to do is say, look, these neural nets are very predictable, right? I mean, you know exactly what it is. You take a GPT model. I mean, all the interconnections, you know way in advance, as big as they are. You can take a fairly small model like a ResNet model or, you know, take something as big as GPT. They're all predictable. You can see all these interconnections between them way in advance, right? And so what we want to do with our technology is just say, look, why don't we let,
because the models are getting so big that the human eye and
the human mind were not made to optimize for it, right?
But compilers do a great job at that, right?
And so if you allow the tools to come in and unroll the whole graph, and
just see every layer of the graph, every interconnection that you might need,
where the section cuts are, where all the critical latency interconnections are,
where the high bandwidth connections are, right?
If you unroll the whole graph
and you get an entire map of what you need,
then you actually have a chance of figuring out
kind of how to really optimally run this particular graph.
Right? And so then you say, okay, well, now I have the software stack that does this automatically, where the human mind is, you know, really not optimal for figuring out these things, especially the dynamic nature of it, right? As graphs move, sometimes you're in high bandwidth, sometimes you're not, right? It depends on which part of your loop you're running, right? But, you know, these compilers are fantastic at that.
And that's what we have. It's something we call SambaFlow. It's tremendous in being able to do that.
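To make that concrete, here is a minimal, hypothetical sketch in Python, not SambaFlow's actual API: because every operator and every interconnection of a neural net is known statically, a single pass over the unrolled graph is enough to map out how much data has to flow across each edge before anything executes.

```python
# Hypothetical sketch only, not SambaFlow's API. A neural net is a
# statically known graph, so a compiler can walk it once, ahead of
# execution, and plan bandwidth for every interconnection.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    out_shape: tuple                      # shape of the tensor this op produces
    consumers: list = field(default_factory=list)

def connect(producer: Op, consumer: Op) -> None:
    producer.consumers.append(consumer)

def plan(ops, bytes_per_elem=2):
    """Unroll the graph: for every producer->consumer edge, compute how
    many bytes must flow per step. A real compiler would use this map to
    place operators and size on-chip buffers ahead of time."""
    edges = []
    for op in ops:
        n_elems = 1
        for dim in op.out_shape:
            n_elems *= dim
        for consumer in op.consumers:
            edges.append((op.name, consumer.name, n_elems * bytes_per_elem))
    return edges

# Toy three-stage network: the whole topology is known in advance.
embed = Op("embed", (8, 1024, 768))
attn = Op("attention", (8, 1024, 768))
mlp = Op("mlp", (8, 1024, 768))
connect(embed, attn)
connect(attn, mlp)

for src, dst, nbytes in plan([embed, attn, mlp]):
    print(f"{src} -> {dst}: {nbytes / 1e6:.1f} MB per step")
```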
Once you have that, your next question is,
well, what hardware substrate can run that the best, right?
Because everything that exists today,
CPUs, GPUs, even FPGAs,
what they know how to do is one kernel at a time, right?
One little kernel, and then let me feed it, and then store in HBM, and let me take the next one.
That's one kernel at a time.
So what we decided to do here is to say, look, actually, what you really need is a hardware substrate that can match the data path that the graph has already determined
in bandwidth and bus size.
And so what you really need is you want something
that allows you to then take this graph that you unrolled,
take all the bandwidth and latency requirements
that you figured out,
that's optimally to run this particular network.
And then you wanna just map it exactly as this
and to keep the data on chip.
So you feed the outputs of one kernel
straight into the inputs
of the next one without leaving the chip. So all of these bandwidth requirements that you need for
memory and things like that, they get reduced dramatically because you're not storing all this
intermediate data just because you're swapping kernels, right? And so that's really fundamentally
what we're doing with SambaNova. We're just keeping all of these graphs and interconnections that we already know about
in relation to each other, optimally tied together
so that you can feed the machine
as the graph is moving through
and you can make all the orchestration way in advance.
Right? And you can scale it, many graphs on one chip.
You can put one graph in hundreds of chips, right? Because the compiler
doesn't care. It's just all basically bandwidth and latency that's optimizing around, right?
So that's basically at the core of it. And what you see is some of our most sophisticated customers
in the US government, for example, saying, hey, by turning that on, they're getting 8 to 10x,
sometimes 20x advantage compared to their GPU results
that they've optimized for years.
And that's really the power of a dataflow type of architecture.
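As a rough illustration of the contrast Liang draws, here is a toy Python sketch; the kernels and sizes are invented for illustration only. The kernel-by-kernel version round-trips every intermediate result through off-chip scratch memory, while the dataflow-style version feeds each stage's output straight into the next.

```python
# Toy contrast with invented kernels: kernel-by-kernel execution stores
# every intermediate result in off-chip scratch memory, while a
# dataflow-style pipeline keeps results moving from stage to stage.
import numpy as np

def conv(x):  return np.tanh(x)                      # stand-in kernels
def norm(x):  return x / (np.abs(x).max() + 1e-6)
def dense(x): return x @ (np.ones((x.shape[-1], x.shape[-1])) * 0.01)

OFF_CHIP = {}  # plays the role of HBM scratch space

def kernel_by_kernel(x):
    """Run one kernel at a time: store its output off-chip, then load it
    back as the input of the next kernel."""
    OFF_CHIP["a"] = conv(x)
    OFF_CHIP["b"] = norm(OFF_CHIP["a"])              # extra memory round-trips
    return dense(OFF_CHIP["b"])

def dataflow(x):
    """All stages mapped at once: intermediate results flow from stage
    to stage and are never written out to scratch memory."""
    for stage in (conv, norm, dense):
        x = stage(x)
    return x

x = np.random.rand(4, 16)
assert np.allclose(kernel_by_kernel(x), dataflow(x))  # same math, different data movement
```

The math is identical in both paths; what changes is how often intermediate data leaves the compute engine, which is where the bandwidth and latency savings come from.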
Okay, so I guess that also preemptively answers a follow up question that I had.
So another core choice that you seem to have made
is the fact that, well, you don't ship chips that others can integrate into their existing servers or architectures. You basically either ship the entire box, including network connections and everything, or you make that available as a service. And I guess the reason that you're doing that is what you just described, that you have this very unique architecture that I suppose would not work by just integrating into existing servers, let's say.
Well, we could. I mean, I've been asked many times,
will you sell us the chip, right?
But I go back to my initial claim.
The large majority of the world does not have the AI expertise to take chips at this raw level, you know, the software at the lower level, and implement them into solutions, right?
And really what we're focused on is getting as many of the Fortune 5000 companies to production with AI solutions as possible, versus trying to talk to as many AI developers as possible.
And we do those as well.
The developers love creating new models.
But really, our thesis of the company is saying, look, these models are getting to a point where they're fantastic, like a GPT model, where they're just fantastic.
Really what people need is for us to productize it for them. Pre-train the model,
bring the model to their data, get them into production, allow them to actually run it,
monitor it, maintain it, checkpoint it, all the things that you have in production,
you can just sign up to some of them and we'll take care of all of that.
Right?
And so really, the form factors we offer are more a function of who we're actually trying to cater to versus the technical constraints of the device.
You know what I mean?
Like for general purpose,
we can download models from all these different depots, and because we have this compiler stack, at the push of a button you can compile and run and train it and get state-of-the-art results in terms of accuracy.
In many cases, we actually set the world record for performance, right?
So we can do all of those things. And yet for a large part of our customer base, inventing a new model is not their biggest problem.
Their biggest problem is I want to deploy in production, right?
And so they call us because then we can come in and say, okay, well,
you know, to do a document classification solution for your contracts,
it takes this many.
And so we come in and we just deploy our standard systems with GPT and you
subscribe to it. And the beauty of it is it eliminates this large
expert headcount need for data scientists that most people are having a hard time
hiring for, right? It eliminates this large upfront infrastructure cost that many of them have to take on, because you're just subscribing. So you're actually just paying a monthly fee to
infrastructure that we deploy anywhere you want,
including their own site, right?
And then ultimately, as the model evolves and changes,
you don't need to have the expertise to keep up with it
and say, hey, do I need this new model?
Or should I change that model?
We do it all day long.
This is what we do.
So we will, under the hood,
change the models as appropriate for our customers.
And so it makes it really easy for them to actually say, okay, well, I don't want to be an AI shop, right? My business is X. Let SambaNova be my AI shop, and I can get the benefits of AI without having to invest so much time, money, and energy into getting the capabilities that I think everybody's going to need.
Yeah, I think what you just described is, well, another way of framing what one of your co-founders, Chris Ré, has termed data-centric AI.
So basically, his position is like, well, OK, all these models are great and everything,
but we've reached the point where they're kind of a commodity now. So you should focus on your specific
data. And I guess you're facilitating that by giving people the infrastructure to let
them cater to their specific data for their domain and just like, let you take care of
the rest.
Exactly right. Exactly right. And again, you know, there will always be certain classes
of model where innovation will continue to be there
and allows you to do some new things.
But for large classes of models,
they're starting to get to the point
where they're just fantastic, right?
They do a lot of really good things already
and the new improvements in accuracy are helpful,
but they're now getting to the point
where they're incremental, right?
And so if you can just take the current existing models
and make it easy for people to deploy, right?
Easy for people to consume,
easy for people to get results quickly, right?
Not months, but days or weeks, you know,
you're up and running,
and you don't have to have hundreds of people managing it,
because as you know, some of these models, they're so big,
it's like, yeah, a thousand chips aggregated to get to run one model, right? You sneeze
and you get the wrong result, right? And so we try to eliminate all of those things so
that when we deploy it, we deploy a solution that we believe is right,
that we've trained it to be correct, and we maintain it for you. And so
that's really kind of our model here to make sure that you're getting
what everybody else has spent years developing.
You can jump the line and you can get the same, you know, actually can get better because we set the world record on a number of things.
You can get better without having to actually invest all the time and energy and get, you know, capabilities that, you know, other companies have spent years developing.
So one of the iconic models that you referred to and also I think one of your latest announcements
was making GPT available on SambaNova as a service. And that was new to me because, you know, last time I checked, before I read up for this discussion, actually, the only way I knew of that people could access GPT-3 specifically was via OpenAI's API in conjunction with Microsoft. So I was wondering about the details, basically, of what it is that you license. Is it like a joint project with OpenAI?
Is it the previous version of GPT?
Or how does it work exactly?
It's a GPT-3 model.
And again, we do a whole class of GPT.
Some people, as you know, GPT-3, as big as it is, as powerful as it is, not everybody needs the 175 billion parameter GPT-3 model. And so some people want a 13 billion parameter model, some people want a different parameter count, right? So we can range. But the construct of a GPT model is very similar, right? And it's one of those things that, as we talk to more and more of our customers, a lot of people really wanted to have access to this model. You know, they do believe that this is the model that gives them the maximum flexibility
for the next many, many years.
And it's going to continue to evolve.
But the construct of it is really valuable.
Like I said, the model can do so many different things these days and becomes one of those
assets that every company needs to figure out a way to get access to it because it's
the type of model that, like we said,
is getting to the point where if you have it, it's going to be really great for a large number of
people. You don't really need to invent a lot of new models, right? But the problem was exactly
what you said. The access to it was difficult. So we actually just announced this morning
our first customer on this GPT Dataflow-as-a-Service. It's a large European bank. Basically, what we do here is, you want the access to this type of model, we'll deploy our infrastructure anywhere. It's actually our systems, right, our software stack, our people that manage that model, right? We'll train it to accuracy, we'll actually fine-tune it on your data, and then we'll bring it to wherever you want. In this particular case, it's actually going to be on-prem, in their own data center, on their own site. It's their sole use of it; it's not like they're sharing it, because there are privacy questions: my data, I can't have it, you know, in different places. So we'll put it behind their firewall, completely dedicated to their use case,
and they just subscribe to it.
They pay us by month.
Right, and so it's a type of model
that now suddenly you have your own private access
to a GPT model, as big as it is, right?
A GPT model, you can, all your folks in the company
can use it for whatever it is that you need.
We maintain the model on your behalf, right?
And that's a perfect example of how, you know, banks, which are traditionally fairly sophisticated and diligent institutions when it comes to technology, are jumping the line and saying, hey, I need that.
I'm going to deploy it.
I'm going to deploy it in this way, right?
Because I don't want to go and create it myself. SambaNova, come and do it for us, and we're going to deploy it everywhere for all of our users inside the bank, right?
And so that's kind of what we're doing,
and we're doing it.
We can replicate that recipe over and over again because, again,
we've integrated all of this into a nice compact infrastructure
that allows us to deploy the service in a way that you don't even know
kind of what the hardware needs to look like, right? You know, like a lot of people talk about,
oh, it's all these chips and all these, you know, networking, all this stuff, but we hid it all away,
right? So let SambaNova take care of it. We'll deploy it. You just tell me what your SLAs are, right? How quickly you need these things. We'll size that for you and we'll deploy wherever you need it, right? And that's effectively what this particular bank has done with us.
And we're super excited about that collaboration.
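For readers curious what "fine-tune it on your data" typically involves, here is a generic sketch using the open-source Hugging Face transformers library and the small GPT-2 model. It is illustrative only, it is not SambaNova's stack or tooling, and the training file name is a placeholder for a customer's own corpus.

```python
# Illustrative only: fine-tuning a small open GPT-style model (GPT-2) on
# domain text with Hugging Face transformers. Not SambaNova's stack;
# "bank_documents.txt" is a placeholder for a customer's own corpus.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("text", data_files={"train": "bank_documents.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-domain",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # fine-tuned weights stay wherever training ran, e.g. on-prem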
Okay. So you already answered the other part I was wondering about.
So basically fine tuning to domain because, well, yes, I mean, GPT-3 in itself is super, super useful and everything, but there's a number of reasons
why people would want to fine tune it to their domain, you know, competitive differentiation,
and they want to have their own data and all of that. But I guess you also do that. So that points
me to the next set of questions really, which is about your business model, because based on what you said, it sounds like part of it, at least,
is based on services.
So I was wondering if I'm right about that.
And then, yeah, how does your,
what's the mix that you play on, basically?
Services in that, I mean,
we think of ourselves as a more flexible platform, right?
Think of services more like, if you're in that world, it's more like a Salesforce, right?
And Accenture, I mean, we actually partner with a lot of really great companies that help us with a lot of the customizations that customers need because we don't do that level of work.
We're really more about deploying our platform in a flexible way. If you look at what types of
models, obviously, if you look at our infrastructure, you've looked at data flow
architectures and we can run anything that people want, really. We're a general purpose platform.
We can train, we can infer, we can run computer vision models, recommendation models, LSTMs, we can run all sorts of different things, right? And people do, right? You know, our US government partners run some really, really sophisticated models. You know, we have one that they published where they map the universe. I mean, they run all sorts of different things, right? And so, but really, where you see us focus on the Dataflow-as-a-Service,
we focus on three classes of models, right?
Just three of the hundreds we could, right?
You know, because these are the things
that we've actually determined
that our customer base is looking for us
to deploy in production, right?
So those are natural language, computer vision,
and specifically where you have high resolution computer vision, where you feel like in the enterprise, you need high res.
And today, people don't realize that most of the existing technology is not able to break through into high resolution.
And so we'll talk about that. But then recommendation systems, which are going to power our internet economy, right? And so we decided that those are really the three classes of models that we will deploy as a service. But really, what we do is, these are pre-determined models of various sizes; you know, there's some flexibility about kind of how big you run it and how quickly, you know, we can give you that little bit of flexibility, but it's not really services in the traditional sense of, like, we're customizing it, right?
We're not, we bring others in to help us
for things that are outside the knobs
that our platform gives you,
but within the flexibility the platform gives you,
it gives you a lot, right?
You can increase parameter count,
you can change the models
and you can do a bunch of different things.
But that's why we say,
we're going to look at our models more like
how Salesforce kind of sells it for CRM.
So it's a platform for you to run these types of models
and we'll actually maintain it on your behalf, right?
We'll maintain the accuracy on your behalf.
And as you know, as you can train and then you can deploy,
you can do inference on our systems as, you know,
Chris will talk about this as well,
but as you run in production, models drift, right? And so what you sometimes need is a little bit of retraining, right? You know, as you're in production, most systems have to go through this big cycle of retraining and requalifying, whereas with SambaNova, because we're one platform where we actually train, and we distill to, you know, our own targets, you can actually every so often re-tweak it, right? So it maintains accuracy. And that's a key part of how we maintain good results for a customer: on the same platform, we can do multi-tenant inference and have lots and lots of people in production, and then suddenly pause that, collapse it to do a retrain on the model just to tweak it back up.
And then we're back to multi-tenancy.
So I think we can do a lot of very flexible things that allow us to keep what we believe are
production level facilities, production level functions that customers expect and need.
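One way to picture the pause-retrain-resume loop described above is the following hypothetical sketch; the class, thresholds, and function names are invented for illustration and are not SambaNova's software. It monitors accuracy over recent production traffic and, when drift pushes it below a target, retrains briefly before resuming serving.

```python
# Hypothetical sketch: names and thresholds are invented, not SambaNova's
# software. Track accuracy over recent production traffic; when drift
# pushes it below target, pause serving, retrain briefly, then resume.
from collections import deque

class DriftMonitor:
    """Rolling-window accuracy tracker for a model in production."""
    def __init__(self, window=1000, target_accuracy=0.92):
        self.results = deque(maxlen=window)
        self.target = target_accuracy

    def record(self, correct: bool) -> None:
        self.results.append(correct)

    def drifted(self) -> bool:
        if len(self.results) < self.results.maxlen:
            return False                  # not enough evidence yet
        return sum(self.results) / len(self.results) < self.target

def serve_loop(model, labelled_stream, monitor, retrain_fn):
    """Serve predictions until drift is detected, then retrain and resume."""
    for example, label in labelled_stream:
        prediction = model.predict(example)
        monitor.record(prediction == label)
        if monitor.drifted():
            model = retrain_fn(model)     # short fine-tune, not training from scratch
            monitor.results.clear()       # judge the refreshed model on fresh traffic
        yield prediction
```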
Okay, and final one to wrap up, we're almost out of time. So you've already gone a long way. So
you've raised a number of pretty big rounds. You have a really high valuation as well. It sounds
like you're growing your customer base. And so what's your roadmap? So where do you want to be
in like, I don't know, six months or a year from now? Well, I mean, there's a race going on, right?
And people aren't always aware, right, in their own verticals that there's an AI race
going on, right?
And you think about the banks, you think about manufacturing, you think about healthcare,
you think about, you know, all these different sectors, you know, where people are using
AI as an opportunity
to catapult their position within their sector, right?
And, I mean, you've seen this, it's not just SambaNova,
I mean, just the entire industry of AI,
there's a lot of really disruptive things going on, right?
Of which we play one part of that.
But we look at that, we look at our job
as really trying to enable this disruption, right?
That we can go into these verticals and we aren't necessarily the experts for all of those industries, right?
We partner with a lot of great customers and great partners for those sectors.
But what we can do is provide you a platform to create new ways to compete in that industry, in a way that's pretty disruptive.
I always say it feels like this pre-AI to post-AI transition is going to be as big,
if not bigger than the internet. And there are a lot of signs that tell you that, right? There
are a lot of signs that, look, it's not going to be an incremental change. Entire ways that work
is being done today will disappear, right? Just because the robots can come in and do it better,
they're more efficient and you can kind of remove
entire chunks of work.
And so if people are looking at AI as,
hey, I can tweak it and run a particular thing
10% more efficient, that's thinking too small, right?
Because we are talking with people and they're looking at,
hey, here's 30% of my workflow,
I'm going to just remove all of it.
I'm just going to take all that away
and let the machines take over.
And that's the power of AI.
And so we're super excited in the next five to 10 years
as our partners and our customers and other folks
are getting these solutions to production
and we're enabling them.
It's going to be really exciting because I think, you know,
we think that we have a critical
role to play in enabling that and enabling it in production in a way that people can rely on it.
It's no longer part of AI research or AI labs. This is in production. And that's really ultimately
kind of what we started the company for. How do we get customers into production and creating value for their mission
critical applications? Okay, great. Sounds like you have your work lined up ahead of you. So,
best of luck and I hope we'll be able to catch up sometime soon again. Thank you.
I hope you enjoyed the podcast. If you like my work,
you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.