Orchestrate all the Things - SageMaker Serverless Inference illustrates Amazon’s philosophy for ML workloads. Featuring Bratin Saha, AWS VP of Machine Learning
Episode Date: April 21, 2022
Amazon just unveiled Serverless Inference, a new option for SageMaker, its fully managed machine learning (ML) service. The goal for Amazon SageMaker Serverless Inference is to serve use cases with intermittent or infrequent traffic patterns, lowering total cost of ownership (TCO) and making the service easier to use. We connected with Bratin Saha, AWS VP of Machine Learning, to discuss where Amazon SageMaker Serverless fits into the big picture of Amazon's machine learning offering and how it affects ease of use and TCO, as well as Amazon's philosophy and process in developing its machine learning portfolio. Article published on VentureBeat
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook.
Amazon just unveiled Serverless Inference, a new option for SageMaker,
its fully managed machine learning service.
The goal for Amazon SageMaker serverless inference is to
serve use cases with intermittent or infrequent traffic patterns, lowering total cost of
ownership and making the service easier to use. We connected with Bratin Saha, AWS VP of Machine
Learning, to discuss where Amazon SageMaker serverless fits into the big picture of Amazon's
machine learning offering and how it affects ease
of use and total cost of ownership, as well as Amazon's philosophy and process in developing
its machine learning portfolio. So I'm Bratin Saha, I'm the VP for all AI and machine learning
services, you know, all the three layers of the AI ML stack. And prior to this, I was at NVIDIA,
where I was VP of software.
Prior to that, I was at Intel.
When I came to AWS,
the initial focus was on the lower layers of the stack,
the ML infrastructure and the managed service,
which is SageMaker.
And SageMaker happened to become one of
the fastest growing services in the history of AWS.
We have seen a lot of customer uptake for our services.
Today, more than 100,000 customers use our AI and ML services,
and most of the machine learning in
the Cloud happens on what we build on AWS.
So that gives us a really good view into where the industry is going and how customers are using AI and ML to transform their experience.
And I'm happy to go into those details if you want.
Great. Thank you for the introduction. And actually the occasion for having this conversation is the fact that
you're announcing the general availability of a new service around SageMaker. So again,
as a sort of on-ramp to the specifics of the announcement, I wanted to ask you to just
give an overview of SageMaker. Obviously, it's come a long way since it was first released in
2017, and as you also alluded to in your introduction, it's being used by a great number of
people. So let's give the overview of SageMaker. I know it's a very comprehensive service with many different facets.
And let's focus a little bit on the latest announcement of new services
on SageMaker, which was in December 2021, in which you also announced the beta of the serverless service that is about to go GA now.
So, you know, SageMaker, as you said, has come a long way since we first launched
it.
When we launched it, it was really intended for data scientists and ML practitioners.
And since then, especially last year, we expanded the audience.
So today it is the most comprehensive service in the cloud for doing machine learning, and
it provides you capabilities for the entire machine learning pipeline, starting from data
processing to model deployment and model monitoring.
And data processing includes both structured data and unstructured data.
And so there's a lot of unique capabilities we have in terms of just the data processing: structured data, unstructured data, and there's Feature Store, which is a data store.
Then of course, we have all of the machine learning at scale,
like ML training and ML inference.
I think one of the things that customers have really liked and
have started to standardize
on is SageMaker Studio.
SageMaker Studio provides a visual UX, a single pane of glass for the entire end-to-end machine
learning workflow.
It was the industry's first machine learning IDE.
What we have seen customers do is they've really ramped up their usage on SageMaker
as a result of which it has become one of the fastest growing services in AWS history.
Now on the inference side, customers have a variety of needs.
Some of them, for example, have low-millisecond latency requirements,
and they use online inference, which we already support.
And then some customers have intermittent workloads,
and so they want to be able to say,
I just want to have the instance come up when I need it.
I need it for a short duration of time,
and then I want the instance to go down.
I don't want to have to manage any infrastructure for this.
When I need it, the compute has to magically come up,
I use it and then it goes down.
The capabilities that we had up until now,
online inference, batch inference, and async inference,
didn't address that particular customer need,
which is what serverless does.
That is why we are launching serverless. We have a number of customers,
Hugging Face among them, that want to use it.
This gives customers pay-as-you-go:
you only use it during the time you need it,
you only pay per inference,
and there's no infrastructure management you have to do.
All of this comes with
all of the machine learning capabilities
that SageMaker provides.
It's something that I think
customers will find a lot of value in.
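To make the serverless option concrete, here is a minimal sketch of deploying a serverless endpoint with the SageMaker Python SDK. The container image, model artifact path, and IAM role below are hypothetical placeholders, and the payload format depends on your serving container.

    from sagemaker.model import Model
    from sagemaker.predictor import Predictor
    from sagemaker.serverless import ServerlessInferenceConfig

    # Placeholder container image, artifact location, and role ARN.
    model = Model(
        image_uri="<inference-container-image>",
        model_data="s3://my-bucket/model/model.tar.gz",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        predictor_cls=Predictor,  # so deploy() returns a usable Predictor
    )

    # No instance type or count: capacity comes up on demand and scales to zero.
    predictor = model.deploy(
        serverless_inference_config=ServerlessInferenceConfig(
            memory_size_in_mb=2048,  # memory allocated per invocation
            max_concurrency=5,       # cap on concurrent invocations
        )
    )

    payload = '{"inputs": "hello"}'  # shape depends on your container's handler
    response = predictor.predict(payload)  # billed per request, not per uptime

The notable thing is what is absent: no instance type, no instance count, and nothing to deprovision when traffic stops.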
Now, end of last year,
we also added a number of features on SageMaker,
like SageMaker Canvas,
which is meant for data analysts.
It's a no-code way of doing machine learning.
This is really taking machine learning to a new audience.
Then we also launched SageMaker Studio Lab,
which is completely free compute and
free storage for students and
experimenters to quickly get started with SageMaker.
We continue to enhance the key capabilities,
like training, inference, notebooks,
SageMaker Studio, all of the tooling.
The team has launched more than
60 significant features in the last year alone.
We are very happy to see how customers
have driven us to innovate faster
and how the team has responded in terms of
just raising the bar on how quickly we innovate.
Thank you.
I've had the chance to read the preview
of the announcement that you're going to make tomorrow.
And there were a few things that stood out for me,
so I'm taking the opportunity to ask you about those.
One of them, which you also touched upon previously,
is the process through which those new features are collected and then prioritized.
And you're probably the best person to answer that.
So how does it work?
Does your team have a list of candidate features?
And then do you go out and seek feedback on that from customers?
Or is it more customer driven, like customers coming to you and saying,
hey, we need this or that?
Or do you meet someone in the middle?
It's primarily customer-driven.
We spend a lot of time talking to customers,
understanding their pain points
and how we can help them mitigate those pain points.
So the vast majority of our roadmap is customer-driven.
Then we look at common patterns: is this
a pain point that is widely felt?
Is this something that we think is going to become important? And so on.
There is some amount of inventing on behalf of the customer
where we anticipate or try to anticipate the needs of customers.
But even a lot of that is driven by what customers think they need going down the road.
So our roadmap is primarily driven through customer conversations, through customer feedback.
And we have a variety of mechanisms for that, including spending a lot of time with customers.
And then we just look at them and try to get them out as quickly as we can.
Thank you.
And one other thing I was wondering about is deployment and usage options.
So you already mentioned some of the options that were already there in terms of
inference. And as a kind of side note for people potentially listening to the conversation,
I should probably explain that, well, inference is a very important part of machine learning.
It's not as, well, highlighted or glorified, let's say, as training, which attracts headlines for a number of reasons.
But actually, inference dominates the operational cost of running machine learning models in production.
And it's also very important to be able to have many options for inference.
And you highlighted some of those in your introduction.
So another thing that I was wondering about is how exactly deployment on SageMaker works.
And to make it more specific, I read somewhere in the announcement, in the preview of the
announcement, that it's basically a bring-your-own-container type of model, so people can
have many options. So I was wondering if that means it's also possible to deploy
models that have not necessarily been trained using SageMaker in this new service, and also what kind of formats are supported and whether there are
specific requirements for this containerization.
Yes. So, you know, as we were saying, inference is a very important part, because ultimately, when you're doing training, you're just building up the model, but when you're making predictions, that is when you're extracting insights from data. And so machine learning is really ultimately about inference, because that is when customers are actually able to extract insights from the data.
Now, you know, on the inference side, we have a lot of deployment options. First, we have a menu of features.
One is online inference that is usually used when you have low latency requirements and so on.
Then we have batch inference, where you can take a set of data and, in one shot,
stream through it and run inference on that batch of data.
And then you have async inference,
asynchronous inference,
where the payload is large,
and the way it works is you're doing inference on a large payload of data.
It's not a synchronous thing
where the client is waiting for the results;
when you're done with the inference,
you send a signal back to the client.
Then finally there's the serverless one,
which is for intermittent traffic,
where you just want the instance to come back up when you need it,
and shut down when you don't need it.
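As a rough sketch of the asynchronous option just described, reusing the hypothetical model object from the serverless snippet above (the S3 paths are placeholders):

    from sagemaker.async_inference import AsyncInferenceConfig

    predictor = model.deploy(
        instance_type="ml.m5.xlarge",
        initial_instance_count=1,
        async_inference_config=AsyncInferenceConfig(
            # Results land here when inference completes; optional SNS topics
            # can notify the client of success or failure.
            output_path="s3://my-bucket/async-results/",
        ),
    )

    # Returns immediately with a handle to the eventual S3 output, instead of
    # blocking the client while a large payload is processed.
    async_response = predictor.predict_async(
        input_path="s3://my-bucket/async-inputs/payload.json"
    )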
We also have something called Inference Recommender.
Customers say, you have a lot of options,
purpose-built for different features,
but how do we choose between them?
How do we know which one is going to give me the right performance and so on?
For that purpose, we have the Inference Recommender, where customers can come in and submit their models, and the Inference Recommender will look at the models and say, this particular instance or this particular way of doing things is going to give you the best performance or the lowest cost, and so on. And that is something that makes it a lot easier for customers to get started and deploy their models.
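For reference, a minimal sketch of starting an Inference Recommender job through boto3, assuming a model already registered as a SageMaker model package (the names and ARNs are placeholders):

    import boto3

    sm = boto3.client("sagemaker")

    # "Default" returns quick instance recommendations; "Advanced" runs custom
    # load tests against traffic patterns you specify.
    sm.create_inference_recommendations_job(
        JobName="my-recommender-job",
        JobType="Default",
        RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
        InputConfig={
            "ModelPackageVersionArn": "arn:aws:sagemaker:us-east-1:123456789012:model-package/my-model/1",
        },
    )

    # Once finished, the job ranks candidate instance types by latency,
    # throughput, and cost.
    results = sm.describe_inference_recommendations_job(JobName="my-recommender-job")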
Okay. However, it's still not clear to me whether this Inference Recommender that you mentioned is something that people can also use regardless of whether they have used SageMaker to train their model. So can I train my model elsewhere?
Oh yes, yes, absolutely.
So the short answer is yes: you can use Inference Recommender even if you have not used SageMaker for training the models.
But the wider point that I want to emphasize, in general as a design principle, is that SageMaker is an end-to-end platform, so it provides all of the features.
And we think if you use all of it, customers get a lot of benefit in terms of end-to-end lineage and traceability and just getting a robust platform end-to-end.
But if customers want to use a part of it, they are absolutely free to do so. So some customers, for example, use SageMaker only for training, but don't use it for inference
because they may have reasons for doing it on-prem, for example.
Some customers use it only for inference, not for training.
Some customers only use notebooks.
As a design principle, we have always built all our components and all our capabilities in a modular way, so that you can use just one part of it; you don't have to use all of it.
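To illustrate the modularity point, here is a minimal sketch of bringing a model trained outside SageMaker (on-prem, say) into SageMaker hosting, using the SDK's prebuilt PyTorch serving image. The artifact path, role, and handler script are placeholders:

    from sagemaker.pytorch import PyTorchModel

    # The model.tar.gz can come from anywhere, as long as it is laid out the
    # way the serving container expects.
    byo_model = PyTorchModel(
        model_data="s3://my-bucket/external/model.tar.gz",  # trained off-SageMaker
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        entry_point="inference.py",  # hypothetical custom load/predict handler
        framework_version="1.12",
        py_version="py38",
    )

    # From here, any inference option applies: real-time, batch, async, or
    # serverless, exactly as with a SageMaker-trained model.
    predictor = byo_model.deploy(
        instance_type="ml.m5.xlarge",
        initial_instance_count=1,
    )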
Okay, yeah, thanks for the clarification. So I think that
this new feature that you're about to release primarily touches upon two important areas.
One is total cost of ownership, because as you already said, if you have traffic that's intermittent, then it makes sense to use something like serverless because you don't necessarily want to have an instance up and running all the time.
So you're not charged for it all the time either. And the other one is ease of use, really, because
you save a lot of effort in terms of setting up and provisioning instances and so on, so serverless
makes it much easier. And in terms of total cost of ownership, I tried to look around a little bit and I found something interesting.
So I found a comparison that someone from your team presumably put together, comparing the total cost of ownership of SageMaker against other Amazon options, basically EC2 or
Amazon's Kubernetes service. And there were a few interesting findings in that. I could
summarize them by saying that it turns out that using SageMaker results in
a lower total cost of ownership. I didn't see, however, any similar comparisons to other
options, let's say other cloud vendors or independent vendors and so on. So that
made me wonder about the positioning, let's say, and the target audience for SageMaker.
And again, you're probably the best person to answer that. Are you looking with SageMaker to primarily address Amazon users who are not
necessarily practicing machine learning yet and are looking around for their options,
turning them to SageMaker instead of using vanilla instances and doing customizations?
Or are you also looking to potentially attract a broader audience, because of the scope that you alluded to and the fact that you're end-to-end?
You know, a couple of things.
One is we are really focused on our customers, not on our competitors, right?
And so that is why we are really singularly focused on making sure customers get all of the information.
Now, it is true that when you use SageMaker over self-managed options, you get a lower cost of ownership.
And you get a lower cost of ownership because you don't have to put in the effort for managing the infrastructure,
for building all of the capabilities that we are building.
Like, you know, if you're using SageMaker, you get compliance out of the box.
You get end-to-end security and encryption.
You get, you know, all of your instances are managed.
So you don't have to do the work of provisioning and keeping them up to date and all of that,
that you would otherwise have to do.
So it turns out that we have done a number of studies and customers are able to save more than 50% on the TCO over a three-year period when they're using SageMaker.
Now, you know, our focus really is on the fact that AWS provides the broadest and
deepest set of capabilities for machine learning for customers. And I'll give you some of the
innovations that we have, like the machine learning IDE, that's the first time it was done.
If you look at deep learning performance, TensorFlow, PyTorch,
you get the fastest performance on SageMaker across all providers.
Then when you look at the comprehensive data processing capabilities
for structured data with Data Wrangler,
for unstructured data with Ground Truth,
for storing data with Feature Store,
as well as some of the other features that we have.
You have the most comprehensive data processing capabilities.
When you think about training and inference,
as I said, the vast majority of machine learning in the cloud is happening on AWS,
so the scale at which we operate is
more than that of any other place.
So, you know, I think customers should move to the cloud because by now it's been well shown that moving to the cloud
gives you a lot more agility, velocity, cost, benefits, and so on.
And then, you know, the important thing is
machine learning is not sitting in isolation.
Machine learning is built on a foundation of the compute and the storage and database and all of
that. And when you look at all of that, you know, AWS provides you just the best and the most
capable cloud services. And then when you look at what SageMaker and our AI services offers,
they offer you a more than 50 percent TCO advantage
over self-managing them on AWS.
In terms of our feature roadmap,
we provide the deepest and the broadest set of
capabilities for machine learning across any provider.
I think that's borne out by
the fact that we have the largest number of customers.
We have a lot of customers who choose SageMaker over
other providers and who are also
moving from on-prem to SageMaker.
I just wanted to make sure that I communicate
why our focus is not on competitors.
Our focus is solely on customers.
We do see a lot of customers moving from on-prem or other providers, simply because we provide the most capable and the broadest feature set.
Regarding the new release that you're about to unveil, have you had the chance to evaluate how this plays into the whole total cost of ownership conversation?
So presumably, for the use cases that it applies to, it's going to have an impact.
It's going to lower it even further.
Do you have any metrics that you could potentially share?
Have you evaluated this at all?
You see, our goal is always threefold:
improve performance, reduce cost, and improve the ease of use. We
are going to continuously do that every year. And as you rightly pointed out, for the use cases that
can be supported by this, this is probably going to reduce the cost by an order of magnitude.
We don't have numbers off the bat, and that's because we're just launching GA. And once you
launch GA, only then do
you really know, okay, this is the traffic pattern and this is how customers are saving. But I'm very
confident that customers are going to see significant savings on top of the significant
savings that they already see. Okay. And then in terms of ease of use, well, this is obviously something very hard to pin down with specific metrics and all that.
And I'm presuming that obviously if you haven't had the chance to evaluate total cost of ownership, which is more specific,
you obviously didn't have the chance to evaluate ease of use as well.
However, just to get a sense, let's say, of how frictionless it can be for people to use.
So if people are already SageMaker users and they have been using one of the existing options, how easy will it be for them to switch
to the new serverless inference if they judge that this is the option they prefer?
It'll be pretty easy for them. And by the way, they can do the other thing as well,
which is they can start with serverless inference and then they can move on to online inference if
later on the needs change. So we have built it in a way such that migration is going to be pretty easy.
Is it going to require code changes or is it going to be on the configuration level?
It's more going to be at the configuration level, but we have different APIs as well.
So I think we can give you those details, but, you know, we actually made it pretty easy for customers to move from one configuration to the other.
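As a sketch of what such a configuration-level move might look like through boto3, assuming an existing endpoint and model (all names are placeholders), you create a new endpoint config and point the endpoint at it, with no changes to client code:

    import boto3

    sm = boto3.client("sagemaker")

    # A real-time, instance-backed config; for the reverse migration the
    # variant would carry a "ServerlessConfig" block instead of instance settings.
    sm.create_endpoint_config(
        EndpointConfigName="my-config-realtime",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": "my-model",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }],
    )

    # Swap the endpoint over to the new configuration in place.
    sm.update_endpoint(
        EndpointName="my-endpoint",
        EndpointConfigName="my-config-realtime",
    )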
Okay, all right. And then I guess the last area to touch upon is basically, where do you go from here? I mean, you already have
quite a comprehensive offering, and now the latest feature that you recently
announced is going GA. And you also said in the beginning that the way you
move forward is by listening to your customers, basically. So can you perhaps
allude, let's say, not to something specific
that you'll be working on, but to some areas that you prioritize?
I think we are going to continue to improve the performance, ease of use and cost,
but there are also things that we are going to do in terms of making sure we provide more
comprehensive data processing capabilities.
We already have pretty comprehensive data processing capabilities, but we'll keep on
improving them. I think we'll also continue to make it easier to do machine learning automation
at scale, MLOps, or what I call ML industrialization:
how do you make machine learning a systematic engineering discipline,
where pipelines are automated,
you include CI/CD, and so on?
So that broad theme of ML industrialization, or MLOps,
is going to continue to remain a big feature for us.
And we are going to continue
to make the interactive developer experience better
through, you know, more notebook features,
more innovations on that side.
So I think all of those are going to be important for us.
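On the ML industrialization theme, a minimal sketch of an automated pipeline with SageMaker Pipelines; the processing script and role ARN are placeholders:

    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.workflow.steps import ProcessingStep
    from sagemaker.workflow.pipeline import Pipeline

    role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

    processor = SKLearnProcessor(
        framework_version="1.2-1",
        role=role,
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )

    # A single preprocessing step; real pipelines chain training, evaluation,
    # and conditional model-registration steps the same way.
    step_process = ProcessingStep(
        name="Preprocess",
        processor=processor,
        code="preprocess.py",  # hypothetical script
    )

    pipeline = Pipeline(name="demo-pipeline", steps=[step_process])
    pipeline.upsert(role_arn=role)  # create or update the pipeline definition
    execution = pipeline.start()    # each run is tracked and reproducible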
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.