Orchestrate all the Things - SageMaker Serverless Inference illustrates Amazon’s philosophy for ML workloads. Featuring Bratin Saha, AWS VP of Machine Learning
Episode Date: April 21, 2022
Amazon just unveiled Serverless Inference, a new option for SageMaker, its fully managed machine learning (ML) service. The goal for Amazon SageMaker Serverless Inference is to serve use cases with intermittent or infrequent traffic patterns, lowering total cost of ownership (TCO) and making the service easier to use. We connected with Bratin Saha, AWS VP of Machine Learning, to discuss where Amazon SageMaker Serverless fits into the big picture of Amazon's machine learning offering and how it affects ease of use and TCO, as well as Amazon's philosophy and process in developing its machine learning portfolio. Article published on VentureBeat
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook.
Amazon just unveiled Serverless Inference, a new option for SageMaker,
its fully managed machine learning service.
The goal for Amazon SageMaker serverless inference is to
serve use cases with intermittent or infrequent traffic patterns, lowering total cost of
ownership and making the service easier to use. We connected with Bratin Saha, AWS VP of Machine
Learning, to discuss where Amazon SageMaker serverless fits into the big picture of Amazon's
machine learning offering and how it affects ease
of use and total cost of ownership, as well as Amazon's philosophy and process in developing
its machine learning portfolio. So I'm Bratin Saha, I'm the VP for all AI and machine learning
services, you know, all the three layers of the AI ML stack. And prior to this, I was at NVIDIA,
where I was VP of software.
Prior to that, I was at Intel.
When I came to AWS,
the initial focus was on the lower layers of the stack,
the ML infrastructure and the managed service,
which is SageMaker.
And SageMaker happened to become one of
the fastest growing services in the history of AWS.
We have seen a lot of customer uptake for our services.
Today, more than 100,000 customers use our AI and ML services,
and most of the machine learning in
the Cloud happens on what we build on AWS.
So that gives us a really good view into where the industry is going and how customers are using AI and ML to transform their experience.
And I'm happy to go into those details if you want.
Great. Thank you for the introduction. And actually the occasion for having this conversation is the fact that
you're announcing the general availability of a new service around SageMaker. So again,
as a sort of on-ramp to the specifics of the announcement, I wanted to ask you to just
give an overview of SageMaker. Obviously, it's come a long way since it was first released in
2017, and as you also alluded to in your introduction, it's being used by a great number of
people. So let's give the overview of SageMaker. I know it's a very comprehensive service with many different facets.
And let's focus a little bit on the latest announcement of new services
on SageMaker, which was in December 2021, in which you also announced the beta of the serverless service that is about to go GA now.
So, you know, SageMaker, as you said, has come a long way since we first launched
it.
When we launched it, it was really intended for data scientists and ML practitioners.
And since then, especially last year, we expanded the audience.
So today it is the most comprehensive service in the cloud for doing machine learning, and
it provides you capabilities for the entire machine learning pipeline, starting from data
processing to model deployment and model monitoring.
And data processing includes both structured data and unstructured data.
And so there's a lot of unique capabilities we have in terms of just the data processing: structured data, unstructured data, and there's Feature Store, which is a data store.
Then of course, we have all of the machine learning at scale,
like ML training and ML inference.
I think one of the things that customers have really liked and
have started to standardize
on is SageMaker Studio.
SageMaker Studio provides a visual UX, a single pane of glass for the entire end-to-end machine
learning workflow.
It was the industry's first machine learning IDE.
What we have seen customers do is they've really ramped up their usage on SageMaker
as a result of which it has become one of the fastest growing services in AWS history.
Now on the inference side, customers have a variety of needs.
Some of them, for example, have low-millisecond latency requirements,
and they use online inference, which we already support.
And then some customers have intermittent workloads,
and so they want to be able to say,
I just want to have the instance come up when I need it.
I need it for a short duration of time,
and then I want the instance to go down.
I don't want to have to manage any infrastructure for this.
When I need it, the compute has to magically come up,
I use it and then it goes down.
The capabilities that we had up until now,
online inference, batch inference, and async inference,
didn't address that particular customer need,
which is what serverless does.
That is why we are launching serverless. We have a number of customers,
Hugging Face among them, that want to use it.
This gives customers pay-as-you-go:
you only use it during the time you need it,
you only pay per inference,
and there's no infrastructure management you have to do.
All of this comes with
all of the machine learning capabilities
that SageMaker provides.
It's something that I think
customers will find a lot of value in.
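To make the serverless option concrete, here is a minimal sketch of deploying a serverless endpoint with the SageMaker Python SDK. The container image, model artifact path, and IAM role below are hypothetical placeholders, and the payload format depends on your serving container.

    from sagemaker.model import Model
    from sagemaker.predictor import Predictor
    from sagemaker.serverless import ServerlessInferenceConfig

    # Placeholder container image, artifact location, and role ARN.
    model = Model(
        image_uri="<inference-container-image>",
        model_data="s3://my-bucket/model/model.tar.gz",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        predictor_cls=Predictor,  # so deploy() returns a usable Predictor
    )

    # No instance type or count: capacity comes up on demand and scales to zero.
    predictor = model.deploy(
        serverless_inference_config=ServerlessInferenceConfig(
            memory_size_in_mb=2048,  # memory allocated per invocation
            max_concurrency=5,       # cap on concurrent invocations
        )
    )

    payload = '{"inputs": "hello"}'  # shape depends on your container's handler
    response = predictor.predict(payload)  # billed per request, not per uptime

The notable thing is what is absent: no instance type, no instance count, and nothing to deprovision when traffic stops.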
Now, end of last year,
we also added a number of features on SageMaker,
like SageMaker Canvas,
which is meant for data analysts.
It's a no-code way of doing machine learning.
This is really taking machine learning to a new audience.
Then we also launched SageMaker Studio Lab,
which is completely free compute and
free storage for students and
experimenters to quickly get started with SageMaker.
We continue to enhance the key capabilities,
like training, inference, notebooks,
SageMaker Studio, all of the tooling.
The team has launched more than
60 significant features in the last year alone.
We are very happy to see how customers
have driven us to innovate faster
and how the team has responded in terms of
just raising the bar on how quickly we innovate.
Thank you.
I've had the chance to read the preview
of the announcement that you're going to make tomorrow.
And there were a few things that stood out for me,
so I'm taking the opportunity to ask you about those.
One of them, which you also touched upon previously,
is the process through which those new features are collected and then prioritized.
And you're probably the best person to answer that.
So how does it work?
Does your team have a list of candidate features?
And then do you go out and seek feedback on that from customers?
Or is it more customer driven, like customers coming to you and saying,
hey, we need this or that?
Or do you meet someone in the middle?
It's primarily customer-driven.
We spend a lot of time talking to customers,
understanding their pain points
and how we can help them mitigate those pain points.
So the vast majority of our roadmap is customer-driven.
Then we look at common patterns: is this
a pain point that is widely felt?
Is this something that we think is going to become important? And so on.
There is some amount of inventing on behalf of the customer
where we anticipate or try to anticipate the needs of customers.
But even a lot of that is driven by what customers think they need going down the road.
So our roadmap is primarily driven through customer conversations, through customer feedback.
And we have a variety of mechanisms for that, including spending a lot of time with customers.
And then we just look at them and try to get them out as quickly as we can.
Thank you.
And one other thing I was wondering about is deployment and usage options.
So you already mentioned some of the options that were already there in terms of
inference. And as a kind of side note for people potentially listening to the conversation,
I should probably explain that, well, inference is a very important part of machine learning.
It's not as, well, highlighted or glorified, let's say, as training, which attracts headlines for a number of reasons.
But actually, inference dominates the operational cost of running machine learning models in production.
And it's also very important to be able to have many options for inference.
And you highlighted some of those in your introduction.
So another thing that I was wondering about is how exactly deployment on SageMaker works.
And to make it more specific, I read somewhere in the announcement, in the preview of the
announcement, that it's basically a bring-your-own-container type of model, so people can
have many options. So I was wondering if that means it's also possible to deploy
models that have not necessarily been trained using SageMaker in this new service, and also what kind of formats are supported and whether there are
specific requirements for this containerization.
Yes. So, you know, as we were saying, inference is a very important part, because ultimately, when you're doing training, you're just building up the model, but when you're making predictions, that is when you're extracting insights from data. And so machine learning is really ultimately about inference, because that is when customers are actually able to extract insights from the data.
Now, you know, on the inference side, we have a lot of deployment options. First, we have a menu of features.
One is online inference that is usually used when you have low latency requirements and so on.
Then we have batch inference, where you can take a set of data and, in one shot,
stream through it and run inference on that batch of data.
And then you have async inference,
asynchronous inference,
where the payload is large,
and the way it works is you're doing inference on a large payload of data.
It's not a synchronous thing
where the client is waiting for the results;
when you're done with the inference,
you send a signal back to the client.
Then finally there's the serverless one,
which is for intermittent traffic,
where you just want the instance to come back up when you need it,
and shut down when you don't need it.
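As a rough sketch of the asynchronous option just described, reusing the hypothetical model object from the serverless snippet above (the S3 paths are placeholders):

    from sagemaker.async_inference import AsyncInferenceConfig

    predictor = model.deploy(
        instance_type="ml.m5.xlarge",
        initial_instance_count=1,
        async_inference_config=AsyncInferenceConfig(
            # Results land here when inference completes; optional SNS topics
            # can notify the client of success or failure.
            output_path="s3://my-bucket/async-results/",
        ),
    )

    # Returns immediately with a handle to the eventual S3 output, instead of
    # blocking the client while a large payload is processed.
    async_response = predictor.predict_async(
        input_path="s3://my-bucket/async-inputs/payload.json"
    )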
We also have something called Inference Recommender.
Customers say, you have a lot of options,
purpose-built for different features,
but how do we choose between them?
How do we know which one is going to give me the right performance and so on?
For that purpose, we have the Inference Recommender, where customers can come in and submit their models, and the Inference Recommender will look at the models and say, this particular instance or this particular way of doing things is going to give you the best performance or the lowest cost, and so on. And that is something that makes it a lot easier for customers to get started and deploy their models.
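For reference, a minimal sketch of starting an Inference Recommender job through boto3, assuming a model already registered as a SageMaker model package (the names and ARNs are placeholders):

    import boto3

    sm = boto3.client("sagemaker")

    # "Default" returns quick instance recommendations; "Advanced" runs custom
    # load tests against traffic patterns you specify.
    sm.create_inference_recommendations_job(
        JobName="my-recommender-job",
        JobType="Default",
        RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
        InputConfig={
            "ModelPackageVersionArn": "arn:aws:sagemaker:us-east-1:123456789012:model-package/my-model/1",
        },
    )

    # Once finished, the job ranks candidate instance types by latency,
    # throughput, and cost.
    results = sm.describe_inference_recommendations_job(JobName="my-recommender-job")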
Okay. However, it's still not clear to me whether this Inference Recommender that you mentioned is something that people can also use regardless of whether they have used SageMaker to train their model. So can I train my model elsewhere?
Oh yes, yes, absolutely.
So the short answer is yes: you can use Inference Recommender even if you have not used SageMaker for training the models.
But the wider point that I want to emphasize, in general as a design principle, is that SageMaker is an end-to-end platform, so it provides all of the features.
And we think if you use all of it, customers get a lot of benefit in terms of end-to-end lineage and traceability and just getting a robust platform end-to-end.
But if customers want to use a part of it, they are absolutely free to do so. So some customers, for example, use SageMaker only for training, but don't use it for inference
because they may have reasons for doing it on-prem, for example.
Some customers use it only for inference, not for training.
Some customers only use notebooks.
As a design principle, we have always built all our components and all our capabilities in a modular way, so that you can use just one part of it; you don't have to use all of it.
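To illustrate the modularity point, here is a minimal sketch of bringing a model trained outside SageMaker (on-prem, say) into SageMaker hosting, using the SDK's prebuilt PyTorch serving image. The artifact path, role, and handler script are placeholders:

    from sagemaker.pytorch import PyTorchModel

    # The model.tar.gz can come from anywhere, as long as it is laid out the
    # way the serving container expects.
    byo_model = PyTorchModel(
        model_data="s3://my-bucket/external/model.tar.gz",  # trained off-SageMaker
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        entry_point="inference.py",  # hypothetical custom load/predict handler
        framework_version="1.12",
        py_version="py38",
    )

    # From here, any inference option applies: real-time, batch, async, or
    # serverless, exactly as with a SageMaker-trained model.
    predictor = byo_model.deploy(
        instance_type="ml.m5.xlarge",
        initial_instance_count=1,
    )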
Okay, yeah, thanks for the clarification. So I think that
this new feature that you're about to release primarily touches upon two important areas.
One is total cost of ownership, because as you already said, if you have traffic that's intermittent, then it makes sense to use something like serverless because you don't necessarily want to have an instance up and running all the time.
So you're not charged for it all the time either. And the other one is ease of use, really, because
you save a lot of effort in terms of setting up and provisioning instances and so on, so serverless
makes it much easier. And in terms of total cost of ownership, I tried to look around a little bit and I found something interesting.
So I found a comparison that someone from your team presumably put together, comparing the total cost of ownership of SageMaker against other Amazon options, basically EC2 or
Amazon's Kubernetes service. And there were a few interesting findings in that. I could
summarize them by saying that it turns out that using SageMaker results in
a lower total cost of ownership. I didn't see, however, any similar comparisons to other
options, let's say other cloud vendors or independent vendors and so on. So that
made me wonder about the positioning, let's say, and the target audience for SageMaker.
And again, you're probably the best person to answer that. Are you looking with SageMaker to primarily address Amazon users who are not
necessarily practicing machine learning yet and are looking around for their options,
turning them to SageMaker instead of using vanilla instances and doing customizations?
Or are you also looking to potentially attract a broader audience, because of the scope that you alluded to and the fact that you're end-to-end?
You know, a couple of things.
One is we are really focused on our customers, not on our competitors, right?
And so that is why we are really singularly focused on making sure customers get all of the information.
Now, it is true that when you use SageMaker over self-managed options, you get a lower cost of ownership.
And you get a lower cost of ownership because you don't have to put in the effort for managing the infrastructure,
for building all of the capabilities that we are building.
Like, you know, if you're using SageMaker, you get compliance out of the box.
You get end-to-end security and encryption.
You get, you know, all of your instances are managed.
So you don't have to do the work of provisioning and keeping them up to date and all of that,
that you would otherwise have to do.
So it turns out that we have done a number of studies and customers are able to save more than 50% on the TCO over a three-year period when they're using SageMaker.
Now, you know, our focus really is on the fact that AWS provides the broadest and
deepest set of capabilities for machine learning for customers. And I'll give you some of the
innovations that we have, like the machine learning IDE, that's the first time it was done.
If you look at deep learning performance, TensorFlow, PyTorch,
you get the fastest performance on SageMaker across all providers.
Then when you look at the comprehensive data processing capabilities
for structured data with Data Wrangler,
for unstructured data with Ground Truth,
for storing data with Feature Store,
as well as some of the other features that we have.
You have the most comprehensive data processing capabilities.
When you think about training and inference,
as I said, the vast majority of machine learning in the cloud is happening on AWS,
so the scale at which we operate is
more than that of any other place.
So, you know, I think customers should move to the cloud because by now it's been well shown that moving to the cloud
gives you a lot more agility, velocity, cost, benefits, and so on.
And then, you know, the important thing is
machine learning is not sitting in isolation.
Machine learning is built on a foundation of the compute and the storage and database and all of
that. And when you look at all of that, you know, AWS provides you just the best and the most
capable cloud services. And then when you look at what SageMaker and our AI services offers,
they offer you a more than 50 percent TCO advantage
over self-managing them on AWS.
In terms of our feature roadmap,
we provide the deepest and the broadest set of
capabilities for machine learning across any provider.
I think that's borne out by
the fact that we have the largest number of customers.
We have a lot of customers who choose SageMaker over
other providers and who are also
moving from on-prem to SageMaker.
I just wanted to make sure that I communicate
why our focus is not on competitors.
Our focus is solely on customers.
We do see a lot of customers moving from on-prem or other providers, simply because we provide the most capable and the broadest feature set.
Regarding the new release that you're about to unveil, have you had the chance to evaluate how this plays into the whole total cost of ownership conversation?
So presumably, for the use cases that it applies to, it's going to have an impact.
It's going to lower it even further.
Do you have any metrics that you could potentially share?
Have you evaluated this at all?
You see, our goal is always threefold:
improve performance, reduce cost, and improve the ease of use. We
are going to continuously do that every year. And as you rightly pointed out, for the use cases that
can be supported by this, this is probably going to reduce the cost by an order of magnitude.
We don't have numbers off the bat, and that's because we're just launching GA. And once you
launch GA, only then do
you really know, okay, this is the traffic pattern and this is how customers are saving. But I'm very
confident that customers are going to see significant savings on top of the significant
savings that they already see. Okay. And then in terms of ease of use, well, this is obviously something very hard to pin down with specific metrics and all that.
And I'm presuming that obviously if you haven't had the chance to evaluate total cost of ownership, which is more specific,
you obviously didn't have the chance to evaluate ease of use as well.
However, just to get a sense, let's say, of how frictionless it can be for people to use.
So if people are already SageMaker users and they have been using one of the existing options, how easy will it be for them to switch
to the new serverless inference if they judge that this is the option they prefer?
It'll be pretty easy for them. And by the way, they can do the other thing as well,
which is they can start with serverless inference and then they can move on to online inference if
later on the needs change. So we have built it in a way such that migration is going to be pretty easy.
Is it going to require code changes or is it going to be on the configuration level?
It's more going to be at the configuration level, but we have different APIs as well.
So I think we can give you those details, but, you know, we actually made it pretty easy for customers to move from one configuration to the other.
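As a sketch of what such a configuration-level move might look like through boto3, assuming an existing endpoint and model (all names are placeholders), you create a new endpoint config and point the endpoint at it, with no changes to client code:

    import boto3

    sm = boto3.client("sagemaker")

    # A real-time, instance-backed config; for the reverse migration the
    # variant would carry a "ServerlessConfig" block instead of instance settings.
    sm.create_endpoint_config(
        EndpointConfigName="my-config-realtime",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": "my-model",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }],
    )

    # Swap the endpoint over to the new configuration in place.
    sm.update_endpoint(
        EndpointName="my-endpoint",
        EndpointConfigName="my-config-realtime",
    )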
Okay, all right. And then I guess the last area to touch upon is basically, where do you go from here? I mean, you already have
quite a comprehensive offering, and now the latest feature that you recently
announced is going GA. And you also said in the beginning that the way you
move forward is by listening to your customers, basically. So can you perhaps
allude, let's say, not to something specific
that you'll be working on, but to some areas that you prioritize?
I think we are going to continue to improve the performance, ease of use and cost,
but there are also things that we are going to do in terms of making sure we provide more
comprehensive data processing capabilities.
We already have pretty comprehensive data processing capabilities, but we'll keep on
improving them. I think we'll also continue to make it easier to do machine learning automation
at scale, MLOps, or what I call ML industrialization:
how do you make machine learning a systematic engineering discipline,
where pipelines are automated,
you include CI/CD, and so on?
So that broad theme of ML industrialization, or MLOps,
is going to continue to remain a big feature for us.
And we are going to continue
to make the interactive developer experience better
through, you know, more notebook features,
more innovations on that side.
So I think all of those are going to be important for us.
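On the ML industrialization theme, a minimal sketch of an automated pipeline with SageMaker Pipelines; the processing script and role ARN are placeholders:

    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.workflow.steps import ProcessingStep
    from sagemaker.workflow.pipeline import Pipeline

    role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

    processor = SKLearnProcessor(
        framework_version="1.2-1",
        role=role,
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )

    # A single preprocessing step; real pipelines chain training, evaluation,
    # and conditional model-registration steps the same way.
    step_process = ProcessingStep(
        name="Preprocess",
        processor=processor,
        code="preprocess.py",  # hypothetical script
    )

    pipeline = Pipeline(name="demo-pipeline", steps=[step_process])
    pipeline.upsert(role_arn=role)  # create or update the pipeline definition
    execution = pipeline.start()    # each run is tracked and reproducible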
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.