Grey Beards on Systems - 161: Greybeards talk AWS S3 storage with Andy Warfield, VP Distinguished Engineer, Amazon
Episode Date: January 19, 2024. We talked with Andy Warfield (@AndyWarfield), VP Distinguished Engineer, Amazon, about 10 years ago, when he was at Coho Data (see our 005: Greybeards talk scale out storage … podcast). Andy has been a good friend for a long time and he's been with Amazon S3 for over 5 years now. Since the recent S3 announcements at …
Transcript
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today. We have with us here today Andy Warfield, an old friend and VP Distinguished Engineer at Amazon.
So Andy, why don't you tell us a little bit about yourself and what you've been up to with S3 at Amazon.
All right. Well, hey, Ray and Keith. It's awesome to be on the show. I'm an engineer at Amazon. I work across all of our storage products. I've been at Amazon for about six years, a little over six years. And I joined actually from a storage startup called Coho Data. I was a professor
at the University of British Columbia in Vancouver before that as well. And I was on the
Greybeards as part of that startup, I guess, probably.
A long time ago, yeah.
Yeah, it was, we all met Andy at Coho and did a number of Tech Field Day events with Coho Data
where Andy gave serious sessions on storage technology and what they were doing. So that
was great. So we're here to talk a little bit about S3 Express OneZone. I guess that's a new
technology that you guys rolled out? It is. And Ray, I can't let you off the hook on that thing first.
I had this thought.
I was driving in here today.
I was like, oh, this is the morning I've been looking forward to all week.
I'm on gray beards.
And then I was like, wait a second.
Howard's not doing it anymore.
And Keith's on it.
And I wonder if Keith now has a gray beard as a qualification for this thing.
And then I had this moment of like, holy smokes, maybe the reference is actually to me.
Maybe it's me with the...
Keith every once in a while has a gray beard.
I haven't seen one in a while, but I've seen it in the past.
Yeah.
I'm rocking one now.
All right.
All right.
You got to fit the show, Keith.
Yeah.
So let's talk about the S3 Express stuff.
I assume that this is still a pretty technical storage audience in the podcast, and I can just get into details on stuff.
S3 was a really interesting change for me to come and work on.
A lot of the storage work that I'd done as a researcher and in startups
was much more toward the lower-level primitives, block-level primitives and stuff,
and a lot of OS and hardware-level work with early NVMe and stuff like that.
Right, right.
S3, kind of to keep with the Greybeards theme, I guess,
is like 18 years old now.
And I remember it launching, which is a bit terrifying.
And S3 as a storage system is a little bit weird because it's,
you know, as you guys know, it's like a REST API, right?
It's like it's almost a direct take on the HTTP protocol in terms of gets and puts and mapping the HTTP verbs to storage. And I guess to sort of like,
you know, explain the motivation behind Express, which is this really low latency version of S3 that we've just
launched. When S3 originally launched, it was really, I think I found some old language that talks about it being the file system for the internet, was how some of the team originally talked about it 18 years ago, but really it was more like the storage locker for the internet, right?
It was highly durable, you know, really, really secure. And it was, you know, the analog that I
kind of think about is it was the place that you would put like golf clubs and skis and stuff in
the back of your car and drive across town to and stick away knowing that they were safe, but also being a bit of a drive to get to.
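To make Andy's point about the API concrete, here is a minimal sketch, with hypothetical bucket and key names, of how S3's interface maps HTTP verbs onto object operations, using the boto3 SDK in Python:

```python
# Minimal sketch (bucket and key names are hypothetical): S3's API maps
# closely onto HTTP verbs -- a PUT writes an object, a GET reads it back.
import boto3

s3 = boto3.client("s3")

# PUT: write an object under a key in a bucket.
s3.put_object(Bucket="example-bucket", Key="demo/hello.txt", Body=b"hello, S3")

# GET: read the object back as a byte stream.
resp = s3.get_object(Bucket="example-bucket", Key="demo/hello.txt")
print(resp["Body"].read())  # b'hello, S3'
```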
And over the past almost 20 years, we saw that shift to S3 hosting a lot of analytics.
I think the Hadoop community in particular wrote a bunch of connectors, this connector
called S3A being kind of the dominant one that got included in stacks like Cloudera.
And folks started to move workloads off of on-premise HDFS type analytics file systems and onto S3.
And so...
I think the other thing that's been real prominent lately is all the AI data requirements have just exploded, right? I think it's really three things. You need enough compute to build these enormous models.
You need a lot of the algorithm sort of renaissance that we've had in terms of deep neural nets and a lot of the statistical work on machine learning.
But you also need a big, massive pile of training data.
And S3, I think, wound up being an enabler. Certainly not the only one, but a lot of these large-scale cloud storage systems were really enablers for that.
Yeah. I find a lot of the, you know,
the typical standard AI data sets are all sitting on S3 someplace.
You know, I mean, if you look at any of the historic AI stuff that's gone on
in the last couple of decades,
the canonical data sets are sitting in S3 someplace.
Yeah, yeah, absolutely.
There's actually, as an aside, and we'll get to S3 Express eventually,
but as an aside on that, when I was relatively early on here,
when I was switching from being back at UBC after the last
startup to Amazon, I was doing a whole bunch of really fun research work with this botany group in genomics at UBC. And they were doing tons and tons of analysis on sunflower genomes, which is kind of an interesting thing because a lot of the genomics work that we talk to folks about is usually human genomics.
Yeah.
And the biologists have terrible senses of humor, and they joke around that human genomics is very boring because the human genome is very similar across humans, whereas their joke was that sunflowers are much more promiscuous than humans.
And so they had a lot more noise in the data. But a lot of their data and a lot of the data sets that they were working with wound up being stored on this service that Amazon offers above S3 called Amazon Open Data.
And so there are all these like enormous free curated data sets, everything from like sequence data on COVID to like restaurant menus in New
York to train schedules to all sorts of stuff. And so there's totally, like, the Common Crawl data set as one example, which has been one of the dominant data sets used for LLM training.
And that thing's hosted over AWS Open Data and just receives a ton of traffic.
Yeah, and it's an S3-based service, but there's
APIs on top of it apparently, right?
Yep, that's right.
Interesting.
I think that's been one of the more interesting things about S3 and the evolution of S3 over the past few years.
And before we get to kind of the Express service, I'd love to talk a little bit about that evolution of abstracted services. From a blog post I actually read last night, the author said that Amazon is really bad at higher-level abstractions, and S3 might be probably the perfect argument against that. So can you talk about some of the natural evolution of abstracted services above, you know, just gets, puts, and deletes of records or objects in S3?
How has S3 kind of evolved over the years?
The whole Lambda thing is all kind of S3 driven too, you know?
Yeah, you can, you know, you can create,
you can create serverless applications.
Actually I was helping a buddy redesign an app and I was thinking
through how I would
generate an event to create a
Lambda and the whole bus
is there that if the object
is created, then it creates an event
and then I can run a Lambda
off of that. But even
before we even get into Lambda,
as our buddy Corey
Quinn will say, S3 is absolutely
a database service. Oh yeah. It's a backend for many of them.
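As a rough illustration of the event wiring Keith describes a moment earlier, here is a sketch, with a made-up bucket name and Lambda ARN, of configuring an S3 bucket so that object creation fires a Lambda function via boto3:

```python
# Hypothetical sketch: configure a bucket so that every newly created object
# emits an event that invokes a Lambda function. Names and ARNs are made up.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "Id": "on-new-object",
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:on-new-object",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
# Note: in practice the Lambda also needs a permission statement allowing S3 to invoke it.
```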
Well, it's not just the backend for a lot of them, but a lot of folks are doing, you know, heavy-end databases with S3 as the, you know... Yeah. It's the storage layer.
Key value store for such a thing then?
Well, okay, Keith,
that's like six different awesome topics that we can go through.
Let me get through the API stuff real quick
because I think the database thing
is a super interesting topic.
So the API stuff, I mean,
I wish that I could say that,
and I wasn't here 18 years ago.
I think the team also wishes they could say that they just had incredibly good taste and good foresight designing the APIs.
I think the reality was, like I mentioned, they stuck pretty close to the REST API initially. The REST API, when you look at the way that S3 uses it, turns out to be a pretty
interesting object API. And it, I think, was really influential on a lot of object systems.
I will say, though, that from the moment that I joined the team, the thing that was immediately
starkly apparent to me was the scale that S3 operates under. I mean, S3 is, I think, the second oldest service. I may get this wrong. I think SQS may have launched just ahead, but maybe we beat SQS. We definitely launched a little bit before EC2.
And so as a consequence, you know, there was a lot of stuff on S3. And I think as the API grew,
the team really became acutely aware of the burden of supporting any API changes that they made,
and how important it was to make really careful decisions there. And so from the beginning of my time working on the service,
I've been really impressed and kind of like, I don't know,
just really impressed and pleased with how seriously the team takes those decisions.
We have big, fun internal technical arguments about how API changes should surface and whether we should launch API-level feature changes for things. And there's a lot of, like, you know, we're going to be supporting this for the next, you know, forever.
So anyway, coming around to the databases thing, that thing is super neat. I think that if you look back around the time that S3 was launching, at the time, the place where a lot of large-scale, read-heavy database workloads lived was really data warehousing; you would build a data warehouse for that.
Yeah.
And you would build a data warehouse by sitting down with a database engineer and designing a schema, right? Like you would kind of go and do the ERM and all that stuff.
Like all the database or something like that? Is that what you're talking about?
And the thing that's been really remarkable with S3 was the sort of, you know, the success of
things like Apache Hadoop and then MapReduce
and Spark and stuff. And then the success of the data lake pattern. And I think at its core,
the data lake pattern is basically, let's make the storage layer visible and accessible to any tool
that wants to use it. And let's build the engine as a separable thing. And that's proven to be pretty successful over the 18 years of S3
with things like, you know, customers ultimately adopting columnar formats like Parquet,
building stuff with like, you know, Athena, but then launching teams doing stuff with
Cloudera or Databricks or whatever tooling or even third party service they want. And then over the last probably four years,
three years, the massive growth of these open table formats, things like Iceberg,
Delta Lake and Hudi, we're seeing huge adoption there. And those things are really serving as a
sort of like a middle ground to close some of the gaps between building effective engines and working on top of object
style data.
And when you're saying the engine, the engine is effectively the key value store
and the mechanism that handles the API and then front end to back end kind of requests
and things like that.
No, the separation that I draw is the engine being the kind of query engine, right?
The thing that's actually taking your SQL query
or your whatever, Scala program or whatever.
And then figuring out how to issue that. And the
really interesting thing with these open table formats is
like with Parquet,
we, I mean, we've always had folks doing this kind of work on S3.
And historically, it was like flat log files, right?
CSVs or just flat logs or things like that.
And people would go and run grep against them or build whatever tooling against them.
And then with the columnar formats like Parquet, you suddenly move to a thing where
the data was semi-structured and you'd have a group of rows of a table and the group would
be divided into columns. And so if you were doing a query that only needed to look at two columns,
that was just two big bulk reads to that range of the table. And so the place, and that's been the state of things up until,
like I said, probably about three years ago. And the thing that these OTFs like Iceberg have
really shifted to bring is this: those Parquet-formatted tables, you kind of have to address them directly, and you build mountains and mountains of them, and your schema can't really change, and they're not really mutable. And so what these open table formats are doing is they're adding this indirection layer where you actually get a table abstraction, right? They just build a bunch of extra metadata, often stored alongside the Parquet files, often sort of in Parquet actually. And now if you want to add a column to your table, you can update the schema, and the OTF implementation, like Iceberg, just knows that those old Parquet files, they're going to get a default value for those columns and things. And so it really does kind of
start to build a table style data store on top of the object storage,
but it preserves the thing that you have
like thousands and thousands of web servers
offering up the data.
And so from a throughput perspective,
you can like peak up to bursts
of hundreds of terabytes a second to run a query.
Hundreds of terabytes per second?
Yeah.
Wait a minute, there's a whole different scale here we're talking about.
So the key value store is someplace in there,
embedded in either the engine or in the columnar store,
or it's part of this whole structure, I guess, someplace, right?
It's basically the storage layer of the stack.
That's right.
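As a small aside on the columnar access pattern Andy describes, here is a sketch (bucket and file paths are hypothetical) of reading just two columns out of a Parquet object on S3 with pyarrow; only those column chunks need to be fetched from the object rather than the whole table:

```python
# Illustrative sketch (hypothetical bucket/path): with a columnar format like
# Parquet, a query that touches two columns only needs to read those column
# chunks out of the object, not the whole table.
import pyarrow.fs as fs
import pyarrow.parquet as pq

s3 = fs.S3FileSystem(region="us-east-1")

table = pq.read_table(
    "example-bucket/warehouse/events/part-00000.parquet",
    columns=["user_id", "event_time"],   # read only the columns the query needs
    filesystem=s3,
)
print(table.num_rows, table.column_names)
```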
Yeah, and I think this is where people,
especially folks coming from traditional enterprises,
kind of don't get about S3 and how deep it goes.
I had a buddy from VMware go over to join the team
over at Amazon as a principal
engineer working on S3. And I'm thinking, wait, isn't S3 done? It has won, it's the object storage of the cloud. And with these other services that it is feeding, it's more than just gets, puts, and deletes for objects at high performance; it's lending itself to these higher-level capabilities, essentially becoming a built-in database service within itself when you need these basic capabilities at scale.
So now we're getting into kind of where S3, I think we're getting into the part where S3 isn't done,
like this express zone and this idea of one of Amazon's, I think, most resilient services. What's the data guarantee
for S3? The durability guarantee, you mean? Yeah. Yeah. I mean, the durability guarantee,
I mean, the two dominant components to it, and we split them out, are that it's designed for
11 nines of durability and that it's designed to
survive the loss of a single facility. And we tease those two things apart because there are events that happen at really different granularities in time. And so when we talk about the 11 nines thing, the team internally in the design is looking at the failure rates that we're seeing at scale of physical servers and disks, dominantly.
So we're looking across thousands and thousands of those things, like huge populations.
We run stats at a per model level, at a per data center level and stuff. And we are tuning the repair system to make sure that we are
replicating and then repairing data at a rate that preserves a goal of 11 nines of durability
inside the system. And so it's kind of like a phenomenal level of durability, but it's really a design guideline where we're keeping an enormous amount of buffer against failure.
And then separately from that end of the design,
we make sure that the way that we place data and the redundancy that we have
for data is such that even if we lost an entire facility, right?
So like, you know, like something horrible happened to a building,
that we would be in a position where the data would still be redundant and safe.
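As a back-of-envelope illustration of what "designed for 11 nines" implies, here is a rough arithmetic sketch; only the durability figure comes from the conversation, and the fleet size is made up:

```python
# Rough arithmetic sketch; only the 99.999999999% durability design figure
# comes from the conversation, the fleet size is hypothetical.
durability = 0.99999999999                 # 11 nines
annual_loss_probability = 1 - durability   # ~1e-11 per object per year

objects_stored = 10_000_000_000            # hypothetical: ten billion objects
expected_losses_per_year = objects_stored * annual_loss_probability
print(expected_losses_per_year)            # ~0.1 objects per year across the fleet
```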
I was reading somewhere in S3 Express One Zone stuff
that they claimed an availability of 99.95%.
So I guess that's where we get into the trade-off, that there's overhead associated with 11 nines of durability, and some of that might be performance. And Express One Zone is for those few of us, actually it's not a few, that say, you know, I need more performance out of S3 because this is where I want my data to be.
I don't want to have to move to block or file
to get better performance.
I don't want to change my application architecture.
I just want faster object.
You know, it's very unusual in my history
as a gray beard in storage
to see storage solutions come out
that are not only orders of magnitude faster, but 50% cheaper.
How did you guys do that?
That's a craziness.
What are you guys, you're doing this wrong, Andy,
if that's what you're doing.
I'm just trying to say this.
You're leaving money on the table, as a former Greybeard would say.
Holy smokes.
You guys are just like volleying the... yeah, so you wanted questions.
We're here for questions.
Okay, so, I mean, on the cost thing, we're just absolutely ruthless about, you know, it's the Amazon frugality thing, I think, where the team is absolutely ruthless about efficiencies in the system and adopting more and more efficient hard drive capacities and working for platform efficiencies and stuff.
Flipping back to what Keith was saying, Keith, you were steering toward the express thing, but I'm trying
to remember the question. Yeah. The question is, the compromise in high durability, 11 nines, is typically
performance. And so this is an awesome question. So one of the other principal engineers on the team, and we'd been kicking, like the service had been kicking around this idea.
Customers have been asking, right?
Like first we were kind of that storage locker, and then we did the analytics workloads, and that really drove a ton of scaling on the web server side, and it was a throughput win, right?
Like we really increased the aperture, the width of the network coming into S3. And then over the past bunch of years, it's exactly the thing that you were saying, Ray,
that customers are starting to tell us that they would really prefer to use S3 as a building
block for primary storage for all of their data.
And to do that, we need to further close gaps on the sort of performance surface.
And it's latency now, right?
They need the data to be quick as well as voluminous. And so, Keith, the way that you're talking about it, it's exactly right. That facility-failure durability requirement, Amazon takes this availability zone primitive so seriously.
And it's a thing that wasn't at all obvious to me coming out
of enterprise storage, that when we talk about building
a region with three availability zones,
those are three carefully surveyed locations
that are intended to have very strong levels
of physical fault isolation.
They have all sorts of high levels of design separation for things like power and exposure to various forms of disasters. And they're some distance apart and things like that.
And so it is intended to be a building block for distributed systems because it's intended to be such that it's very, very unlikely that you would ever lose more than one of those short of a really severe regional event.
And so when we started to look at this, like ask from customers to build a quicker S3, right, to really bring the latency down. And, you know, to give you context, with the regional
S3 for object storage, you're in the like mid tens of milliseconds, you know, like, it kind of
depends on the workload and request sizes and stuff. But you're in the like, you know, 30 to 50
ish milliseconds roundtrip for at least first byte. And that's because it's an object API built on
top of HTTP, and there's all sorts of access control checks, and there's just like 18 years of
legacy. But it's also because that system is composed of a whole bunch of microservices,
and all of the microservices are concerned with surviving the loss of an entire building.
And so as we looked at taking the latency down, we realized that those
services internally were doing a lot of careful round trips to make sure the data was resilient
across at least three different availability zones. And so one of the other PEs and I did a
bunch of initial prototyping. We sat down. We actually carved off time every Friday for part of a year and built a prototype of Express.
And one of the initial things we realized was that we were going to have to build it
inside a single AZ if we really wanted to get the latency down.
We were shooting to do low single digit milliseconds as an initial design goal.
And so Express actually keeps the 11 nines design. And so when we look at Express,
we're still monitoring host and drive failures. And under a steady state, we are designing such
that, you know, based on the failures of media or servers, we still are providing that really,
really high level of durability, right?
It's, you know, I don't know what the number would be
for something like a RAID 6.
It would depend on how fast you're replacing the drives and stuff,
but like we're way above that, right?
Like we're shooting for a huge amount of design resiliency.
However, we're not resilient against the loss of the AZ.
And so if there was any kind of an event that impacted the entire AZ, like many, many racks of things inside a building, then we wouldn't be resilient for that.
And so, you know, a lot of S3 customers really hold the regional durability to be kind of the paramount thing. And so they'll keep their primary copy there, and we're seeing a big pattern
where folks shift data into Express when it's active
and run workloads off of it there.
Well, I mean, how else are you going to do it? I don't know how else you do a single millisecond type of response time across multiple AZs.
You know, you have to, when a write occurs, you have to make sure it's all in all these locations, et cetera, et cetera.
It can't be done.
It's physically not possible.
Yeah, but I think the compromise is a great compromise, because the types of applications we're asking to run this type of high-availability, high-performance read section off of, we're stuck from an architecture perspective, right? Once we get to that scale and we've developed it on top of object storage, now the only way to get that performance, and we're thinking about tiering, and I've talked to customers that are doing this, where, you know, object was the long-term highly durable platform, then they moved the high-performance stuff to block or to file, and that would get them their performance.
But now they have two different application development paths.
They have to break their app, their existing architecture, to deal with file, which is not as friendly as object from a development perspective.
Now customers are continuing to do the same thing they've done
from a data tiering perspective,
but now they don't have to change their app.
They have to move data, yes, but they were moving data to begin with.
Now they're moving data without needing to change the way
that they call and retrieve data.
Yeah, yeah, yeah.
I would quibble about which one is more developer-friendly, file or object.
I mean, they both have different characteristics, obviously.
Yeah, I don't think it's a matter of which one is more friendly.
I think it's, you know what, this is what we've done.
And to change what we're doing is more painful.
Yeah, exactly.
It's totally that.
And I'm glad that your reaction is that, you know, it's a good job that we've done on the latency thing.
I think I personally and I think the team agrees that there's still room to go, right?
Like we still want to close these gaps and make it really, really straightforward to use.
And we'll just continue to simplify.
And the thing that you guys are both pointing at, on even the file-object thing, I'll tell you, is that these are things customers want that are kind of driven by, you know, the physics of the technology and stuff. So in the case of Express, you see this one-zone offering instead of a regional offering, and, you know, there's a performance win, but there's also a choice that the customer has to make, right? That the engineer has to make when they're placing stuff versus just having it simple.
And so on the file object thing,
if you look at the services,
we're actually like actively working
to get customers out of the spot
where they ever have to make a decision across those things.
And so if you look at S3,
in the last year, we launched a new feature called Mountpoint,
which is a file connector for Linux.
It's not a full POSIX file system,
but it's a very, very high throughput HDFS style semantics connector
that lets you actually mount S3 buckets as files.
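As a quick sketch of what that looks like in practice: once a bucket is mounted (Mountpoint's CLI is roughly `mount-s3 example-bucket /mnt/data`; the bucket and paths here are hypothetical), existing file-based tools can read objects through ordinary file I/O.

```python
# Sketch assuming a bucket has already been mounted with Mountpoint at
# /mnt/data (roughly: mount-s3 example-bucket /mnt/data). Paths are made up.
with open("/mnt/data/genomes/sample-001.fastq", "rb") as f:
    header = f.read(1024)   # read the first KB of an object like a local file
print(len(header))
```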
And if you look at the flip side of it, for much longer than the last year, FSx, which is our managed storage suite that a lot of enterprises use as a first step into AWS, FSx for Lustre has these data repositories that can be backed by S3, and customers will hydrate those out of S3 and then run big HPC workloads against the data and then move their changes back to S3 under
the covers.
On FSx, you've got literally almost a half a dozen services, different services, depending
on what you've got.
Solaris, you've got your standard file systems, you've got our friends at NetApp, et cetera, et cetera.
Yeah.
And we continue to grow that and we continue to work on it.
So like for folks that don't know what these are, FSx is a set of managed storage offerings, right? They're basically, you can provision up a storage target as you would assemble a storage target in the enterprise. And so there's, like you say, there's Lustre, there's ZFS, there's NetApp ONTAP that we've built in partnership with NetApp. There's FSx for Windows File Server. I'm sure I'm missing
something. EFS.
Well, EFS is not actually one of the FSxs. EFS is-
It should be.
EFS is an elastic service. But the interesting thing with the FSxs is they really meet an enterprise storage admin on their terms, right? Like, you know, you guys have, I'm sure, talked to loads of those folks. I spend loads of time talking to them about the difficult job of being an enterprise storage admin.
You're always like stuck in the middle
between a lot of tough stuff.
And when we talk to those folks,
the first thing they say is like,
don't make my life harder, right?
Like don't change the stuff that I have to deal with
to move through the cloud.
And so with the FSxs, like ONTAP, for example,
they can literally stand up a NetApp virtual filer on AWS.
They can configure SnapMirror to move the data.
And now they've got an ONTAP presence inside FSx and AWS.
They have a disaster recovery site.
And we launched a big pile of huge performance improvements, like five to seven X read and write performance increases on those ONTAP FSx offerings at re:Invent.
And the thing that we're seeing those enterprise customers do is they'll use SnapMirror, they'll
replicate the data into AWS, and then they'll run jobs with thousands of Lambdas against the data.
And they would never provision that much compute in their data center because they don't need it all the time.
But being able to burst in and run a huge job like that is a spectacular opportunity for them to go and do that. And this is important.
I think this is, and me and Ray have had this debate, like what's the future of AI storage?
And my pushback, the answer has always been yes.
Give you kind of the example use case from when I worked in pharma and we had these sequencers.
The sequencers, these things are multi-year, multi-decade investments.
Yeah, totally.
And they save data in file.
The data scientists and the scientists working against these data sets are using file-based tools.
Absolutely. Yep. Just like when we talked about the object-to-object data tiering, it breaks the workflow when we have to give data scientists and scientists tools with one interface at one level.
And then when we need a different level of performance, we give them a different interface.
They despise that shift in workflow.
Yep. Yep. Absolutely.
And that friction is awful.
And that is a thing that we're absolutely focused on with a lot of this stuff. And so in the example of, like you say, a reasonably large pharma or genomics-based firm that's got an Illumina or whatever, and they're pulling stuff off in file, and they've got, you know, bioinformatics folks that have trained on Linux or whatever, Unix, and they're used to file APIs to a file-attached bit of data. They want the cost basis and especially the performance scaling of object, but they're like, forget it, I didn't even write these tools. These are open source tools that work against file. And so that community, the bioinformatics genomics folks, were one of the most vocally excited when we did this initial Mountpoint launch for S3, the file APIs, because those folks are now...
I'll come back to you with an example.
We've got a couple of really excited customers
and I'm not sure if they're public references.
No, no, no, it's fine.
It's fine.
It's certainly exciting for me.
But one thing you did say,
I just want to go back to this,
and you just kind of threw it out, Andy.
FSx ONTAP can be configured to support
Lambda. So I always thought Lambda was an object service. So an object would show up in a bucket
and all of a sudden you'd fire up a storage, you know, a compute engine or something like that.
But you can do this with a file?
Well, inside Lambda, I mean, you launch the Lambda and it's a compute environment. And so you can do an NFS mount inside there. And so a lot of the more powerful patterns, I would say, on Lambda that you were talking about earlier, especially the bindings, or maybe it was Keith that was saying, like, bindings to events where, you know, you do a put into an S3 bucket in one place and that triggers a Lambda launch.
But stay tuned on that front. Like we're, you know, we're really pushing to open up a lot of
these things. I can't tell you how important this is to the enterprise middleware market. Like
we've used tools that have been extremely inconsistent at best,
that when a file is dumped to a directory, there's a process kickoff.
Right. Yeah.
It can be a million events in a day, a million files written to a directory and a workflow.
And when it fails three times, it's important.
It's a nightmare. And then you got to go and sweep and scrub and figure out all the stuff
that you missed. And it's a really awful pattern to have to do.
So the fact that I can get rid of the middleware process altogether,
use the Amazon event bus to basically trigger lambdas off of any time a file is written to a directory.
So, you know, just think about the sequencing process.
Every time a data scientist uploads or a clinic uploads a new set of sequences from their Illumina machine and it reaches a directory in AWS, a Lambda is kicked off to do the conversion
and analysis needed so that the scientists, the data scientists and the scientists looking for
the end product, there's no massaging of data. If there's ETL that needs to be done, all that is already done. And it's true in genomics. It's true with media transcoding. It's true with folks working with large medical images where they need to generate thumbnails. Like, this workflow pattern of events to an ETL-style transform to ingestion is really common. And so just to be clear, we're there with S3, we're there with S3 and file on top of Mountpoint.
And we're filling it out, right?
Like it's a thing that we're focused on.
We just want to remove the need for an engineer or a designer to have to make a choice based on file protocol or storage protocol up front.
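A minimal sketch of the handler side of the pattern Keith and Andy describe here: a Lambda function invoked by an S3 ObjectCreated event that fetches the new object, runs a placeholder transform, and writes the result back. Function names, key prefixes, and the transform itself are illustrative only.

```python
# Minimal sketch of a Lambda handler for S3 ObjectCreated events.
# The transform step is a placeholder; names and prefixes are hypothetical.
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = unquote_plus(record["s3"]["object"]["key"])

        # Fetch the newly written object (e.g. raw sequencer output).
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # ... run the ETL/conversion step here, then store the result ...
        s3.put_object(Bucket=bucket, Key=f"processed/{key}", Body=raw)
```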
All right.
We're getting close to the end here.
I really haven't talked at all about S3 Express, what you guys have done internally to make this happen.
You mentioned the single millisecond types of response time. It's still on disk. So there's
got to be a lot of caching going on and stuff like that. So no, we've moved to, there's a few
structural differences inside Express. So like I said, it's in a single zone.
So it removes a whole bunch of network hops.
That's a bit of physics that really speeds things up.
We've moved to higher performance media under it.
And so there's a big bump that happens there.
And then we've made some software changes inside.
The system is entirely written in Rust,
which has been a really big shift for us internally.
How do you guys like Rust?
That's a different question.
We'll do that offline.
Okay, go ahead.
I mean, we're finding a lot of success
on that side for the teams.
And we're finding that, you know, you get code that you don't have to go back and make as many debugging-style changes to.
And then we've done some other pretty cool stuff
on the software side.
So Express introduces a new bucket type
for the first time since S3 launched.
The Express buckets are called directory buckets.
And they're designed to be higher TPS, lower latency as a metadata layer.
You can kind of think of it as the file system metadata layer inside the object store.
And then we also did a super interesting thing as part of those directory buckets where in the session protocol, right? In the connection that you open up over the network to talk to S3 Express,
we hoist a bunch of the authentication and access control checks up to connection time,
and then have a lighter weight validation of those things when you access a file or an object.
And so doing that work, it was kind of a bit of reinvention of the session protocol.
And it means that we get a whole bunch of extra sort of latency improvement in terms of object access.
So there's a bunch of super cool stuff happening.
That was great.
A stateful protocol here.
I mean, is that what you're saying?
Is that what's going on?
That's right.
There's a bit of session state on the protocol. That's interesting for S3.
So directory buckets are even faster than normal S3 express object, normal standard buckets?
It's kind of a reinvention of the namespace itself inside of S3.
I mean, there was always a flat namespace.
There is no namespace here.
Well, there's been a whole bunch of stuff that we've put in there to really focus on starting to add some structure internally and really engineer to really, really high TPS workloads.
We say TPS transaction per second, right?
Yeah, that's right.
That's right.
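For context, here is a rough sketch of what using an Express directory bucket looks like from the SDK side. The zone-qualified bucket name is a made-up example, and the assumption is that a recent SDK handles the session-based authentication Andy describes transparently; the request calls themselves look like ordinary S3 gets and puts.

```python
# Rough sketch with stated assumptions: a hypothetical S3 Express One Zone
# directory bucket name, and a recent boto3 that manages the session-based
# auth under the covers. The calls themselves are plain S3 put/get.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

express_bucket = "example-data--use1-az4--x-s3"   # hypothetical directory bucket

s3.put_object(Bucket=express_bucket, Key="hot/shard-0001.bin", Body=b"...")
resp = s3.get_object(Bucket=express_bucket, Key="hot/shard-0001.bin")
print(resp["ContentLength"])
```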
Huh?
So you're putting structure inside of an object store?
I mean, other than the bucket count and the bucket name and stuff.
So yeah, we're doing a whole bunch of pretty interesting stuff under the covers. And I think we're kind of just getting started on the Express stuff. So I'm pretty excited to see where it goes. I would say that what we are seeing on the Express stuff is, when we first started prototyping and we were first working on it, we really thought that it was going to be, I don't want to say niche, but we thought it was going to be this smaller set of really active workloads. And what we're seeing is even for bulk analytics and things like those genomics workloads
and things where I had assumed that the higher level applications were doing a really good
job of backgrounding IO, it turns out that there's actually a lot of opportunity to speed
those applications up by reducing latency on the object store.
And so what we see with Express is because of all of the changes we've made to faster media and wrapping a lot more, you know, compute around things, you see a price structure
where the capacity costs more, but request prices are a lot lower.
And so it's really intended to be,
and the data is not flowing as far, it's really intended to be a request-oriented service.
And as a result of that, what we're seeing is TCO savings on end-to-end workloads.
And so we see customers, there's one example that I talked about in the
storage talk at re:Invent this year, where we took an image training benchmark that takes about 15 days to run all out on a set of P instances. And moving the data off of regional S3 Standard into S3 Express, we actually shaved a day off of the training time for that. And I don't remember
what the number is. I think we shaved, I'm not going to say, I can't remember, but we actually
got like a sizable cost savings for the workload. And that's something that we've seen.
Pinterest had an example with about a 40% cost savings. We're starting to see Express customers just getting pretty big wins because they're using less compute.
They're running their compute less long
because they're not waiting for storage
for these active ones.
Yeah, so there's kind of like this double-sided advantage.
One, you're getting the performance
and the cost savings associated with performance.
Then you're also getting
kind of this, I don't know if you've quantified it, but this additional agility on the dev
ops side of not needing to do this transformation for this tiering of storage.
Have you guys measured that cost benefit of not needing to redevelop that?
If you have ideas on how we could measure that,
I would love to bounce those around.
I think it's going to end up being a bunch of anecdotal stuff.
But we're definitely, the voice of the customer on that one so far,
and we're only, what, we launched the thing less than a month ago?
A couple months, right?
Maybe just over a month ago.
The voice of the customer has been really positive.
Folks are really talking about that ease of use and just being able to build on top of it.
So it seems to be pretty positive so far.
I think you're talking to the right people about trying to identify how to talk about this. So yeah, I'll send a note to you guys, but that's really interesting. I'm always about the operator's experience, and quantifying, or at least showing, the operator side, because that's the thing that people get stuck on and where projects fail, because of that friction between the operator or builder and the platform.
Yeah, I agree.
All right, gents, this has been great. So Keith, any last questions for Andy?
We could go on for a couple. You know what? This is probably one of those podcasts, Ray, where you probably shouldn't ask that question. I have plenty of questions for Andy, but we're at time. I submit.
Yeah. Andy, anything you'd like to say to our listening audience before we close?
No, I mean, it's so awesome to spend time talking to you guys, and, you know, I'd encourage folks to kick the tires on any of the things that we talked about today. If you're an S3 customer, you should try Express.
If you're an enterprise customer, you
should take a look at the FSx
family of products because
we're getting a lot of really positive
reaction to those.
Well, this has been great. This has been
awesome, Andy. Thanks very much for being on
our show today. Glad to have you back.
Glad to be back.
Thanks a lot, both of you.
Yeah, yeah.
That's it for now.
That's it for now.
Bye, Andy.
Bye, Keith.
Bye, Ray.
Bye, Ray.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask,
please let us know.
And if you enjoy our podcast,
tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.