Grey Beards on Systems - 161: Greybeards talk AWS S3 storage with Andy Warfield, VP Distinguished Engineer, Amazon
Episode Date: January 19, 2024. We talked with Andy Warfield (@AndyWarfield), VP Distinguished Engineer, Amazon, about 10 years ago, when he was at Coho Data (see our 005: Greybeards talk scale out storage … podcast). Andy has been a good friend for a long time and he's been with Amazon S3 for over 5 years now. Since the recent S3 announcements at …
Transcript
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today. We have with us here today Andy Warfield, an old friend and VP Distinguished Engineer at Amazon.
So Andy, why don't you tell us a little bit about yourself and what you've been up to with S3 at Amazon.
All right. Well, hey, Ray and Keith. It's awesome to be on the show. I'm an engineer at Amazon. I work across all of our storage products. I've been at Amazon for about six years, a little over six years. And I joined actually from a storage startup called Coho Data. I was a professor
at the University of British Columbia in Vancouver before that as well. And I was on the
Greybeards as part of that startup, I guess, probably.
A long time ago, yeah.
Yeah, it was, we all met Andy at Coho and did a number of Tech Field Day events with Coho Data
where Andy gave serious sessions on storage technology and what they were doing. So that
was great. So we're here to talk a little bit about S3 Express OneZone. I guess that's a new
technology that you guys rolled out? It is. And Ray, I can't let you off the hook on that thing first.
I had this thought.
I was driving in here today.
I was like, oh, this is the morning I've been looking forward to all week.
I'm on gray beards.
And then I was like, wait a second.
Howard's not doing it anymore.
And Keith's on it.
And I wonder if Keith now has a gray beard as a qualification for this thing.
And then I had this moment of like, holy smokes, maybe the reference is actually to me.
Maybe it's me with the...
Keith every once in a while has a gray beard.
I haven't seen one in a while, but I've seen it in the past.
Yeah.
I'm rocking one now.
All right.
All right.
You got to fit the show, Keith.
Yeah.
So let's talk about the S3 Express stuff.
I assume that this is still a pretty technical storage audience in the podcast, and I can just get into details on stuff.
S3 was a really interesting change for me to come and work on.
A lot of the storage work that I'd done as a researcher and in startups
was much more toward the lower-level primitives, block-level primitives and stuff,
and a lot of OS and hardware-level work with early NVMe and stuff like that.
Right, right.
S3, kind of to keep with the Greybeards theme, I guess,
is like 18 years old now.
And I remember it launching, which is a bit terrifying.
And S3 as a storage system is a little bit weird because it's,
you know, as you guys know, it's like a REST API, right?
It's like it's almost a direct take on the HTTP protocol in terms of gets and puts and mapping the HTTP verbs to storage. And I guess to sort of like,
you know, explain the motivation behind Express, which is this really low latency version of S3 that we've just
launched. When S3 originally launched, it was really, I think I found some old language that talks about it being the file system for the internet, was how some of the team originally talked about it 18 years ago, but really it was more like the storage locker for the internet, right?
It was highly durable, you know, really, really secure. And it was, you know, the analog that I
kind of think about is it was the place that you would put like golf clubs and skis and stuff in
the back of your car and drive across town to and stick away knowing that they were safe, but also being a bit of a drive to get to.
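To make Andy's point about the API concrete, here is a minimal sketch, with hypothetical bucket and key names, of how S3's interface maps HTTP verbs onto object operations, using the boto3 SDK in Python:

```python
# Minimal sketch (bucket and key names are hypothetical): S3's API maps
# closely onto HTTP verbs -- a PUT writes an object, a GET reads it back.
import boto3

s3 = boto3.client("s3")

# PUT: write an object under a key in a bucket.
s3.put_object(Bucket="example-bucket", Key="demo/hello.txt", Body=b"hello, S3")

# GET: read the object back as a byte stream.
resp = s3.get_object(Bucket="example-bucket", Key="demo/hello.txt")
print(resp["Body"].read())  # b'hello, S3'
```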
And over the past almost 20 years, we saw that shift to S3 hosting a lot of analytics.
I think the Hadoop community in particular wrote a bunch of connectors, this connector
called S3A being kind of the dominant one that got included in stacks like Cloudera.
And folks started to move workloads off of on-premise HDFS type analytics file systems and onto S3.
And so...
I think the other thing that's been real prominent lately is all the AI data requirements have just exploded, right? I think it's really three things. You need enough compute to build these enormous models.
You need a lot of the algorithm sort of renaissance that we've had in terms of deep neural nets and a lot of the statistical work on machine learning.
But you also need a big, massive pile of training data.
And S3, I think, wound up being an enabler. Certainly not the only one, but a lot of these large-scale cloud storage systems were really enablers for that.
Yeah. I find a lot of the, you know,
the typical standard AI data sets are all sitting on S3 someplace.
You know, I mean, if you look at any of the historic AI stuff that's gone on
in the last couple of decades,
the canonical data sets are sitting in S3 someplace.
Yeah, yeah, absolutely.
There's actually, as an aside, and we'll get to S3 Express eventually,
but as an aside on that, when I was relatively early on here,
when I was switching from being back at UBC after the last
startup to Amazon, I was doing a whole bunch of really fun research work with this botany group in genomics at UBC. And they were doing tons and tons of analysis on sunflower genomes, which is kind of an interesting thing because a lot of the genomics work that we talk to folks about is usually human genomics.
Yeah.
And the biologists have terrible senses of humor, and they joke around that human genomics is very boring because the human genome is very similar across humans, whereas their joke was that sunflowers are much more promiscuous than humans.
And so they had a lot more noise in the data. But a lot of their data and a lot of the data sets that they were working with wound up being stored on this service that Amazon offers above S3 called Amazon Open Data.
And so there are all these like enormous free curated data sets, everything from like sequence data on COVID to like restaurant menus in New
York to train schedules to all sorts of stuff. And so there's totally, like, the Common Crawl data set as one example, which has been one of the dominant data sets used for LLM training.
And that thing's hosted over AWS Open Data and just receives a ton of traffic.
Yeah, and it's an S3-based service, but there's
APIs on top of it apparently, right?
Yep, that's right.
Interesting.
I think that's been one of the more interesting things about S3 and the evolution of S3 over the past few years.
And before we get to kind of the Express service, I'd love to talk a little bit about that evolution of abstracted services. From a blog post I actually read last night, the author said that Amazon is really bad at higher-level abstractions, and S3 might be probably the perfect argument against that. So can you talk about some of the natural evolution of abstracted services above, you know, just gets, puts, and deletes of records or objects in S3?
How has S3 kind of evolved over the years?
The whole Lambda thing is all kind of S3 driven too, you know?
Yeah, you can, you know, you can create,
you can create serverless applications.
Actually I was helping a buddy redesign an app and I was thinking
through how I would
generate an event to create a
Lambda and the whole bus
is there that if the object
is created, then it creates an event
and then I can run a Lambda
off of that. But even
before we even get into Lambda,
as our buddy Corey
Quinn will say, S3 is absolutely
a database service. Oh yeah. It's a backend for many of them.
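As a rough illustration of the event wiring Keith describes a moment earlier, here is a sketch, with a made-up bucket name and Lambda ARN, of configuring an S3 bucket so that object creation fires a Lambda function via boto3:

```python
# Hypothetical sketch: configure a bucket so that every newly created object
# emits an event that invokes a Lambda function. Names and ARNs are made up.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "Id": "on-new-object",
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:on-new-object",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
# Note: in practice the Lambda also needs a permission statement allowing S3 to invoke it.
```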
Well, it's not just the backend for a lot of them, but a lot of folks are doing, you know, heavy-end databases with S3 as the, you know... Yeah. It's the storage layer.
Key value store for such a thing then?
Well, okay, Keith,
that's like six different awesome topics that we can go through.
Let me get through the API stuff real quick
because I think the database thing
is a super interesting topic.
So the API stuff, I mean,
I wish that I could say that,
and I wasn't here 18 years ago.
I think the team also wishes they could say that they just had incredibly good taste and good foresight designing the APIs.
I think the reality was, like I mentioned, they stuck pretty close to the REST API initially. The REST API, when you look at the way that S3 uses it, turns out to be a pretty
interesting object API. And it, I think, was really influential on a lot of object systems.
I will say, though, that from the moment that I joined the team, the thing that was immediately
starkly apparent to me was the scale that S3 operates under. I mean, S3 is, I think, the second oldest service. I may get this wrong. I think SQS may have launched just ahead, but maybe we beat SQS. We definitely launched a little bit before EC2.
And so as a consequence, you know, there was a lot of stuff on S3. And I think as the API grew,
the team really became acutely aware of the burden of supporting any API changes that they made,
and how important it was to make really careful decisions there. And so from the beginning of my time working on the service,
I've been really impressed and kind of like, I don't know,
just really impressed and pleased with how seriously the team takes those decisions.
We have big, fun internal technical arguments about how API changes should surface and whether we should launch API-level feature changes for things. And there's a lot of, like, you know, we're going to be supporting this for the next, you know, forever.
So anyway, coming around to the databases thing, that thing is super neat. I think that if you look back around the time that S3 was launching, at the time, the place where a lot of large-scale, read-heavy database workloads lived was really data warehousing; you would build a data warehouse for that.
Yeah.
And you would build a data warehouse by sitting down with a database engineer and designing a schema, right? Like you would kind of go and do the ERM and all that stuff.
Like all the database or something like that? Is that what you're talking about?
And the thing that's been really remarkable with S3 was the sort of, you know, the success of
things like Apache Hadoop and then MapReduce
and Spark and stuff. And then the success of the data lake pattern. And I think at its core,
the data lake pattern is basically, let's make the storage layer visible and accessible to any tool
that wants to use it. And let's build the engine as a separable thing. And that's proven to be pretty successful over the 18 years of S3
with things like, you know, customers ultimately adopting columnar formats like Parquet,
building stuff with like, you know, Athena, but then launching teams doing stuff with
Cloudera or Databricks or whatever tooling or even third party service they want. And then over the last probably four years,
three years, the massive growth of these open table formats, things like Iceberg,
Delta Lake and Hudi, we're seeing huge adoption there. And those things are really serving as a
sort of like a middle ground to close some of the gaps between building effective engines and working on top of object
style data.
And when you're saying the engine, the engine is effectively the key value store
and the mechanism that handles the API and then front end to back end kind of requests
and things like that.
No, the separation that I draw is the engine being the kind of query engine, right?
The thing that's actually taking your SQL query
or your whatever, Scala program or whatever.
And then figuring out how to issue that. And the
really interesting thing with these open table formats is
like with Parquet,
we, I mean, we've always had folks doing this kind of work on S3.
And historically, it was like flat log files, right?
CSVs or just flat logs or things like that.
And people would go and run grep against them or build whatever tooling against them.
And then with the columnar formats like Parquet, you suddenly move to a thing where
the data was semi-structured and you'd have a group of rows of a table and the group would
be divided into columns. And so if you were doing a query that only needed to look at two columns,
that was just two big bulk reads to that range of the table. And so the place, and that's been the state of things up until,
like I said, probably about three years ago. And the thing that these OTFs like Iceberg have
really shifted to bring is this: those Parquet-formatted tables, you kind of have to address them directly, and you build mountains and mountains of them, and your schema can't really change, and they're not really mutable. And so what these open table formats are doing is they're adding this indirection layer where you actually get a table abstraction, right? They just build a bunch of extra metadata, often stored alongside the Parquet files, often sort of in Parquet actually. And now if you want to add a column to your table, you can update the schema, and the OTF implementation, like Iceberg, just knows that those old Parquet files, they're going to get a default value for those columns and things. And so it really does kind of
start to build a table style data store on top of the object storage,
but it preserves the thing that you have
like thousands and thousands of web servers
offering up the data.
And so from a throughput perspective,
you can like peak up to bursts
of hundreds of terabytes a second to run a query.
Hundreds of terabytes per second?
Yeah.
Wait a minute, there's a whole different scale here we're talking about.
So the key value store is someplace in there,
embedded in either the engine or in the columnar store,
or it's part of this whole structure, I guess, someplace, right?
It's basically the storage layer of the stack.
That's right.
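As a small aside on the columnar access pattern Andy describes, here is a sketch (bucket and file paths are hypothetical) of reading just two columns out of a Parquet object on S3 with pyarrow; only those column chunks need to be fetched from the object rather than the whole table:

```python
# Illustrative sketch (hypothetical bucket/path): with a columnar format like
# Parquet, a query that touches two columns only needs to read those column
# chunks out of the object, not the whole table.
import pyarrow.fs as fs
import pyarrow.parquet as pq

s3 = fs.S3FileSystem(region="us-east-1")

table = pq.read_table(
    "example-bucket/warehouse/events/part-00000.parquet",
    columns=["user_id", "event_time"],   # read only the columns the query needs
    filesystem=s3,
)
print(table.num_rows, table.column_names)
```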
Yeah, and I think this is where people,
especially folks coming from traditional enterprises,
kind of don't get about S3 and how deep it goes.
I had a buddy from VMware go over to join the team
over at Amazon as a principal
engineer working on S3. And I'm thinking, wait, isn't S3 done? It has won, it's the object storage of the cloud. And with these other services that it is feeding, it's more than just gets, puts, and deletes for objects at high performance; it's lending itself to these higher-level capabilities, essentially becoming a built-in database service within itself when you need these basic capabilities at scale.
So now we're getting into kind of where S3, I think we're getting into the part where S3 isn't done,
like this express zone and this idea of one of Amazon's, I think, most resilient services. What's the data guarantee
for S3? The durability guarantee, you mean? Yeah. Yeah. I mean, the durability guarantee,
I mean, the two dominant components to it, and we split them out, are that it's designed for
11 nines of durability and that it's designed to
survive the loss of a single facility. And we tease those two things apart because there are events that happen at really different granularities in time. And so when we talk about the 11 nines thing, the team internally in the design is looking at the failure rates that we're seeing at scale of physical servers and disks, dominantly.
So we're looking across thousands and thousands of those things, like huge populations.
We run stats at a per model level, at a per data center level and stuff. And we are tuning the repair system to make sure that we are
replicating and then repairing data at a rate that preserves a goal of 11 nines of durability
inside the system. And so it's kind of like a phenomenal level of durability, but it's really a design guideline where we're keeping an enormous amount of buffer against failure.
And then separately from that end of the design,
we make sure that the way that we place data and the redundancy that we have
for data is such that even if we lost an entire facility, right?
So like, you know, like something horrible happened to a building,
that we would be in a position where the data would still be redundant and safe.
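As a back-of-envelope illustration of what "designed for 11 nines" implies, here is a rough arithmetic sketch; only the durability figure comes from the conversation, and the fleet size is made up:

```python
# Rough arithmetic sketch; only the 99.999999999% durability design figure
# comes from the conversation, the fleet size is hypothetical.
durability = 0.99999999999                 # 11 nines
annual_loss_probability = 1 - durability   # ~1e-11 per object per year

objects_stored = 10_000_000_000            # hypothetical: ten billion objects
expected_losses_per_year = objects_stored * annual_loss_probability
print(expected_losses_per_year)            # ~0.1 objects per year across the fleet
```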
I was reading somewhere in S3 Express One Zone stuff
that they claimed an availability of 99.95%.
So I guess that's where we get into the trade-off, that there's overhead associated with 11 nines of durability, and some of that might be performance. And Express One Zone is for those few of us, actually it's not a few, that say, you know, I need more performance out of S3 because this is where I want my data to be.
I don't want to have to move to block or file
to get better performance.
I don't want to change my application architecture.
I just want faster object.
You know, it's very unusual in my history
as a gray beard in storage
to see storage solutions come out
that are not only orders of magnitude faster, but 50% cheaper.
How did you guys do that?
That's a craziness.
What are you guys, you're doing this wrong, Andy,
if that's what you're doing.
I'm just trying to say this.
You're leaving money on the table, as a former Greybeard would say.
Holy smokes.
You guys are just like volleying the... yeah, so you wanted questions.
We're here for questions.
Okay, so, I mean, on the cost thing, we're just absolutely ruthless about, you know, it's the Amazon frugality thing, I think, where the team is absolutely ruthless about efficiencies in the system and adopting more and more efficient hard drive capacities and working for platform efficiencies and stuff.
Flipping back to what Keith was saying, Keith, you were steering toward the express thing, but I'm trying
to remember the question. Yeah. The question is, the compromise in high durability, 11 nines, is typically
performance. And so this is an awesome question. So one of the other principal engineers on the team, and we'd been kicking, like the service had been kicking around this idea.
Customers have been asking, right?
Like first we were kind of that storage locker, and then we did the analytics workloads, and that really drove a ton of scaling on the web server side, and it was a throughput win, right?
Like we really increased the aperture, the width of the network coming into S3. And then over the past bunch of years, it's exactly the thing that you were saying, Ray,
that customers are starting to tell us that they would really prefer to use S3 as a building
block for primary storage for all of their data.
And to do that, we need to further close gaps on the sort of performance surface.
And it's latency now, right?
They need the data to be quick as well as voluminous. And so, Keith, the way that you're talking about it, it's exactly right. That facility-failure durability requirement, Amazon takes this availability zone primitive so seriously.
And it's a thing that wasn't at all obvious to me coming out
of enterprise storage, that when we talk about building
a region with three availability zones,
those are three carefully surveyed locations
that are intended to have very strong levels
of physical fault isolation.
They have all sorts of high levels of design separation for things like power and exposure to various forms of disasters. And they're some distance apart and things like that.
And so it is intended to be a building block for distributed systems because it's intended to be such that it's very, very unlikely that you would ever lose more than one of those short of a really severe regional event.
And so when we started to look at this, like ask from customers to build a quicker S3, right, to really bring the latency down. And, you know, to give you context, with the regional
S3 for object storage, you're in the like mid tens of milliseconds, you know, like, it kind of
depends on the workload and request sizes and stuff. But you're in the like, you know, 30 to 50
ish milliseconds roundtrip for at least first byte. And that's because it's an object API built on
top of HTTP, and there's all sorts of access control checks, and there's just like 18 years of
legacy. But it's also because that system is composed of a whole bunch of microservices,
and all of the microservices are concerned with surviving the loss of an entire building.
And so as we looked at taking the latency down, we realized that those
services internally were doing a lot of careful round trips to make sure the data was resilient
across at least three different availability zones. And so one of the other PEs and I did a
bunch of initial prototyping. We sat down. We actually carved off time every Friday for part of a year and built a prototype of Express.
And one of the initial things we realized was that we were going to have to build it
inside a single AZ if we really wanted to get the latency down.
We were shooting to do low single digit milliseconds as an initial design goal.
And so Express actually keeps the 11 nines design. And so when we look at Express,
we're still monitoring host and drive failures. And under a steady state, we are designing such
that, you know, based on the failures of media or servers, we still are providing that really,
really high level of durability, right?
It's, you know, I don't know what the number would be
for something like a RAID 6.
It would depend on how fast you're replacing the drives and stuff,
but like we're way above that, right?
Like we're shooting for a huge amount of design resiliency.
However, we're not resilient against the loss of the AZ.
And so if there was any kind of an event that impacted the entire AZ, like many, many racks of things inside a building, then we wouldn't be resilient for that.
And so, you know, a lot of S3 customers really hold the regional durability to be kind of the paramount thing. And so they'll keep their primary copy there, and we're seeing a big pattern
where folks shift data into Express when it's active
and run workloads off of it there.
Well, I mean, how else are you going to do it? I don't know how else you do a single millisecond type of response time across multiple AZs.
You know, you have to, when a write occurs, you have to make sure it's all in all these locations, et cetera, et cetera.
It can't be done.
It's physically not possible.
Yeah, but I think the compromise is a great compromise, because the types of applications we're asking to run this type of high-availability, high-performance read section off of, we're stuck from an architecture perspective, right? Once we get to that scale and we've developed it on top of object storage, now the only way to get that performance, and we're thinking about tiering, and I've talked to customers that are doing this, where, you know, object was the long-term highly durable platform, then they moved the high-performance stuff to block or to file, and that would get them their performance.
But now they have two different application development paths.
They have to break their app, their existing architecture, to deal with file, which is not as friendly as object from a development perspective.
Now customers are continuing to do the same thing they've done
from a data tiering perspective,
but now they don't have to change their app.
They have to move data, yes, but they were moving data to begin with.
Now they're moving data without needing to change the way
that they call and retrieve data.
Yeah, yeah, yeah.
I would quibble about which one is more developer-friendly, file or object.
I mean, they both have different characteristics, obviously.
Yeah, I don't think it's a matter of which one is more friendly.
I think it's, you know what, this is what we've done.
And to change what we're doing is more painful.
Yeah, exactly.
It's totally that.
And I'm glad that your reaction is that, you know, it's a good job that we've done on the latency thing.
I think I personally and I think the team agrees that there's still room to go, right?
Like we still want to close these gaps and make it really, really straightforward to use.
And we'll just continue to simplify.
And the thing that you guys are both pointing at, on even the file-object thing, I'll tell you, is that these are things customers want that are kind of driven by, you know, the physics of the technology and stuff. So in the case of Express, you see this one-zone offering instead of a regional offering, and, you know, there's a performance win, but there's also a choice that the customer has to make, right? That the engineer has to make when they're placing stuff versus just having it simple.
And so on the file object thing,
if you look at the services,
we're actually like actively working
to get customers out of the spot
where they ever have to make a decision across those things.
And so if you look at S3,
in the last year, we launched a new feature called Mountpoint,
which is a file connector for Linux.
It's not a full POSIX file system,
but it's a very, very high throughput HDFS style semantics connector
that lets you actually mount S3 buckets as files.
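As a quick sketch of what that looks like in practice: once a bucket is mounted (Mountpoint's CLI is roughly `mount-s3 example-bucket /mnt/data`; the bucket and paths here are hypothetical), existing file-based tools can read objects through ordinary file I/O.

```python
# Sketch assuming a bucket has already been mounted with Mountpoint at
# /mnt/data (roughly: mount-s3 example-bucket /mnt/data). Paths are made up.
with open("/mnt/data/genomes/sample-001.fastq", "rb") as f:
    header = f.read(1024)   # read the first KB of an object like a local file
print(len(header))
```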
And if you look at the flip side of it, for much longer than the last year, FSx, which is our managed storage suite that a lot of enterprises use as a first step into AWS, FSx for Lustre has these data repositories that can be backed by S3, and customers will hydrate those out of S3 and then run big HPC workloads against the data and then move their changes back to S3 under
the covers.
On FSx, you've got literally almost a half a dozen services, different services, depending
on what you've got.
Solaris, you've got your standard file systems, you've got our friends at NetApp, et cetera, et cetera.
Yeah.
And we continue to grow that and we continue to work on it.
So like for folks that don't know what these are, FSx is a set of managed storage offerings, right? They're basically, you can provision up a storage target as you would assemble a storage target in the enterprise. And so there's, like you say, there's Lustre, there's ZFS, there's NetApp ONTAP that we've built in partnership with NetApp. There's FSx for Windows File Server. I'm sure I'm missing
something. EFS.
Well, EFS is not actually one of the FSxs. EFS is-
It should be.
EFS is an elastic service. But the interesting thing with the FSxs is they really meet an enterprise storage admin on their terms, right? Like, you know, you guys have, I'm sure, talked to loads of those folks. I spend loads of time talking to them about the difficult job of being an enterprise storage admin.
You're always like stuck in the middle
between a lot of tough stuff.
And when we talk to those folks,
the first thing they say is like,
don't make my life harder, right?
Like don't change the stuff that I have to deal with
to move through the cloud.
And so with the FSxs, like ONTAP, for example,
they can literally stand up a NetApp virtual filer on AWS.
They can configure SnapMirror to move the data.
And now they've got an ONTAP presence inside FSx and AWS.
They have a disaster recovery site.
And we launched a big pile of huge performance improvements, like five to seven X read and write performance increases on those ONTAP FSx offerings at re:Invent.
And the thing that we're seeing those enterprise customers do is they'll use SnapMirror, they'll
replicate the data into AWS, and then they'll run jobs with thousands of Lambdas against the data.
And they would never provision that much compute in their data center because they don't need it all the time.
But being able to burst in and run a huge job like that is a spectacular opportunity for them to go and do that. And this is important.
I think this is, and me and Ray have had this debate, like what's the future of AI storage?
And my pushback, the answer has always been yes.
Give you kind of the example use case from when I worked in pharma and we had these sequencers.
The sequencers, these things are multi-year, multi-decade investments.
Yeah, totally.
And they save data in file.
The data scientists and the scientists working against these data sets are using file-based tools.
Absolutely. Yep. Just like when we talked about the object-to-object data tiering, it breaks the workflow when we have to give data scientists and scientists tools with one interface at one level.
And then when we need a different level of performance, we give them a different interface.
They despise that shift in workflow.
Yep. Yep. Absolutely.
And that friction is awful.
And that is a thing that we're absolutely focused on with a lot of this stuff. And so in the example of, like you say, a reasonably large pharma or genomics-based firm that's got an Illumina or whatever, and they're pulling stuff off in file, and they've got, you know, bioinformatics folks that have trained on Linux or whatever, Unix, and they're used to file APIs to a file-attached bit of data. They want the cost basis and especially the performance scaling of object, but they're like, forget it, I didn't even write these tools. These are open source tools that work against file. And so that community, the bioinformatics genomics folks, were one of the most vocally excited when we did this initial Mountpoint launch for S3, the file APIs, because those folks are now...
I'll come back to you with an example.
We've got a couple of really excited customers
and I'm not sure if they're public references.
No, no, no, it's fine.
It's fine.
It's certainly exciting for me.
But one thing you did say,
I just want to go back to this,
and you just kind of threw it out, Andy.
FSx ONTAP can be configured to support
Lambda. So I always thought Lambda was an object service. So an object would show up in a bucket
and all of a sudden you'd fire up a storage, you know, a compute engine or something like that.
But you can do this with a file?
Well, inside Lambda, I mean, you launch the Lambda and it's a compute environment. And so you can do an NFS mount inside there. And so a lot of the more powerful patterns, I would say, on Lambda that you were talking about earlier, especially the bindings, or maybe it was Keith that was saying, like, bindings to events where, you know, you do a put into an S3 bucket in one place and that triggers a Lambda launch.
But stay tuned on that front. Like we're, you know, we're really pushing to open up a lot of
these things. I can't tell you how important this is to the enterprise middleware market. Like
we've used tools that have been extremely inconsistent at best,
that when a file is dumped to a directory, there's a process kickoff.
Right. Yeah.
It can be a million events in a day, a million files written to a directory and a workflow.
And when it fails three times, it's important.
It's a nightmare. And then you got to go and sweep and scrub and figure out all the stuff
that you missed. And it's a really awful pattern to have to do.
So the fact that I can get rid of the middleware process altogether,
use the Amazon event bus to basically trigger lambdas off of any time a file is written to a directory.
So, you know, just think about the sequencing process.
Every time a data scientist uploads or a clinic uploads a new set of sequences from their Illumina machine and it reaches a directory in AWS, a Lambda is kicked off to do the conversion
and analysis needed so that the scientists, the data scientists and the scientists looking for
the end product, there's no massaging of data. If there's ETL that needs to be done, all that is already done. And it's true in genomics. It's true with media transcoding. It's true with folks working with large medical images where they need to generate thumbnails. Like, this workflow pattern of events to an ETL-style transform to ingestion is really common. And so just to be clear, we're there with S3, we're there with S3 and file on top of Mountpoint.
And we're filling it out, right?
Like it's a thing that we're focused on.
We just want to remove the need for an engineer or a designer to have to make a choice based on file protocol or storage protocol up front.
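A minimal sketch of the handler side of the pattern Keith and Andy describe here: a Lambda function invoked by an S3 ObjectCreated event that fetches the new object, runs a placeholder transform, and writes the result back. Function names, key prefixes, and the transform itself are illustrative only.

```python
# Minimal sketch of a Lambda handler for S3 ObjectCreated events.
# The transform step is a placeholder; names and prefixes are hypothetical.
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = unquote_plus(record["s3"]["object"]["key"])

        # Fetch the newly written object (e.g. raw sequencer output).
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # ... run the ETL/conversion step here, then store the result ...
        s3.put_object(Bucket=bucket, Key=f"processed/{key}", Body=raw)
```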
All right.
We're getting close to the end here.
I really haven't talked at all about S3 Express, what you guys have done internally to make this happen.
You mentioned the single millisecond types of response time. It's still on disk. So there's
got to be a lot of caching going on and stuff like that. So no, we've moved to, there's a few
structural differences inside Express. So like I said, it's in a single zone.
So it removes a whole bunch of network hops.
That's a bit of physics that really speeds things up.
We've moved to higher performance media under it.
And so there's a big bump that happens there.
And then we've made some software changes inside.
The system is entirely written in Rust,
which has been a really big shift for us internally.
How do you guys like Rust?
That's a different question.
We'll do that offline.
Okay, go ahead.
I mean, we're finding a lot of success
on that side for the teams.
And we're finding that, you know, you get code that you don't have to go back and make as many debugging-style changes to.
And then we've done some other pretty cool stuff
on the software side.
So Express introduces a new bucket type
for the first time since S3 launched.
The Express buckets are called directory buckets.
And they're designed to be higher TPS, lower latency as a metadata layer.
You can kind of think of it as the file system metadata layer inside the object store.
And then we also did a super interesting thing as part of those directory buckets where in the session protocol, right? In the connection that you open up over the network to talk to S3 Express,
we hoist a bunch of the authentication and access control checks up to connection time,
and then have a lighter weight validation of those things when you access a file or an object.
And so doing that work, it was kind of a bit of reinvention of the session protocol.
And it means that we get a whole bunch of extra sort of latency improvement in terms of object access.
So there's a bunch of super cool stuff happening.
That was great.
A stateful protocol here.
I mean, is that what you're saying?
Is that what's going on?
That's right.
There's a bit of session state on the protocol. That's interesting for S3.
So directory buckets are even faster than normal S3 express object, normal standard buckets?
It's kind of a reinvention of the namespace itself inside of S3.
I mean, there was always a flat namespace.
There is no namespace here.
Well, there's been a whole bunch of stuff that we've put in there to really focus on starting to add some structure internally and really engineer to really, really high TPS workloads.
We say TPS transaction per second, right?
Yeah, that's right.
That's right.
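For context, here is a rough sketch of what using an Express directory bucket looks like from the SDK side. The zone-qualified bucket name is a made-up example, and the assumption is that a recent SDK handles the session-based authentication Andy describes transparently; the request calls themselves look like ordinary S3 gets and puts.

```python
# Rough sketch with stated assumptions: a hypothetical S3 Express One Zone
# directory bucket name, and a recent boto3 that manages the session-based
# auth under the covers. The calls themselves are plain S3 put/get.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

express_bucket = "example-data--use1-az4--x-s3"   # hypothetical directory bucket

s3.put_object(Bucket=express_bucket, Key="hot/shard-0001.bin", Body=b"...")
resp = s3.get_object(Bucket=express_bucket, Key="hot/shard-0001.bin")
print(resp["ContentLength"])
```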
Huh?
So you're putting structure inside of an object store?
I mean, other than the bucket count and the bucket name and stuff.
So yeah, we're doing a whole bunch of pretty interesting stuff under the covers. And I think we're kind of just getting started on the Express stuff. So I'm pretty excited to see where it goes. I would say that what we are seeing on the Express stuff is, when we first started prototyping and we were first working on it, we really thought that it was going to be, I don't want to say niche, but we thought it was going to be this smaller set of really active workloads. And what we're seeing is even for bulk analytics and things like those genomics workloads
and things where I had assumed that the higher level applications were doing a really good
job of backgrounding IO, it turns out that there's actually a lot of opportunity to speed
those applications up by reducing latency on the object store.
And so what we see with Express is because of all of the changes we've made to faster media and wrapping a lot more, you know, compute around things, you see a price structure
where the capacity costs more, but request prices are a lot lower.
And so it's really intended to be,
and the data is not flowing as far, it's really intended to be a request-oriented service.
And as a result of that, what we're seeing is TCO savings on end-to-end workloads.
And so we see customers, there's one example that I talked about in the
storage talk at re:Invent this year, where we took an image training benchmark that takes about 15 days to run all out on a set of P instances. And moving the data off of regional S3 Standard into S3 Express, we actually shaved a day off of the training time for that. And I don't remember
what the number is. I think we shaved, I'm not going to say, I can't remember, but we actually
got like a sizable cost savings for the workload. And that's something that we've seen.
Pinterest had an example with about a 40% cost savings. We're starting to see Express customers just getting pretty big wins because they're using less compute.
They're running their compute less long
because they're not waiting for storage
for these active ones.
Yeah, so there's kind of like this double-sided advantage.
One, you're getting the performance
and the cost savings associated with performance.
Then you're also getting
kind of this, I don't know if you've quantified it, but this additional agility on the dev
ops side of not needing to do this transformation for this tiering of storage.
Have you guys measured that cost benefit of not needing to redevelop that?
If you have ideas on how we could measure that,
I would love to bounce those around.
I think it's going to end up being a bunch of anecdotal stuff.
But we're definitely, the voice of the customer on that one so far,
and we're only, what, we launched the thing less than a month ago?
A couple months, right?
Maybe just over a month ago.
The voice of the customer has been really positive.
Folks are really talking about that ease of use and just being able to build on top of it.
So it seems to be pretty positive so far.
I think you're talking to the right people about trying to identify how to talk about this. So yeah, I'll send a note to you guys, but that's really interesting. I'm always about the operator's experience, and quantifying, or at least showing, the operator side, because that's the thing that people get stuck on and where projects fail, because of that friction between the operator or builder and the platform.
Yeah, I agree.
All right, gents, this has been great. So Keith, any last questions for Andy?
We could go on for a couple. You know what? This is probably one of those podcasts, Ray, where you probably shouldn't ask that question. I have plenty of questions for Andy, but we're at time. I submit.
Yeah. Andy, anything you'd like to say to our listening audience before we close?
No, I mean, it's so awesome to spend time talking to you guys, and, you know, I'd encourage folks to kick the tires on any of the things that we talked about today. If you're an S3 customer, you should try Express.
If you're an enterprise customer, you
should take a look at the FSx
family of products because
we're getting a lot of really positive
reaction to those.
Well, this has been great. This has been
awesome, Andy. Thanks very much for being on
our show today. Glad to have you back.
Glad to be back.
Thanks a lot, both of you.
Yeah, yeah.
That's it for now.
That's it for now.
Bye, Andy.
Bye, Keith.
Bye, Ray.
Bye, Ray.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask,
please let us know.
And if you enjoy our podcast,
tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.