The Pragmatic Engineer - How AWS S3 is built

Starting point is 00:00:00 AWS S3 is the world's largest cloud storage service. But just how big is it, and how is it engineered to be as reliable as it is at such a massive scale? Mai-Lan is the VP of data and analytics at AWS and has been running S3 for 13 years. Today we discuss the sheer scale of S3 in the data stored and the number of servers it runs on, how seemingly overnight AWS went from an eventually consistent data store to a strongly consistent one and the massive engineering complexity behind this move, what correlated failure, crash consistency, and failure allowances are, and why engineers on S3 live and breathe these concepts,

Starting point is 00:00:35 the importance of formal methods to ensure correctness at S3 scale, and many more. A lot of these topics are ones that AWS engineering rarely talks about in public. I hope you enjoy these rare details shared. If you're interested in how one of the largest systems in the world is built and keeps evolving, this episode is for you. This episode is presented by Statsig, the unified platform for flags, analytics, experiments, and more. Check out the show notes to learn more about them and our other season sponsors. So, Mai-Lan, welcome to the podcast.

Starting point is 00:01:02 Thanks for having me. To kick things off, can you tell me the scale of S3 today? Well, if you want to take a step back and just think about S3, it is a place where you put an incredible amount of data. And so right now S3 holds over 500 trillion objects. We have hundreds of exabytes of data. And we serve hundreds of millions of transactions per second worldwide. And if you want another fun stat, we process over a quadrillion requests every single year.

Starting point is 00:01:41 And what's under the hood of all that is also pretty amazing scale. If you think about, you know, what's underneath the hood of S3, fundamentally we're disks and servers, which sit in racks, and those sit in buildings. And if you try to think about all of the scale of what is under the hood, we manage tens of millions of hard drives across millions of servers, and that is in 120 availability zones across 38 regions, which is pretty amazing if you think about it. So deep down, it all starts with hard drives,

Starting point is 00:02:14 sitting inside servers, sitting inside racks, and then you have a bunch of these racks, and then rows of them, buildings of them, right? And that's what you said. So there's tens of millions of hard drives deep down at the bottom of this. That's right. In fact, if you think about the scale of this,

Starting point is 00:02:30 if you imagine stacking all of our drives, one on top of another, it would go all the way to the International Space Station and just about back. And so like that, I mean, it's kind of a fun visual to have for us who work on the service. But, you know, kind of fundamentally,

Starting point is 00:02:47 it's really hard to get your brain around the scale of S3. And so a lot of our customers, they don't. They assume the scale is there. They assume that, you know, all of the drives are always there. And they just focus on what S3 is to them, which is it just works. It just works for any type of data and all of your data. Yeah, even, I mean, even for me for the scale, when you talk about exabytes, I actually had to look up exabytes because I know a petabyte, which is already massive. If a company has like one

Starting point is 00:03:15 or two or three petabytes of data, it's tons. And an exabyte is, yes, a thousand petabytes is an exabyte. And you told me that you're thinking at that level. It's just hard to fathom. Yeah, we, I mean, we have individual customers that have exabytes of data. Individual customers have exabytes of data in what they call a data lake, although last week I heard a great term. We had the Sony Group CEO talk about what Sony is doing with data, and they refer to it as a data ocean, and not a data lake, but a data ocean.
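As a rough sanity check on the numbers above, here is a small arithmetic sketch. The unit conversion and the request count come from the conversation; the drive height and the ISS altitude are assumptions added for illustration, so the stacked-drive figure is only an order-of-magnitude estimate.

```python
# Decimal (SI) storage units, in bytes.
PETABYTE = 10**15
EXABYTE = 10**18

# One exabyte is a thousand petabytes, as stated in the conversation.
petabytes_per_exabyte = EXABYTE // PETABYTE

# "Over a quadrillion requests every single year", averaged over one year.
# The quoted figure is a floor, so this average is a lower bound.
REQUESTS_PER_YEAR = 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
avg_requests_per_second = REQUESTS_PER_YEAR / SECONDS_PER_YEAR

# Stacked-drive visual. Both constants are assumptions, not episode figures.
DRIVE_HEIGHT_M = 0.0261      # assumed height of a 3.5-inch hard drive
ISS_ALTITUDE_M = 400_000     # assumed approximate altitude of the ISS
drives_for_round_trip = 2 * ISS_ALTITUDE_M / DRIVE_HEIGHT_M

print(petabytes_per_exabyte, "petabytes per exabyte")
print(round(avg_requests_per_second / 1e6, 1), "million requests per second on average")
print(round(drives_for_round_trip / 1e6, 1), "million drives for a round trip")
```

Under these assumptions the round trip works out to a few tens of millions of drives, which is in line with the "tens of millions of hard drives" mentioned earlier.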

Starting point is 00:03:48 And so, like, if you have exabytes of data in your data lake, it is, in fact, a data ocean, and that ocean is kind of fundamentally S3. Can you tell me how S3 started? I did some research and there was a story about a distinguished engineer sitting in a pub in Seattle. Who knows if it was true or not? But I read this was a story that he was a bit frustrated with engineers at Amazon building a lot of infrastructure again and again. Yeah, if you think back, you know, S3 development really started in 2005. And we launched as the first AWS service in 2006.

Starting point is 00:04:23 And if you think about the technical problems of 2006, you know, a lot of customers were building things like e-commerce websites, right? Like Amazon.com. And so the engineers at Amazon knew that they had a lot of data that at the time was very unstructured data. It was PDFs. It was images. It was backups. And they needed a place where they could store that at an economic price point that let them not think about the growth of storage. And so they built S3, and they really built it for a certain type of storage. And so the original design of S3 in 2006 was really anchored around eventual consistency. And the idea of eventual consistency is that when you put data in storage, for S3, you know,

Starting point is 00:05:07 we're not going to give you an ack back on your put unless we actually have your data. So we have your data. But the eventual consistency part is that if you were to list your data, it might not show up, because it's eventually consistent. It's there, but it might not show up on a list. And so we did that at the time, that consistency model at the time, we built that because, you know, we were really optimizing for things like durability and availability. And it worked like a champ for, you know, e-commerce sites and things like that because, you know, when a human was interacting with an e-commerce site and an image

Starting point is 00:05:40 happened to not show up exactly at the moment where you put the data into storage, it was okay because a human would just refresh. And so when we launched in 2006, here's a fun fact for you: 2006 is actually when Apache Hadoop first began as a community as well. And so we had a set of what I think of as frontier data customers like Netflix and Pinterest who took a look at things like Hadoop. And they put it together with the economics and the attributes of S3, which is, you know, unlimited storage with pretty good performance and a great price point. And they decided to build their, you know, what we first began to call data lakes at the time. They decided to extend the idea of unstructured storage and include things like tabular data.
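The original consistency model described above (a put is only acknowledged once the data is held, but a list may briefly miss the new key) can be illustrated with a toy model. This is a minimal sketch, not S3's implementation; the class and method names, including `propagate`, are hypothetical.

```python
class EventuallyConsistentBucket:
    """Toy model: object data is held on put, but the listing lags behind."""

    def __init__(self):
        self._objects = {}     # key -> bytes; written before the put is acked
        self._listing = set()  # keys visible to list; updated asynchronously
        self._pending = []     # keys whose listing update has not landed yet

    def put(self, key, data):
        self._objects[key] = data   # the data is stored before acknowledging
        self._pending.append(key)   # the listing update happens later
        return "ack"

    def list_keys(self):
        return sorted(self._listing)

    def propagate(self):
        """Simulate the background work that lets the listing catch up."""
        self._listing.update(self._pending)
        self._pending.clear()


bucket = EventuallyConsistentBucket()
bucket.put("images/cat.png", b"...")
first_listing = bucket.list_keys()   # the new key is not visible yet
bucket.propagate()                   # time passes; "a human would just refresh"
second_listing = bucket.list_keys()  # now the key shows up
print(first_listing, second_listing)
```

For a person reloading a web page the gap is harmless; for an analytics job that lists a prefix and then reads every key, it is exactly the kind of gap that strong consistency removes.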

Starting point is 00:06:31 And so the first wave of frontier data customers were adopting, quote-unquote, data lakes in about 2013, 2015. Those were the frontier data customers born in the cloud. And around 2015 to, I would say, 2020, we started to see all the enterprises take that same data pattern of how can I use S3, the home of all the unstructured data, you know, on the planet and extend it to tabular data. And that's when about five years ago, 2020, I started to see a ton of exabytes of, you know, basically Parquet files. And, you know, I have worked on S3 for a minute. I started working on S3 in early 20, I guess it was 2013. I'd been at AWS since 2010, so kind of a while. And the rise of Parquet was really interesting because what people did is they said,

Starting point is 00:07:28 oh, okay, I like the traits and the attributes of S3, and I want to apply it to a table. And so I am going to run my own Parquet data in S3. And then, you know, around, I would say, 2019, 2020, we started to see basically the rise of Iceberg. And Iceberg, you know, is an incredibly popular OTF [open table format]. And it gives the table attributes to the underlying Parquet data. And customers started to do it in, you know, many of my largest data lakes across different industries and different customers. And so one of the things that we did in 2024 is we introduced S3 Tables. Just for those who don't know what Iceberg is: it's an open source data format for like massive analytic workloads, right?

Starting point is 00:08:13 That's right. If I ask our customers of these data oceans why they care so much about Iceberg, it's because they want to be able to have what a lot of customers are calling this decentralized analytics architecture, where, you know, they can have lines of business or different teams within their company that pick what type of analytics to use as long as it's Iceberg compliant. And so if Iceberg is the common metaphor for data, for tabular data, then you have choice. You have flexibility and choice for what type of analytics engines you use in a decentralized analytics architecture. And so I think that's one of the reasons why Iceberg has just taken off, is that it makes it easy to use data at scale. But it also gives a business owner, the chief data officers or the CTOs of the world, it gives them future proofing for analytics. They can replace their analytics. They can change it out. They can adopt new types of analytics and AI because you have this Iceberg at the bottom turtle of S3. We launched S3 Tables in December,

Starting point is 00:09:17 2024. This year we've had over 15 new features that we've added to S3 Tables. And then this year, of course, we launched the preview of S3 Vectors in July. And then last week, we were generally available. And so, you know, the story of S3, it's like a story that our customers have written for data,

Starting point is 00:09:40 but it's been super fun to work on all these different evolving attributes. As an engineer, what is the kind of basic architecture and the basic terminology I should know about when I'm starting to work with S3? When we first launched in 2006, the whole goal for S3 was to provide a very simple developer experience. And we've really tried to stick with that. In fact, when the engineers and we are sitting around and we're talking about what do we build next, we always go back to that idea of how do you make things really simple to use in S3. And so fundamentally, S3, we have a lot of different capabilities now, but it's really about the put and the get, the put of the storage in and the get of the storage out.

Starting point is 00:10:25 And where we can do that really well at scale, that is kind of the heart of S3. Now, we have a ton of extra capabilities that we've launched over time. But fundamentally, when customers think about using S3, they think about the put and the get. Yeah, so like put data, get data, and I guess some of the other like operations, it's a bit like HTTP, right? There's also delete, list, copy, a few kind of other like, I guess, primitives. There is. And, you know, if I think about where we have gone over time, we've added capabilities on top of that just based on what developers are trying to do, okay? Let's just take put. Okay?

Starting point is 00:11:04 We recently added a set of conditionals to the put capability. And like last year, we did put if absent or put if match. This year we did a copy if absent or a put if match and we did delete if match. And the core thing for us with conditionals is that we can give developers the capabilities of doing things like the put, but to do it based on the behaviors of their application. Outside of the get and put, the basic operations, I guess the base terminology that you should just know about is the buckets, objects, and keys, right? That's how we think about our data. Yeah, and now it's not just objects.
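The conditional operations mentioned above (put if absent, put if match, delete if match) are essentially compare-and-set on an object. The sketch below models the semantics in memory; it does not call S3. The `Bucket` class, the `if_absent` and `if_match` arguments, and the hash-based version tag are hypothetical stand-ins. In S3's HTTP API, conditions of this kind are expressed through precondition headers, and a failed condition is reported as an error rather than silently ignored.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when a conditional request's precondition does not hold."""


class Bucket:
    def __init__(self):
        self._objects = {}   # key -> (etag, data)

    @staticmethod
    def _etag(data):
        # Hypothetical version tag: a content hash (an assumption for this toy).
        return hashlib.sha256(data).hexdigest()

    def _check_match(self, key, if_match):
        current = self._objects.get(key)
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(f"{key} changed since it was last read")

    def put(self, key, data, if_absent=False, if_match=None):
        if if_absent and key in self._objects:
            raise PreconditionFailed(f"{key} already exists")
        self._check_match(key, if_match)
        etag = self._etag(data)
        self._objects[key] = (etag, data)
        return etag

    def delete(self, key, if_match=None):
        self._check_match(key, if_match)
        self._objects.pop(key, None)


bucket = Bucket()
v1 = bucket.put("config.json", b"{}", if_absent=True)       # first writer wins

try:
    bucket.put("config.json", b"[]", if_absent=True)        # second create attempt
    second_create = "succeeded"
except PreconditionFailed:
    second_create = "rejected"

v2 = bucket.put("config.json", b'{"a": 1}', if_match=v1)    # update from a known version

try:
    bucket.put("config.json", b'{"a": 2}', if_match=v1)     # stale version
    stale_update = "succeeded"
except PreconditionFailed:
    stale_update = "rejected"

print(second_create, stale_update)
```

The design point is that the application states the condition and the store enforces it atomically, so two writers racing on the same key cannot both believe they won.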

Starting point is 00:11:42 If you think about the two latest primitives or building blocks we've introduced as native to S3, one of them is the Iceberg table with our S3 Tables, and the other one is vectors. And, you know, under the hood of an S3 table is a set of Parquet files that we're managing on your behalf, but that's not the case for vectors. A vector is just basically a long string of numbers, and that is a new data structure for us, and it's sitting in S3, just like your objects. Mai-Lan was talking about the building blocks of S3,

Starting point is 00:12:16 like the put, get, tables, and vectors. Speaking of primitives for building applications leads nicely to our season sponsor, WorkOS. WorkOS is a set of primitives to make your application enterprise ready, primitives like single sign-on, authentication, directory sync, MCP authentication and many others. One feature does not make an app enterprise ready. Rather, it's a combination of primitives altogether that solves enterprise needs.

Starting point is 00:12:40 When your product grows in scale, you can always reach for new building blocks for infrastructure from places like AWS or similar. Similarly, when you need to go upmarket and sell to larger enterprises, WorkOS provides the application level building blocks that you need for this. WorkOS has seen the edge cases, the enterprise complexity, and solves this for you. So you can focus on your core product. One example of such a building block is adding authentication to your MCP server. This is a typical screen when you are about to authenticate with an MCP server.

Starting point is 00:13:09 If you would have to build it from scratch, it gets pretty complicated to set up the OAuth flows behind the scenes. But with WorkOS, it's a few simple steps. Add the AuthKit component to your project, configure it via the UI. Then you just direct clients of your MCP server to authorize via AuthKit, verify the response you get via some code. And that's pretty much it. This is the power of well-built primitives. To learn more, head to workOS.com.

Starting point is 00:13:33 And with this, let's get back to S3 and how it all started. So I'd like to still go back to the beginning of S3. When it was launched, it was pretty shocking for the broader community because S3 launched with a pricing of 15 cents per gigabyte per month, which was about a third to a fifth of the price of anything else. The going rate at the time was something like 50 cents or 75 cents. And on the first day, I read that like 12,000 developers signed up immediately, a lot of companies immediately or very quickly moved over. And then the surprising thing was that S3 kept cutting prices.

Starting point is 00:14:08 It was unheard of before. You were there in the 2010s when some large price cuts happened. Can you tell me what was the thinking inside the S3 team on this unusual pricing? It seemed customers would have been willing to pay more. And also the cutting of prices has been continuous. So even today, I think today it's something like two cents and something for the same storage that was 15 cents at launch. Yeah, you know, I think part of this goes back to what the goal is for S3.

Starting point is 00:14:38 Okay. And so the mission of S3 is to provide the best storage service on the planet. Okay. And our goal, too, is that if you think about the growth of data, IDC says that data is growing at a rate of 27% every year. But I have to tell you, we have so many customers that are growing so much faster than that. Yeah, I was about to say it sounds pretty low. I know, but that's an average across everything.

Starting point is 00:15:01 We have a lot of customers that grow at twice or three times that rate. But if you think about that, okay, you think about all the data that's being generated from sensors, from applications, from, you know, AI, from all these different... From just taking photos, I mean, every day, right? Photos, that's right. Like, you know, and, you know, if you think about your phone, too, think about the resolution and how the resolution of the cameras on the phone has grown. You just have this, like, kind of what Sony talked about with the data ocean. Okay. And in order to have all that data and to grow it, you have to be able to grow it economically.
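To make the growth rates above concrete, here is a short compounding sketch. The 27% figure is the IDC number quoted in the conversation; treating "twice or three times that rate" as 54% and 81% per year is an interpretation made for this example.

```python
import math


def doubling_time_years(annual_growth):
    """Years for data volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth, years):
    """How many times larger the data set is after the given number of years."""
    return (1 + annual_growth) ** years


for rate in (0.27, 0.54, 0.81):
    print(
        f"{rate:.0%} per year: doubles in {doubling_time_years(rate):.1f} years, "
        f"grows {growth_factor(rate, 5):.1f}x in 5 years"
    )
```

At 27% a year, a data set doubles in a bit under three years, which is why the price per byte matters so much for anyone who never wants to delete data.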

Starting point is 00:15:38 You have to be able to grow it at a price point where you don't really think, okay, what data am I going to delete now because I'm running out of space? You don't have that conversation with S3 customers because of two things. One is, you know, we do lower the price of either storage or the capabilities of what we're doing. Like, for example, we lowered the cost of compaction for S3 Tables pretty dramatically within a year after launching S3 Tables. It's not just that. It's like the overall total cost of ownership of your storage. We give you the ability to tier into archive storage, right? We give you the ability to do something called Intelligent-Tiering, which is if you don't touch your data

Starting point is 00:16:17 for a month, we'll give you an automatic discount on that data because we're watching your storage and you don't touch it for a month. We'll give you up to a 40% discount on that storage. And it's like dynamic discounting, so you don't even have to think about it. And so our whole goal is that you can grow the data that you need to grow because we know that's being used to pre-train models. We know it's being used to fine-tune and do any type of post-training of AI. We know you're using it for analytics. We know you're using it for all these different things, either now or in the future. And so our goal is so that you can keep your data and you can use it in a way that advances whatever the thing is that you're doing,

Starting point is 00:16:57 whether it's life sciences or you're an enterprise, you know, in manufacturing, right? Whatever you need, the data should be there and you should be able to grow it and keep it and use it any way you want. I did want to ask you about this part. So there's Intelligent-Tiering, which was launched in 2018, so like 12 years after S3 was launched. One thing that really got my attention is Amazon Glacier, which launched in 2012, so a long time ago, and you can store data that you don't need immediate access to. You're okay waiting for some time to get access to it, I think, maybe even hours.
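The "dynamic discounting" idea behind Intelligent-Tiering described above can be sketched as a simple cost calculation. The 30-day threshold and the "up to 40%" discount come from the conversation; the base price per gigabyte and the two-bucket split of data are assumptions chosen only to make the arithmetic visible, and the real service has more tiers and fees than this toy.

```python
BASE_PRICE_PER_GB_MONTH = 0.023   # assumed illustrative price, not from the episode
MAX_DISCOUNT = 0.40               # "up to 40% discount", per the conversation
IDLE_DAYS_THRESHOLD = 30          # "if you don't touch your data for a month"


def monthly_cost(objects):
    """objects: iterable of (size_gb, days_since_last_access) pairs."""
    total = 0.0
    for size_gb, idle_days in objects:
        price = BASE_PRICE_PER_GB_MONTH
        if idle_days >= IDLE_DAYS_THRESHOLD:
            price *= 1 - MAX_DISCOUNT   # untouched data is billed at a lower rate
        total += size_gb * price
    return total


# Hypothetical data set: 1 TB touched recently, 9 TB untouched for 90 days.
data = [(1_000, 2), (9_000, 90)]
flat = sum(size for size, _ in data) * BASE_PRICE_PER_GB_MONTH
tiered = monthly_cost(data)
print(f"flat: ${flat:.2f}/month, tiered: ${tiered:.2f}/month")
```

The customer changes nothing in their application; the saving comes from the service observing access patterns on their behalf.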

Starting point is 00:17:29 When it launched, it was only one cent per gigabyte per month, which was, again, this was back then the going rate for storage was about 15 cents, so almost 10 times cheaper. How do you do that? Like, what is the architecture and thinking behind how you're able to have this tradeoff of, like, look, if you don't need your data quickly, we can do it a lot cheaper. How could I imagine the kind of trade-offs that you and the engineering team were thinking of making? Well, you know, I mean, as you know, you're an engineer yourself. And, you know, as you know, a lot of engineering is about constraints, right?

Starting point is 00:18:05 And that is the fun part about working on S3 is that when you think about constraints, you think about constraints that we have for availability. You think about constraints that we have around, you know, the cost of storage. we start to get really, really creative. Okay? And in S3, because, you know, we build all the way down to the metal of the drives and the capabilities that we have in our hardware, we're able to drive, you know, efficiencies at every single part of our stack, okay?

Starting point is 00:18:40 And so our engineers, when they get together and they talk about the constraints, they talk about the design goals, we'll do something like we'll set a target for, you know, the cost of a byte, and we'll drive for that. And we'll drive for it at every single part of the process. And the part of the process that we are also including is, you know, it includes the data center. How are our data center technicians able to operate the service of S3 from a hardware and a data center perspective, like the physical buildings, just like we do the same thing for the software and the layers of S3 itself.

Starting point is 00:19:21 And when you have that, when you have that ability to run across the whole stack all the way down to the physical buildings, and we're thinking so deeply about the cost and the lifetime of every byte, you're able to do things like Glacier. You mentioned something really interesting that when S3 started, it was eventually consistent,

Starting point is 00:19:42 which means that data eventually arrives. It might not be there and you might be behind. And there's a lot of things that you can do with this and it gives you some constraints. But you mentioned that the reason that the team launched with it is because durability and availability were more important. And I assume, of course, cost as well. But during those initial phases, while S3 was eventually consistent,

Starting point is 00:20:05 what kind of benefits does it give to have eventual consistency? Is it a cost constraint? Is it just easier to do highly available systems from an engineering perspective? Well, I mean, from an engineering perspective, the main optimization was, it was availability. It was not necessarily durability, but it was availability. Okay. So if you take a step back and look at the original design of S3, we were really focused very

Starting point is 00:20:31 hard on availability. So let's take a step back. Okay. So when you talk about consistency, it's the property where the object retrieval, the object get, reflects the most recent put to that same object. Okay. And so if you think about, you know, what parts of the system of S3 that really hits, a lot of it just kind of starts with our indexing subsystem. So if you think about the indexing subsystem in S3, that holds all of your object metadata. And so that's like its name, its tags, its creation time. And the index,

Starting point is 00:21:05 our index is accessed on every single get or put or list or head or delete, any API call like that. And so every single data plane request where you go back into our storage system to go get an object goes through our index. And if you think about it, more requests go through our index than our storage system because, for example, it's serving things like head requests and list requests that don't actually end up going back into our storage system at all. Those are metadata or index requests.
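The split described above, where every request touches the index but only some go on to the storage system, can be sketched with a toy object store. This is an illustration under stated assumptions, not S3's architecture: the class names, the counters, and the choice to hold everything in dictionaries are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ObjectMetadata:
    """The kind of per-object metadata mentioned: name, tags, creation time."""
    name: str
    created: datetime
    tags: dict = field(default_factory=dict)


class ToyObjectStore:
    def __init__(self):
        self._index = {}           # key -> ObjectMetadata
        self._storage = {}         # key -> bytes
        self.index_requests = 0    # every API call goes through the index
        self.storage_requests = 0  # only calls that move object bytes

    def put(self, key, data, tags=None):
        self.index_requests += 1
        self.storage_requests += 1
        self._storage[key] = data
        self._index[key] = ObjectMetadata(key, datetime.now(timezone.utc), dict(tags or {}))

    def get(self, key):
        self.index_requests += 1       # look the object up in the index first
        if key not in self._index:
            raise KeyError(key)
        self.storage_requests += 1     # then fetch the bytes
        return self._storage[key]

    def head(self, key):
        self.index_requests += 1       # metadata only; storage is never touched
        return self._index[key]

    def list_keys(self, prefix=""):
        self.index_requests += 1       # served entirely from the index
        return sorted(k for k in self._index if k.startswith(prefix))


store = ToyObjectStore()
store.put("logs/2024/a.txt", b"hello", tags={"team": "data"})
store.head("logs/2024/a.txt")
store.list_keys("logs/")
store.get("logs/2024/a.txt")
print(store.index_requests, store.storage_requests)
```

Four calls reach the index in this run while only two reach storage, which is the reason the index's own consistency and availability dominate the behavior a client sees.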

Starting point is 00:21:37 So, you know, if you think about our indexing system, we have a storage system in there, okay? And that is a really central concept. A storage system in the middle of our index system. So you need a storage system for your index system, right? That's right. And so we have to configure and size the system to deliver on our, you know, our design promise for both availability and durability.

Starting point is 00:22:05 Okay. And so the data in our index system is basically stored across a set of replicas, and it uses something called, you know, it's basically a quorum-based algorithm. Okay? And a quorum-based algorithm tends to be very forgiving to failures. And so if you think about how we implemented quorum in our index system, we start first from servers that are running in these separate availability zones.
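A quorum-based scheme of the kind mentioned above can be sketched in a few lines. The example assumes three replicas, one per availability zone, with a write quorum and a read quorum of two each, so that any read overlaps any successful write. The replica count, quorum sizes, explicit version numbers, and class names are all assumptions for illustration; the conversation does not describe S3's actual parameters.

```python
class Replica:
    def __init__(self, zone):
        self.zone = zone
        self.up = True
        self.data = {}   # key -> (version, value)


class QuorumStore:
    def __init__(self, zones, write_quorum, read_quorum):
        # Overlap condition: every read quorum shares a replica with every write quorum.
        assert write_quorum + read_quorum > len(zones)
        self.replicas = {zone: Replica(zone) for zone in zones}
        self.write_quorum = write_quorum
        self.read_quorum = read_quorum

    def write(self, key, value, version):
        live = [r for r in self.replicas.values() if r.up]
        if len(live) < self.write_quorum:
            raise RuntimeError("write quorum not reached")
        for replica in live:
            replica.data[key] = (version, value)
        return len(live)

    def read(self, key):
        live = [r for r in self.replicas.values() if r.up]
        if len(live) < self.read_quorum:
            raise RuntimeError("read quorum not reached")
        found = [r.data[key] for r in live if key in r.data]
        if not found:
            return None
        # The highest version among the replies is the most recent write.
        return max(found, key=lambda entry: entry[0])[1]


store = QuorumStore(["zone-a", "zone-b", "zone-c"], write_quorum=2, read_quorum=2)
store.write("key", "old", version=1)

store.replicas["zone-b"].up = False      # one zone fails; writes still succeed
store.write("key", "new", version=2)

store.replicas["zone-b"].up = True       # zone-b returns with stale data
store.replicas["zone-a"].up = False      # a different zone fails
latest = store.read("key")               # zone-c still holds version 2
print(latest)
```

Placing each replica in a separate zone is what keeps a single fault domain from taking out a majority, which is the "forgiving to failures" property described.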

Starting point is 00:22:30 And the reason we do that is that it lets us avoid correlation on a single fault domain. Okay? And since the failure of, you know, a single disk, a server, a rack, a zone only affects a subset of data, it never affects all of the data for a single object or even a majority of the data for a single object, which we have sharded across, you know, a wide spread of servers. So like this core of availability for us is this idea that we spread everything. And so when a read comes in, it's coming into the S3 front end, and we just heavily cache

Starting point is 00:23:09 objects across their systems. When a read comes in, it could route at random, and you could create a situation where you're creating an inconsistent read. And so when we have quorum at the index storage layer, we can see reads and writes overlap,

Starting point is 00:23:26 but in the cache they don't, because we're optimizing for availability. So just so I understand, the first part, the eventual consistency, correct me if I'm wrong, that you can just write to all these distributed nodes, and you ask one of them, and if it doesn't have it, no problem because it will be eventually

Starting point is 00:23:42 consistent. You now have high availability because you don't need to worry about all of them being in sync. That is correct. And that's phase one of AWS. And it gives you availability. And you're now explaining how you're able to, behind the scenes, turn this into a strongly consistent system. The strong consistency means that it's guaranteed to have the whole system's state, which is

Starting point is 00:24:06 hard to do because you could have distributed failures, et cetera. And this replicated journal, you know, it took us a while to build. I won't lie. We don't talk about this stuff very, very much, okay? Because this is kind of the secret sauce of S3. But, you know, again, like our engineers who are in the room, they were thinking about how do you deliver on both the strong consistency without compromising availability?

Starting point is 00:24:32 So I go back to constraints, okay? So in that case, we were not trading off consistency and availability anymore. And so the engineers had to come up with a new data structure, basically. We do this in S3. Vectors basically is a new data structure that we came up with as well. But if you think about what we had to invent for strong consistency at S3 scale without relaxing the constraint of availability, we had to build this replicated journal.

Starting point is 00:25:03 And the replicated journal is basically a distributed data structure where we're chaining nodes together, so that when a write is coming into the system, it's flowing through the nodes sequentially. Okay? And so a read or write in a strongly consistent system for S3, it flows through these storage nodes in the journal sequentially, and so every node is forwarding to the next node. And when the storage nodes get written to,

Starting point is 00:25:28 they learn the sequence number of the value along with the value itself, and therefore on a subsequent read, like through our cache, the sequence number can be retrieved and stored. And so now you have this strongly consistent and highly available capability in S3. And the heart of that is actually this replicated journal. Okay, but what's the catch? On one end, because there's always something with tradeoffs, you always have something.
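The replicated journal is only described at a high level above, so the following is a loose sketch of the stated idea rather than S3's design: writes flow through a chain of nodes in order, each node records a sequence number alongside the value, and a cache uses that sequence number to decide whether its copy is still current. Every class and method name here is hypothetical, and failure handling is left out entirely.

```python
class JournalNode:
    def __init__(self):
        self.next_node = None
        self.entries = {}   # key -> (sequence_number, value)

    def append(self, key, seq, value):
        self.entries[key] = (seq, value)            # learn the sequence number with the value
        if self.next_node is not None:
            self.next_node.append(key, seq, value)  # forward to the next node in the chain


class ReplicatedJournal:
    def __init__(self, length=3):
        self.nodes = [JournalNode() for _ in range(length)]
        for node, following in zip(self.nodes, self.nodes[1:]):
            node.next_node = following
        self._next_seq = 0

    def write(self, key, value):
        self._next_seq += 1
        self.nodes[0].append(key, self._next_seq, value)   # enters at the head
        return self._next_seq

    def latest(self, key):
        # Read from the tail, which only holds writes that passed through every node.
        return self.nodes[-1].entries.get(key)


class SequenceCheckedCache:
    def __init__(self, journal):
        self.journal = journal
        self.entries = {}   # key -> (sequence_number, value)
        self.hits = 0

    def read(self, key):
        latest = self.journal.latest(key)
        if latest is None:
            return None
        cached = self.entries.get(key)
        if cached is not None and cached[0] == latest[0]:
            self.hits += 1              # cached copy is at the latest sequence number
            return cached[1]
        self.entries[key] = latest      # stale or missing: refresh and remember the number
        return latest[1]


journal = ReplicatedJournal()
cache = SequenceCheckedCache(journal)
journal.write("photo.jpg", "v1")
first = cache.read("photo.jpg")    # miss, fills the cache
second = cache.read("photo.jpg")   # hit, same sequence number
journal.write("photo.jpg", "v2")
third = cache.read("photo.jpg")    # sequence number moved on, so the cache refreshes
print(first, second, third, cache.hits)
```

In this toy the journal hands back the value together with the sequence number, so the cache saves nothing; the point is only to show how a monotonically increasing sequence number lets a cache prove it is not serving a stale read.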

Starting point is 00:25:57 So on one end, you obviously have more complicated business logic. And then I guess the second obvious question is, what about failures? Because in the case of eventual consistency, you don't worry too much about one failure. Clearly, in this case, what if a node in the sequence fails either at the first time or later? Or how does the system monitor this, recover? Because I guess that's going to be the tricky part, right? There's another piece to this puzzle that we implemented, which is, you know, it's basically a cache coherency protocol. And the idea is that this is where we built what we think of as a failure allowance, where in this mode,

Starting point is 00:26:37 we needed to retain the property that like multiple servers can receive requests and some are allowed to fail. And so it's kind of this combination of this replicated journal as a new data structure. Plus, we implemented this new cache coherency protocol that gave us a failure allowance. And those two things working in concert gave us this strong consistency. I will say too, this does come at some actual cost. I was about to say, like, nothing is free in engineering, right? There's hardware cost in this because you can imagine we've done some more engineering behind the scenes. But I remember sitting in the room with our engineers on S3, and we did a debate on this.

Starting point is 00:27:23 We debated it. We said, you know, there's costs. There's like actual costs to the underlying hardware for this. And do we pass it along to customers or not? And we made that explicit decision not to. We said. Really? Yeah, we said that when we launch this, we should launch strong consistency.

Starting point is 00:27:41 We should make it free of charge to customers, and it should just work for any request that comes into S3. We shouldn't sort of say it's only available on this bucket type or what have you. This should be true for every request made to S3. And part of that mindset for S3 is like how can we provide these types of capabilities and how can we make it something that becomes a building block, like part of the building block of S3, and you shouldn't have to think about the cost of it. This was the very surprising thing of this launch, by the way, that suddenly AWS said like,

Starting point is 00:28:19 okay, everything is strongly consistent. It does not cost you more. Latency-wise, your latencies shouldn't have changed significantly. I mean, I'm sure when you roll out initially, you do your measurements, et cetera, but that was the problem. That was why I couldn't really believe it when I reread history because it typically doesn't happen. Typically, strong consistency does add latency or it increases costs if it doesn't add latency. There's always this tradeoff. And I mean, it sounds like you either swallowed the costs or costs caught up, but it's, it's very unusual.

Starting point is 00:28:51 So if I think about that, one of the things that was also very important for us, and we haven't really talked about this as much, but it's, we think about it a lot on the S3 team is correctness. Okay. So it's one thing to say that you're strongly consistent on every request. It's another thing to know it. And so when we built this strong consistency, you know, I talked about our new caching protocol. I talked about this replicated journal as a new data structure. You know, that took a little bit of time to do and to get right.

Starting point is 00:29:22 But at S3 scale, we could not say that we were strongly consistent unless we actually knew we were strongly consistent. Okay. And so what does that mean? How do you do that at S3 scale? When everybody is using it for every last workload, in fact, one of the reasons why people use it is because our scale is such that we're decorrelating workloads and you can run absolutely anything on S3.

Starting point is 00:29:47 But how do you know? Mai-Lan just talked about how strong consistency made it so much easier to trust S3. Trust is something that is just as important when writing code, especially when with AI we write more code than before. And this is a good time to talk about our season sponsor, Sonar. What is the impact that AI is having on developers? Let's look at some data. A new report from Sonar, the state of developers' survey report, found that 82% of developers believe they can

Starting point is 00:30:12 code faster with AI. But here's what's interesting. In this same survey, 96% of developers said they do not highly trust the accuracy of AI code. This checks out for me as well. While I write code faster with AI agents, I don't exactly trust the code it produces. This really becomes a problem at the code review stage where all this AI generated code must be rigorously verified for security, reliability, and maintainability. SonarQube is precisely built to solve this code verification issue. Sonar has been a leader in the automated code analysis business for over 17 years, analyzing 750 billion lines of code daily. That's over 8 million lines of code per second. I actually first came across Sonar 13 years ago in 2013 when I was working at Microsoft

Starting point is 00:30:56 and a bunch of teams already used SonarQube to improve the quality of their code. I've been a fan since. Sonar provides an essential and independent verification layer. It's the automated guardrail that analyzes all code, whether it's developer or AI generated, ensuring it meets your quality and security standards before it ever reaches production.

Starting point is 00:31:13 To get started for free, head to sonarsource.com/pragmatic. And with this, let's get back to the importance of strong consistency at AWS. How do you know that you're strongly consistent? And that is why we used automated reasoning. What is automated reasoning, for those of us who are not as familiar with this,

Starting point is 00:31:30 which will be most people outside of very few domains like S3. Yeah, I mean, S3 uses automated reasoning all over the place, okay? And automated reasoning is a specialized form of computer science, okay? And Gergely, if you think about it, if computer science and math got married and had kids, right, it would be automated reasoning. Is it formal methods? It is formal methods. That's exactly right.

Starting point is 00:31:54 I mean, I studied computer science. So, yeah, that's fun. So it's actually proper formal methods that you're using. That is right. And we use formal methods in many different places in S3, but one of the first places that we adopted them was for us to feel good that we actually had delivered strong consistency across every request. So what we did is we proved it, right?

Starting point is 00:32:16 We basically built a proof for it. And then we incorporated our proof on check-ins into this index area that I talked about, right, where you have your caching and then you have your storage sub-layers of the index capabilities. And so when somebody, anybody, is working on our index subsystem now, and they're checking in code into the code paths that are being used for consistency, we are proving through formal methods that we haven't regressed our consistency model. And can you just give us a rough idea? Because the formal methods that I have studied, they were pretty abstract, things like designing languages, how to have like the different

Starting point is 00:32:57 operators. And of course, there are some maths involved as well. But what are they like: primitives like servers, network, et cetera, and models being built, data flows? Like, how can I imagine a simple proof of something inside S3 roughly at a really high level? Yeah. I mean, if you go back to the fundamental notion of a proof, you are proving something to be correct. Okay. And so the places that we use these proofs, we use them in consistency, where we built a proof across all the different combinatorics to make sure that the consistency model is correct. We use it in cross-region replication to prove that a replication of data from one region to another arrived. And we use it in different places within S3 to prove the correctness of APIs. In all of these cases, you know, we talk about durability, we talk about availability.

Starting point is 00:33:53 We talk about cost, but just as strong of a principle, a design principle for us across S3, is correctness. It's the correctness of, you know, a thing, an API request, you know, an operation, as it were. And the key thing for us, too, is that you don't want to just prove it once. You want to prove it on every single check-in, and you want to prove it on every single request, so you can validate and verify that you are doing, in fact, what you say you do. And I think for us, you know, at a certain scale, math has to save you, right? Because at a certain scale, you can't do all the combinatorics of every single edge case, but math can save you and help you on this at S3 scale.
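To make "a proof across all the different combinatorics" a little more concrete, here is a toy sketch of exhaustive checking. It is not S3's tooling or S3's design: the model (one store value, one cache entry, a writer and a concurrent reader) and every name in it are assumptions made up for illustration. The checker enumerates every interleaving of the two processes and tests one property: a read issued after the write has completed must see the new value.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Writer: update the store, then invalidate the cache entry.
def write_store(s):
    s["store"] = "new"

def invalidate(s):
    s["cache"] = None

# Concurrent reader: on a cache miss, remember the store value, then fill the cache.
def read_begin(s):
    s["pending"] = None if s["cache"] is not None else s["store"]

def fill_naive(s):
    # Hypothetical buggy design: fill the cache with whatever was read earlier.
    if s["pending"] is not None:
        s["cache"] = s["pending"]

def fill_checked(s):
    # Hypothetical fix: only fill if the store still holds the value that was read.
    if s["pending"] is not None and s["store"] == s["pending"]:
        s["cache"] = s["pending"]

def read_after_write(s):
    """A read issued after the write finished: cache first, then store."""
    return s["cache"] if s["cache"] is not None else s["store"]

def violations(fill_step):
    """Check every schedule and initial cache state; return the failing ones."""
    bad = []
    for initial_cache in (None, "old"):
        for schedule in interleavings([write_store, invalidate], [read_begin, fill_step]):
            s = {"store": "old", "cache": initial_cache, "pending": None}
            for step in schedule:
                step(s)
            if read_after_write(s) != "new":
                bad.append((initial_cache, [f.__name__ for f in schedule]))
    return bad

print(violations(fill_naive))    # the one schedule that leaves a stale cache entry
print(violations(fill_checked))  # empty list: no schedule breaks the property
```

Run on every check-in, a check like this would flag a change that reintroduces the stale-fill race. Real formal methods tools reason about far larger state spaces than brute-force enumeration can cover, which is the point being made about math at scale.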

Starting point is 00:34:43 And so we use formal methods in many different places of S3. We have some research papers too. I can send you some links to some research papers where we talk about it. Yeah, please, please do. And we will put it in the show notes below so anyone can check it out because I think it's really interesting. I feel formal methods are not really a thing in a lot of startups and even infrastructure startups yet.

Starting point is 00:35:06 But it sounds very reassuring to me to actually have an ongoing proof of that. And speaking of which, I want to ask about one thing that is related to this: durability. Amazon S3 has very, very high durability promises. I think it's 11 nines, which I had to do a double check on because in back-end systems, whenever you say three nines, it's like, eh, when you say four nines of availability,

Starting point is 00:35:32 when we're talking about availability, four nines is already hard to achieve, and beyond that, it just gets very expensive. And I have never heard of 11 nines of durability. Now, this is durability and not availability. One question that I got when I shared this stat publicly, one thing people were asking,

Starting point is 00:35:49 and I was also thinking, is how can you prove that, not just in a formal way, but you're now storing, as you said, 500 trillion objects, which is now large enough that, just by this durability promise,

Starting point is 00:36:02 you should be, you might be losing some of them. Do you actually validate it on the actual data as well, outside of the proof? Because I assume in the proof, you will have assumptions on hardware failure rate, which might or might not be true.
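The back-of-the-envelope reasoning behind this question can be written down in a few lines. This is only a sketch of the host's arithmetic, and it assumes something the conversation does not state: that "11 nines" is read as an independent per-object, per-year loss probability of 10^-11. The function name is made up for illustration.

```python
def expected_annual_losses(num_objects, nines):
    """Expected objects lost per year if each object is lost with probability 10**-nines.

    Assumption for illustration only: losses are independent and the durability
    figure is interpreted as a per-object annual probability.
    """
    return num_objects / 10**nines

objects_stored = 500 * 10**12  # "over 500 trillion objects", as quoted earlier
print(expected_annual_losses(objects_stored, 11))  # on the order of a few thousand
```

Under that reading the expected number is small relative to the total but not zero, which is why the question turns to how durability is measured on real data rather than only argued on paper.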

Starting point is 00:36:16 So my question is that at Amazon S3 level, when you are able to look at the, are we living up to, for example, our durability promise, how do you go about that? And what are your findings? Yeah. So we just spent a lot of time talking about our index subsystem, because that is the subsystem that is related to consistency. But when you think about durability, I mean, you think about it at all the different levels of the S3 stack, but we really think about it in the storage layer. And so if you think about it in the storage layer, you have this design, this promise of, you know, the design here. And underneath that is a combination of things. It's software, but it's also the physical layout of where our data is across everything that we have in S3. And, you know, one of the things that I talked about is that we have, you know, disks and servers, which sit in racks, which sit in buildings. And we have tens of millions of these hard drives. We have millions of servers. And we have 120 availability zones across 38 regions. Yeah, and two availability zones are two physically separate locations, just to be clear. They're physically separate and sometimes they're ways away from each other. And in some of our regions, we have more than three available.

Starting point is 00:37:28 I mean, the availability zones give us a different domain, a fault domain. If I were to think about durability, I think the most important thing for us is our auditors. So you think about a distributed system, we talked about the put and the get. We have many, many, many microservices that are all doing one or two things very well in the background. Okay. And so we have many different varieties of health checks. But we also have repair systems and we have auditor systems. And our auditor systems go and they inspect every single byte across our whole fleet.

Starting point is 00:38:04 And if there are signs that there is repair needed, you know, another repair system will come in place. And these are all, you know, in the world of distributed systems, these are all microservices working together, loosely correlated, but communicating through well-known interfaces. And so that, you know, collection of systems, which are over 200 microservices now that all sit behind one S3 regional endpoint, and a fair number of those subsystems, those microservices, are all dedicated to the notion of durability.

Starting point is 00:38:36 So they will go and check and log and report back. So do I understand correctly that in any given time frame at S3, someone or some people or some systems can actually answer the question of what is our durability the past week, month, year, and so on? Yes. Okay, great. So you can actually verify your durability promise and check if the math is mathing. Yes. And, you know, part of our design is at any given moment in this conversation that you and I have had, just today, we're having servers fail, because servers fail.

Starting point is 00:39:11 And so what we are building and what we've built in S3 is an assumption that servers fail. And so a lot of our systems are always, you know, first of all, they're checking to see, you know, where any failure might hit an individual node, how does it affect a certain byte, what repair needs to automatically kick in place. And so this system is constantly moving behind the scenes, if you will, and that is a completely separate thing from the get and the put. The get and the put is what the customer sees. There's this whole universe under the hood of how do we manage the business of bytes at scale?
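The auditor-and-repair loop described above can be sketched in miniature. This is a hypothetical illustration, not S3's implementation: it assumes three full copies of an object identified by made-up location names, a known-good SHA-256 checksum, and a repair step that copies from any healthy replica.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def audit(replicas, expected):
    """Return the locations whose bytes are missing or no longer match the checksum."""
    return [name for name, data in replicas.items()
            if data is None or checksum(data) != expected]

def repair(replicas, expected):
    """Overwrite every damaged replica from a healthy one; return what was repaired."""
    damaged = audit(replicas, expected)
    healthy = [data for name, data in replicas.items() if name not in damaged]
    if not healthy:
        raise RuntimeError("no healthy replica left to repair from")
    for name in damaged:
        replicas[name] = healthy[0]
    return damaged

# Hypothetical fleet state: one copy has a flipped byte, one sat on a failed drive.
expected = checksum(b"hello")
replicas = {"az-a": b"hello", "az-b": b"hellp", "az-c": None}

print(repair(replicas, expected))  # locations that needed repair
print(audit(replicas, expected))   # empty once repair has run
```

The two functions are deliberately separate, mirroring the idea of independent microservices: one only inspects and reports, the other only acts on what was reported.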

Starting point is 00:39:49 I'm just thinking because for a lot of us engineers who are building like moderately sized systems, I'll say compared to S3, they can already be big, but a failure is a big deal. Like, you know, like a machine going down. Again, I have a small side project and my storage filled up. It started to give errors. And this is a big deal because it rarely happens to me. This is the first time it happened in three years. But I understand that in your business, or when you work at S3 scale, this is just every day. And the question is not when. It's just how often, how do you deal with it? I guess it's a different world. It is a different world. And the trick is to really think about correlated failure. Okay. So if you're thinking about availability at any scale, it's the

Starting point is 00:40:33 correlated failure that'll get you. And what is a correlated failure? Okay, so that's super interesting. So if you think about what I talked about with, you know, eventual consistency, we talked about quorum, okay? And quorum is okay for one node to fail. But if all of the nodes go south, for example, and they're in the same availability zone or on the same rack, then you're really going to be messing with your availability of the underlying storage, okay? You just lost your failure allowance that I talked about with the cache because they all fail together. And so, like, a correlated failure is an incredibly important thing to think about when you're thinking about availability. And so when we're designing around correlated failures, the thing that we have to think about is like, do we expose or how are those workloads exposed to different levels of failure? So when you upload an object to S3 with a put, we replicate that object, okay? We don't just store one copy of it. We store it many times. And that replication is important. It's important for durability.

Starting point is 00:41:36 But what's interesting about it, it's also important for availability. Because if any of those correlated failure domains fail, like if a whole AZ fails, there's still a copy somewhere else. And the data is still available somewhere, even though an availability zone has failed or a rack has failed or a server has failed or so forth. Okay? And so that idea of how do you manage and design around correlated failures with both our physical infrastructure as well as our logical infrastructure is super important for S3 for both availability and durability. We also do things like we think about something called crash consistency.
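The correlated-failure point made above can be illustrated with a small placement check. This is a sketch under stated assumptions: replicas are described only by a made-up availability zone and rack label, a "failure" takes out one whole domain at once, and the data survives if at least one replica sits outside the failed domain. None of the names reflect how S3 actually places data.

```python
def survives(placement, failed_domain, level):
    """True if at least one replica sits outside the failed domain at this level."""
    return any(replica[level] != failed_domain for replica in placement)

def tolerates_any_single_failure(placement, level):
    """True if losing any one domain at this level still leaves a replica."""
    domains = {replica[level] for replica in placement}
    return all(survives(placement, domain, level) for domain in domains)

# Three copies on the same rack: they fail together (a correlated failure).
same_rack = [{"az": "az-1", "rack": "az-1/r1"}] * 3

# Three copies spread over three availability zones.
spread = [
    {"az": "az-1", "rack": "az-1/r1"},
    {"az": "az-2", "rack": "az-2/r7"},
    {"az": "az-3", "rack": "az-3/r4"},
]

for name, placement in (("same_rack", same_rack), ("spread", spread)):
    print(name,
          tolerates_any_single_failure(placement, "rack"),
          tolerates_any_single_failure(placement, "az"))
```

Both placements store three copies, so the copy count alone says nothing about survival; what matters is whether the copies share a fault domain.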

Starting point is 00:42:18 I mean, Gergely, you can tell I can go on and on about this. So you just have to stop me. No, but this is the interesting stuff. All right. So the whole idea of crash consistency is that a system, any system that you build, it should always return to a consistent state after a fail-stop failure. And if you can do things like reason about the set of states that a system can reach in the presence of failure and you just always assume the presence of failure, then you also assume

Starting point is 00:42:49 the presence of consistency and availability, then you just design all of these different microservices to all work together in an underlying capability like S3. But that's what our engineers do. They think about crash consistency. They think about correlated failures. You know, they think about failure allowances and caches, right? And it's all that deep distributed system work that our engineers come in every day to work on. Can we talk about how you think about failure allowances?
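The crash-consistency idea from a moment ago, reasoning about every state a system can reach when it stops partway through, can be shown with a toy model. Everything here is an assumption for illustration: a record is a payload plus a length field, a fail-stop crash can happen between any two steps, and the final pointer flip in the staged design is treated as atomic (much like a file rename). It is not a description of S3's storage format.

```python
def run_with_crash(steps, crash_after):
    """Run steps in order and fail-stop after `crash_after` of them have completed."""
    for step in steps[:crash_after]:
        step()

def consistent(record, allowed_payloads):
    """A record is consistent if it is entirely the old or entirely the new value."""
    return (record["payload"] in allowed_payloads
            and record["length"] == len(record["payload"]))

def in_place_states(old, new):
    """Overwrite the live record field by field; check each possible crash point."""
    results = []
    for crash_after in range(3):
        disk = {"payload": old, "length": len(old)}
        steps = [
            lambda: disk.update(payload=new),
            lambda: disk.update(length=len(new)),
        ]
        run_with_crash(steps, crash_after)
        results.append(consistent(disk, {old, new}))
    return results

def staged_states(old, new):
    """Build the new record off to the side, then flip a pointer in one step."""
    results = []
    for crash_after in range(4):
        disk = {"current": {"payload": old, "length": len(old)}, "staging": {}}
        steps = [
            lambda: disk["staging"].update(payload=new),
            lambda: disk["staging"].update(length=len(new)),
            lambda: disk.update(current=disk.pop("staging")),  # modelled as atomic
        ]
        run_with_crash(steps, crash_after)
        disk["staging"] = {}  # recovery: throw away any half-built staging record
        results.append(consistent(disk["current"], {old, new}))
    return results

print(in_place_states(b"old", b"newer!"))  # one crash point leaves a torn record
print(staged_states(b"old", b"newer!"))    # every crash point recovers cleanly
```

The staged variant works because recovery never has to interpret a half-written record: whatever the crash point, the live pointer refers to a complete old or complete new value.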

Starting point is 00:43:22 Because again, there is a concept of error budget outside, in other companies as well. I feel it's a bit like loosely handled, whereas I feel this is kind of your bread and butter. So what is a failure allowance? And how do you measure it and what do you do if you overstep it or overspend it? Yeah. I mean, I think the idea of a failure allowance is you want to have it. Like you have to have it. If you assume, you know, that you'll never have a failure, you'll actually have a very bad day for your customer. And so we account for failure allowances. But the most important thing is, let's just talk about the failure allowance in our cache.

Starting point is 00:43:58 So how do we manage that? Well, we manage it in such a way that you'll never experience it because we size it, right? And if you're sizing the cache and you're making sure that the underlying capabilities and the hardware are always there, and we have, like I talked about those distributed subsystems, those microservices that are all operating under the hood, we have a ton of them that do nothing but just track metrics, right? And like, you know, the sizing of our cache is all related to the metrics and the size of our underlying system. All the metrics, yeah.
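One way to picture "we size it" is as simple capacity arithmetic. The sketch below is an assumption-laden illustration, not S3's sizing method: it supposes identical cache nodes with a fixed capacity, a measured peak load, and a failure allowance expressed as the number of nodes that may be down at once without users noticing. All names and numbers are hypothetical.

```python
import math

def nodes_to_provision(peak_load, per_node_capacity, failure_allowance):
    """Nodes needed to carry the peak, plus spare nodes equal to the allowance."""
    needed_for_load = math.ceil(peak_load / per_node_capacity)
    return needed_for_load + failure_allowance

def within_allowance(provisioned, failed, peak_load, per_node_capacity):
    """True if the surviving nodes can still carry the peak load."""
    return (provisioned - failed) * per_node_capacity >= peak_load

fleet = nodes_to_provision(peak_load=900, per_node_capacity=100, failure_allowance=2)
print(fleet)
print(within_allowance(fleet, failed=2, peak_load=900, per_node_capacity=100))
print(within_allowance(fleet, failed=3, peak_load=900, per_node_capacity=100))
```

In this picture the metrics services mentioned above would supply `peak_load` and the count of failed nodes, and "overspending" the allowance is the case where the second function returns false.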

Starting point is 00:44:33 Yeah, that's right. And so one of the really big benefits of running on S3 is because our system is so huge, you have these massive, you know, layers, right? And the massive layers are all managing things like correlated failures and failure allowances. And because they are so huge at the scale of S3, any application that's sitting on top of S3 gets the benefit of it. Let's take a break a minute from S3 to talk about a one-of-a-kind event I'm organizing for the first time, the Pragmatic Summit in partnership with Statsig.

Starting point is 00:45:06 Have you ever wanted to meet standout guests from The Pragmatic Engineer Podcast, plus folks from cutting-edge tech companies, and learn about what works and what doesn't in building software in this new age of AI? Come join me on 11 February in San Francisco for a very special one-day event. The Pragmatic Summit features industry legends and past podcast guests like Laura Tacho, Kent Beck, Simon Willison, Chip Huyen, Martin Fowler, and many others. We'll also have insider stories on how engineering teams like Cursor, Linear, OpenAI, Ramp, and others build cutting-edge products. We'll also have roundtables and a carefully

Starting point is 00:45:38 curated audience where everyone is interesting to meet and chat with, something I'm hoping will make this event extra special. Seats are limited and you can apply to attend at pragmaticsummit.com. Talks will be recorded and shared, and paid subscribers will get early access afterwards as well, as a thank you for your additional support. I hope to meet many of you there, and I am so excited about this event. And now let's jump back to S3 and the massive scale of the service. To get a sense of what the reality is like working as an engineer, an engineering leader inside an organization like this, I read a quote from a distinguished engineer, Andy Warfield,

Starting point is 00:46:14 who said, I'm just quoting what he said: early in my career, I had this sort of naive view of what it meant to build large-scale commercial software, that it was basically just code. The thing I realized very quickly working on S3 was that the code was inseparable from the organizational memory and the operational practices and, you know, the scale of the system. Since you've now been more than a decade in S3, how do you think of this beast, this really complex system? Hundreds of microservices, data that is hard to fathom, you know, unless you think of the hard drives stacking all the way to the space station.

Starting point is 00:46:50 And how do engineers kind of wrangle this? Because it does feel a bit intimidating. I'm not going to lie. Well, I think so much of this just comes back to the culture and the commitment on the team. And, yeah, I've worked on S3 for a very long time now. And I have such deep respect for the engineering community on S3. And, you know, honestly, I mean, this is true for all of the services in our data and analytics stack, but we have engineers in S3 and they come in every single day with this deep commitment to the durability and availability and the consistency of your bytes.

Starting point is 00:47:29 And so the type of conversations that we have are so interesting because we have people. And really, you know, these are people who are early out of school. There are people who've been working on S3. We have engineers who've been working on S3 for 15 years. And everything in between. The creativity and the invention of S3, like you have this tension, which is like on one side, you're like, you have to be very conservative with S3. Right. And on the other hand, like, I mean, we have this principal engineering tenet called respect what came before.

Starting point is 00:48:01 And that's an Amazon engineering tenet, which is if it has worked for many, many years, you have to respect that. But then there's also this tenet. These two tenets are a little bit in tension with each other, which is kind of what makes it so fun. That Amazon engineering tenet is called Be Technically Fearless. And I believe that the S3 engineers are just amazing at this, at respecting what came before. Because if we build new capabilities in S3, we have to maintain the properties, the traits of S3, which is it just works. And you get that durability, availability, et cetera. But at the same time, we have to be technically fearless because our ability to go into the world of conditionals, our ability to go into the world of, you know, native support for Iceberg or for vectors, means that we are extending this foundation of storage in a way that helps customers build whatever application they need now and in the future.

Starting point is 00:48:57 And so that combination of the two things, that is sort of when I think about our S3 engineering team, I think they come in every day and they embody that. Now, going back to the evolution of S3 from unstructured to structured data, you were mentioning how Hadoop, the data warehouse, was a big use case where customers started to use it on top of S3

Starting point is 00:49:20 and then at S3 you noticed what a lot of customers, or some of your biggest customers, were doing and then you kind of built it yourself with more structured data. And then S3 tables came along and then vectors. Would you mind sharing a little bit more

Starting point is 00:49:34 on how you evolve S3? Because this was another question that when I asked people about what they'd like to know about S3, one of the questions was like, is it done? Is it finished or is it still evolving? Because there is this notion that S3 can store anything already, right? Like any object, any blob. What new thing is there? And yet we have a lot of new things.

Starting point is 00:49:54 Yeah. And if you kind of go back in time a little bit and you think about, you know, the rise of Parquet. Okay? So the rise of Parquet data in S3 started about 2020. And we started to see more and more people store their tabular data in S3. And if you think about what Iceberg provided, it provided a replacement for Hive. Okay? So if you think about Hive and Hadoop, Hive was basically giving your file system access

Starting point is 00:50:22 into S3 unstructured storage. Iceberg is giving that tabular access, including the, you know, the compaction and all the table maintenance that goes along with it, into your Parquet data. And I actually think that the world's tabular data is going to live in the future in S3. And if you just think about the launch that, for example, Supabase did last week, Supabase announced that their Postgres database is just going to do secondary writes directly into an S3 table. Just like their Postgres extension for vector is going to integrate directly with S3 vectors. And so if the world

Starting point is 00:51:03 of databases, if the world's data at the source, if you will, goes directly into an S3 table, what does that mean for the world's data? Okay. So SQL, as we know, is the lingua franca of data. And the world's

Starting point is 00:51:19 LLMs have all been trained on decades of SQL. And therefore... And Python. SQL and Python and the stuff that's already out there. And so if you think about this, you know, we have many, many AWS customers who know the S3 API pretty darn well by this point. It's pretty simple API.

Starting point is 00:51:39 But now you have the ability to interact with data in S3 through SQL. And what that means is that you don't have to be, you know, somebody who's building cloud applications or know S3. You just need to know SQL. And this is with S3 tables, right? With S3 tables. And so you can just write SQL into an S3 table. And whether you're an AI agent or a human, right, you're introducing the lingua franca of data as a native property of S3 with S3 tables. And I think you're just going to see that take off in the upcoming years. And your latest launch is S3 vectors. Can you share a little bit about what it takes to build a new data primitive like vectors, just behind the scenes, how long it takes, how the team comes together, and

Starting point is 00:52:27 maybe what are some engineering challenges of launching something like this? And again, we're talking about vectors, right? So like you can use embeddings. Whenever you have LLMs, you create an embedding. It's a vector. You want to store that somewhere. You will need to do search on it. There are specialized vector databases. There are specialized vector extensions, et cetera. So I'm assuming this is the function that S3 vectors supports very nicely. Yeah. And, you know, I mean, today a lot of customers use vector databases. Just like back in the day, a lot of people put their, you know, their tabular data in just databases, okay? And they just used the structure of the database in order to, you know, take advantage of being able to query their data, but they didn't really need to use a database.

Starting point is 00:53:12 They just put it in a database. And then S3 came along. And then we introduced this way, you know, with the help of open formats like Apache Parquet and being able to store that structured data in S3. That's kind of what we're doing with vectors right now, okay? And if you think about vectors. Vectors are basically a bespoke data type. A vector at the end of the day is a very, very long list of numbers. And vectors have been around for a long time, and they've been in vector databases for a while, but they really kind of took off in people's, you know, data worlds in the last couple of years with the rise of, as you said, the embedding models. Okay? And so if you take a step back and you think about one of the great ironies of data,

Starting point is 00:53:57 It is that you have to know your data to know your data, right? You have to know what your schema is. You have to know what the data types are. You have to know where it is. And as these data lakes become data oceans, you have this situation where it gets harder and harder to know what's in your data, right? And the beautiful thing about embeddings is that embedding models will understand your data so that you don't have to understand your data.

Starting point is 00:54:28 And the format that these embedding models put this semantic understanding of your data into is, in fact, a vector. And so when we talk to customers and we, you know, they're so excited about how these embedding models are getting better and better, they want to apply more and more basically semantic understanding to their underlying data, whether it's unstructured or structured, that they have in storage. And so they kind of want to store billions of vectors. But just to check, when you say they want to understand, correct me

Starting point is 00:55:01 if I'm wrong. Hypothetically, you have a bunch of text data or maybe some image data. And you're saying that a lot of people, customers, teams, they would like to write queries to say, like, hey, can you find an image that looks like a puppy? Or can you find an article that contains this or this? And embeddings, as we know, are great for that. But then you need to actually create the embedding, build a system, etc. Right.
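The "find an image that looks like a puppy" query above boils down to comparing vectors. A minimal sketch follows, with loud caveats: the three-number vectors are hand-made stand-ins rather than output of any real embedding model, the file names are invented, and cosine similarity is just one common choice of similarity measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def most_similar(query, catalog, k=1):
    """Names of the k catalog entries whose vectors point most nearly the same way."""
    ranked = sorted(catalog,
                    key=lambda name: cosine_similarity(query, catalog[name]),
                    reverse=True)
    return ranked[:k]

# Toy "embeddings": the first axis loosely means dog-like, the last document-like.
catalog = {
    "puppy.jpg":   [0.9, 0.1, 0.0],
    "kitten.jpg":  [0.7, 0.6, 0.1],
    "invoice.pdf": [0.0, 0.1, 0.95],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of "a picture of a puppy"

print(most_similar(query, catalog, k=2))
```

A real system would get both the catalog vectors and the query vector from the same embedding model, and the vectors would have hundreds or thousands of dimensions rather than three.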

Starting point is 00:55:23 Yeah. And like exactly what you're saying. Like, I mean, if you think about what vectors can do, if you think about all the data that a given company has, you know, your knowledge across your business or your knowledge across your life isn't organized into rows and columns like a database. It's in PDFs. It's in your phone, right? It's in audio customer care recordings, which capture the sentiment of how a customer actually feels about their interaction with you. It's whiteboards. By the end of this day, this whiteboard is totally filled up with ideas and it's in documents across dozens of systems. And so it's not

Starting point is 00:56:05 that you don't have data. You have tons of data. But understanding what data you have across all of those different formats is a real problem and it's one that AI models can help you with. And so the capabilities of those AI models have gotten so much better in the last 18 to 24 months. But we needed a place to put billions of vectors, billions of, you know, the semantic understanding of relationships. And that's what we built S3 vectors for. The state-of-the-art embedding models combined with the ability to have vectors across S3 is like a really important part. And it's not a database. I mean, it's the cost structure and scale of just S3, but it's for vector storage.

Starting point is 00:56:54 And then do I understand that, did you need to build new primitives to store this, like going down to the metal, figuring out exactly where we do this, or did you build it on top of your existing, you know, like, existing primitives of all, like, blob storage, et cetera? It's actually a new primitive. And so, you know, we had talked about S3 tables. S3 tables is building on objects because those individual Parquet files, at the end of the day, they're an object. Vector is totally different.

Starting point is 00:57:21 So with vector, we built a new data structure, a new data type. And, you know, it turns out that when you're building vectors, searching for the closest vector in a very high dimensional space, which is basically vector space. Yes. It's often really hard to find the nearest neighbor. And so you basically, in a database, you have to essentially compare every vector in a database. And that's often like super expensive. And so what we do in S3 is because

Starting point is 00:57:56 we aren't storing all of our vectors in memory, we're storing it on our fleet of S3, very large fleet, we still need to provide a super low latency. And in our launch last week, you were getting about 100 milliseconds or less for a warm query to our vector space, which is actually pretty fast. It's not database fast, but it's pretty fast. And the way that we do that is we pre-compute a bunch of, think of them as vector neighborhoods. Okay? And so it's basically a cluster, a bunch of vectors that are clustered together in similarity, like, you know, a type of dog as an example. These vector neighborhoods, if you will, they're computed ahead of time offline. They're computed ahead of time asynchronously so that when you're doing your query, it's not going to impact

Starting point is 00:58:45 your query performance. And then every time a new vector is inserted to S3, the vector gets added to one or more of these vector neighborhoods based on where it's located. And so when you are executing a query on S3 vectors, there's a much smaller search that's done to find the nearest neighborhoods. And it's just the vectors in those vector neighborhoods that are loaded from S3 into fast memory. That's where we apply the nearest neighbor algorithm, and it can result in like really good sub 100 millisecond query times. And so, you know, if you think about the scale, S3 will give you up to 2 billion vectors per index, you think about the scale of an S3 vector bucket, which is up to 20 trillion

Starting point is 00:59:34 vectors, and you think about that combined with 100 milliseconds or less for warm query performance, that just opens up what you can do with creating a semantic understanding of your data and how you can query it. It sounds very interesting and also challenging because you had to build this for scale from day one. I guess that's one of the benefits and curses of working at S3, that everything that you launch, you need to prepare for what will be extreme data elsewhere, but here it's just Monday.
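The "vector neighborhoods" idea described above resembles a generic cluster-based approximate nearest-neighbor index, and a toy version fits in a few lines. This is a sketch, not the S3 vectors implementation. It assumes the neighborhood centers are handed in already computed (the interview says they are computed offline), puts each vector in exactly one neighborhood (the interview says "one or more"), and uses plain Euclidean distance. The class and parameter names are invented.

```python
import math

class NeighborhoodIndex:
    """Toy cluster-based index: search only the neighborhoods nearest the query."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.neighborhoods = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, vector, count):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: math.dist(vector, self.centroids[i]))
        return order[:count]

    def insert(self, key, vector):
        """Add the vector to the neighborhood whose center is closest."""
        home = self._nearest_centroids(vector, 1)[0]
        self.neighborhoods[home].append((key, vector))

    def query(self, vector, k=1, probes=1):
        """Return the k closest keys found, plus how many vectors were compared."""
        candidates = []
        for i in self._nearest_centroids(vector, probes):
            candidates.extend(self.neighborhoods[i])
        candidates.sort(key=lambda item: math.dist(vector, item[1]))
        return [key for key, _ in candidates[:k]], len(candidates)

index = NeighborhoodIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
for key, vector in [("a", (0.0, 1.0)), ("b", (1.0, 0.0)),
                    ("c", (9.0, 10.0)), ("d", (10.0, 9.0)), ("e", (11.0, 11.0))]:
    index.insert(key, vector)

print(index.query((9.2, 9.9), k=2))  # compares 3 of the 5 stored vectors
```

The saving is in the second value returned: only the vectors in the probed neighborhood are compared, not the whole collection. The trade-off is that a true nearest neighbor sitting just across a neighborhood boundary can be missed unless `probes` is raised.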

Starting point is 01:00:02 We have S3 service tenets as well. And one of the tenets and one phrase that I use all the time and our engineers do too is scale is to your advantage. So if you are an engineer and you think about that, and you think about one of your tenets for anything you build is that scale must be to your advantage, it just changes how you design. It means that you can't actually build something where the bigger you get, the worse your performance gets or the worse some attribute gets. It has to be constructed so that the bigger you get, the better your performance gets. The bigger S3 gets, the more decorrelated the workloads are that run on S3. That is a great example of scale is to your advantage.
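The claim that more decorrelated workloads make scale an advantage has a simple statistical reading, sketched here under strong simplifying assumptions that are not from the interview: each workload is either idle or demanding one unit of capacity, it is active with the same probability, and workloads are fully independent. Under those assumptions the aggregate demand is binomial, and its spread relative to its mean shrinks as the number of workloads grows.

```python
import math

def relative_variability(num_workloads, active_probability):
    """Standard deviation of total demand divided by its mean (binomial model)."""
    mean = num_workloads * active_probability
    std = math.sqrt(num_workloads * active_probability * (1 - active_probability))
    return std / mean

for n in (1, 100, 10_000):
    print(n, relative_variability(n, 0.5))
```

In this model, going from 1 to 10,000 independent workloads makes the total demand about a hundred times smoother relative to its size, so proportionally less headroom is needed for bursts. If the workloads were correlated (all bursting together), the smoothing would not happen, which ties back to the earlier point about correlated failure.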

Starting point is 01:00:51 And so when we built vectors, just like we build everything in S3, we ask ourselves, how can we build this such that scale is to our advantage? How can we build this such that 100 milliseconds or less is just the start of the performance that we're going after? And how can we make sure that the more vectors we have in storage, the better the traits of S3 for vector. I have a different question about the limitations of S3. I read that the largest object you can store in S3 is 50 terabytes.

Starting point is 01:01:22 Why is there a limit on the largest object? I mean, I think we can imagine this will be through either multiple hard drives and so on, but why did you decide to have a limit? I'm just interested more in the thought process of how the team comes up with like, okay, this will be the limit and this is why. I mean, I think, first of all, that limit of 50 terabytes is 10 times greater than what we launched with. We launched with 5 terabytes, and now we're 50 terabytes. And sometimes we sit and tell customers that and they go, what am I going to store that's going to be 50 terabytes?

Starting point is 01:01:55 And we're like, high resolution video. Right? And so, you know, if I think about. A known customer. Right. And so if you think about this sort of thing, you know, like if you think about, I don't know, size limits, generally speaking, we do try to optimize for certain patterns. And when you raise the size of an object by 10 times like we did, we're just optimizing for the performance and scale

Starting point is 01:02:22 of the underlying systems. It's like we increase the scale of our batch operations by 10 times last week, too. And the idea behind that is that the underlying systems, we're just optimizing for distributions of work that are the new norm for how people are doing things. And we'll just keep on changing. We don't have too many limits, to be honest, but we'll just keep on, you know, looking at what customers are doing across the distribution of workloads and seeing if there's something that that needs to be changed. The big thing for us, you know, again, we did have a lot of conversations with customers and they're like, really? Like, I don't have that many individual objects that are that big. But with the increase of, you know, cameras and phones and things like that,

Starting point is 01:03:09 we are seeing more and larger size objects. And we just wanted them to be able to grow unfettered in S3. And so how does S3 evolve and how does the roadmap change? Because so far what I picked up is everything that you told me is saying, well, you know, our customers were doing this or that. And you obviously live and breathe data. So you see the patterns, you see stats, you see the objects. You also talk with them. Is it only you talking with customers seeing what's happening, what they're struggling with, what they're using more of, and then deciding to improve that, may that be the limits,

Starting point is 01:03:45 may that be figuring out we need a new data type because they're now building their own data types on top of it? Or is there also some kind of more kind of, all right, here's a vision, here's a roadmap of what we'll do? It's a great question. And in fact, one of the things that we talk about all of the time is the coherency of S3, right? And so there are certain things that people always expect from S3. It's the traits of S3. It's the durability, availability, attributes that we talked about. And so a fair amount of engineering goes on under the hood for that. Okay? And it's a set of capabilities that, you know, we may or may not have talked about today. In fact, if you think

Starting point is 01:04:27 about, I think back to 2020, I think we've launched over a thousand new capabilities since 2020 in S3. And some of them are what we think of as the 90% of the roadmap, which is what people ask for explicitly. Okay. And so, for example, some of our media customers want the bigger object size. And so we delivered that. We have other customers that do a lot with batch operations. But then we have some things that we invent because, you know, we look at what customers are doing with the data, and we ask ourselves, how can we build that? Vector kind of falls into that category. For Vector, when we looked at S3 and how S3 is evolving, we told ourselves, like, look,

Starting point is 01:05:10 you know, we can continue to make S3 the best repository for data on the planet, and we will. We will. We have engineers that come in every day working to make that so. But there's this other element of how do you make sure that the data that you have is, in fact, usable. And how do you make sure that it's usable in a way that's, you know, industry standard, like that Iceberg layer on top of our tabular data? But it's usable because AI models have now gotten so good at embeddings that you can have AI give you a semantic understanding of your data if only you had the cost point of putting billions of vectors into storage so you could actually

Starting point is 01:05:52 understand and use your data in a different way. And so for us, a lot of it is kind of taking a step back and looking not just at what customers ask us for, but we want to remove the constraint of the cost of data, which is what we do in S3. And we want to remove the constraint of working with your data, which is what we do in S3 too. And when we can do both of those things, if we can make it possible that your data grows as your business needs it, and you can tap into all the capabilities that you're getting with AI and how the world is changing for data, then we have a shape. We call it a product shape. Then we have a product shape. Product shape. What's a product shape? It's sort of like an emerging, like when I think about

Starting point is 01:06:39 S3, I think of it as almost like this living, breathing organism where the shape of the product is evolving, but it's evolving with coherency around what you expect for the traits of S3, but it's evolving in a way that lets you steer into how you want to use data. And how do you want to use data not just now, but in the future? And we will continue to evolve the product shape of S3 based on what you want to do with data. And so in a lot of ways, we're sort of transcending the boundaries of what object storage was, or what a database traditionally was, because now we have tabular formats, we have conditionals. And we're evolving into this new shape. And it is ultimately uniquely S3. It kind of sounds like you have all these

Starting point is 01:07:28 microservices. It's kind of evolving, almost like a plant or a living organism, though? Yes. I am, in fact, a former Peace Corps volunteer from forestry. And so, you know, a lot of times I will go back to the natural world for my metaphors. And, yeah, I mean, S3 is this living, breathing repository of data that lets people do things with data that they never thought possible. It's just interesting because I think as engineers, we don't often think to relate the systems that we build with like a living organism. When in fact, I mean, obviously there's code, but as you said, there's people, there's

Starting point is 01:08:05 servers, there's failures that now happen at a cadence. You can probably predict how many hard drives are failing today, in fact, at your scale already, which, again, maybe is, do you think it's because of the scale? When things become large enough, they start to have these characteristics because what I find fascinating talking to you is the way engineering works inside of S3 feels very different to how it works inside a smaller organization, your kind of startup, which again, does, you know, like terabytes of data or maybe even a few petabytes, but that's kind of it. And you've seen some of these organizations. What changes at this large scale?

Starting point is 01:08:44 What do you think that makes it, it feels pretty different, the world that you and the S3 teams work in? It does, but, you know, in order for us to sustain the traits of S3 and to evolve it over time, we have to constantly go back to simplification. We have a very complex system with all of our different microservices, but I kind of go back to those microservices have to do one or two things really well. And we have to stay true to that. Otherwise, you know, the complexification of a distributed system, you know, it's unmaintainable over time. And for S3, this concept of, okay, there's a simple in S3. And the simple in S3 is a couple of things. One, it's a simplicity of the user model, where not only do you have a simple API, but now you have the simplicity of using SQL

Starting point is 01:09:37 with S3, or you have the simplicity of being able to leverage these AI embedding models, which makes semantic understanding of your data so much easier than having to annotate, you know, a whole metadata layer. And so that concept of simplicity is in the user model of S3, but under the hood, if you sat in on any of our engineering meetings, you will hear our engineers talk about how do we make sure that we implement this capability with the greatest simplicity that we possibly can. Speaking of which, what type of engineers do you typically hire to work at S3 in terms of what kind of traits, potentially past experience do you look for? Well, we hire all kinds of engineers. You know, we have a lot of engineers on S3 who are early career.

Starting point is 01:10:25 They're straight out of school or they're at a, you know, undergrad or graduate school. And like I said, we have like a ton of engineers who have been on S3 for a long time and everything in between. I think there's a really strong element in our teams that work on data around ownership. It's, you know, people feel this like personal sense of commitment. I feel it. I feel it every day I come in. Where I feel a personal sense of commitment to your byte, to the preservation of your byte, to the usefulness of your byte, to the ability for you to think about what your application does next and not the types of storage that you need or how you grow it.

Starting point is 01:11:10 And that deep sense of ownership and that deep sense of commitment is a very, very common thread across our data teams because we know that at the end of the day, every modern business is a data business and everything that people are trying to do with traditional systems, AI, whatever, is based on your data as shaping the core of your application experience. And so that data is our responsibility and we feel it very deeply. And what would your advice be to, let's say, a mid-career software engineer, someone who has a few years of experience working at different places, who, after listening to this, gets really enthusiastic and decides like one day I'd love to work on a deep, strong infrastructure team like S3, or like, let's say like more experienced folks. What are experiences, activities that you might look for that might help you consider

Starting point is 01:12:08 these folks more? There's a strong value in relentless curiosity. Okay. And, you know, I talked a little bit about coloring within the lines and how when you work on S3 or a large-scale distributed system, which continues to reinvent what storage means, you're not really coloring within the lines. You're just kind of looking, you're taking a step back and you're saying, you know, I will draw what the lines are today and I will know that I might have to rub those out and draw new lines in the future for wherever things go. And so, you know, I have three kids who are in university. I have two kids in university and one in grad school.

Starting point is 01:12:45 And that is one thing that I, you know, I think is really important is to always take a step back, take a look at the latest research. And some of the papers that I'll share with you are around how we, you know, we either took formal methods and we brought them into storage systems, right? Or we thought about failure in a different way where that creativity, that relentless curiosity and that creativity with engineering, I don't think you can go wrong with that. I think the next generation of software, no matter if it's built in S3 or elsewhere, it is all driven by the creativity of the engineering mind. And it is in all of us. We just have to kind of unlock it and unleash it and we will build amazing things like S3. And I also love that with S3, not only has S3 created something that did not exist. And I think it just was unimaginable because it didn't exist. But now I'm hearing startups that are building on top of S3. I think turbopuffer is a good example. You know,

Starting point is 01:13:47 they're building innovation because now they have a base layer. And I feel there's different levels of innovation, you decide where you want to innovate at the very lowest level, one level higher, and so on. And you just use the right primitive, right? In your case, this is just doing hardware and storage better than anyone. In the other layers, it will be using the right primitives better than anyone. Yeah, it's very exciting for us to see so many different types of infrastructure built on S3. And as closing, what is a book or a paper that you would recommend reading that you enjoyed? And why? I read a lot of different papers. I am fascinated by how quickly the evolution of embedding models is coming along now. And in particular, a field of science that I'm quite interested in is the multimodal embedding model, because as you know, the world that we experience is multimodal. And therefore, the understanding that we have of data should be multimodal as well. And so there's this whole field of science that's emerging quite rapidly around multimodal embedding models.

Starting point is 01:14:51 And so that is something that I encourage people who are working in the field of data to look at, because I think that is the next generation of data. If you think about, you know, the next world of data lakes, I think it's actually going to be on metadata. It's going to be on the semantic understanding of our data. And understanding how that is created through vectors and how it's being searched and done across multiple modalities, I think, is an important area of both research and advancement.
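
As an aside to the point about searching data through vectors, here is a minimal sketch of similarity search over embeddings. It is only an illustration under stated assumptions: the three-dimensional vectors, the file names, and the helper names `cosine_similarity` and `top_k` are all hypothetical, and nothing here reflects how S3 stores or queries vectors. Real embedding models produce vectors with hundreds or thousands of dimensions, and large systems use approximate indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Brute-force search: score every stored vector and return the k closest keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical toy embeddings: items of different modalities share one vector space.
index = {
    "beach_photo.jpg": [0.9, 0.1, 0.0],
    "ocean_sounds.wav": [0.8, 0.2, 0.1],
    "tax_form.pdf": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of a text query

print(top_k(query, index, k=2))
```

In this toy index the photo and the audio clip rank above the PDF because their vectors point in nearly the same direction as the query, which is the sense in which a multimodal embedding lets one query reach several kinds of data.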

Starting point is 01:15:24 And so that's what I would encourage people to look at in the world of data. I think vector is going to be quite big, particularly at the price point that we've introduced for S3 storage for vectors. And I'm excited about it. I think, you know, I think we're just getting started with data and an understanding of our data. And I can't wait to see what comes next. Amazing.

Starting point is 01:15:45 And do you have any book recommendations? I will give you a book recommendation. Just in case your readers are interested, it won't be in the field of computer science. It will be about the evolution of the ecology around us and supporting the bees, the native bees and insects around us. So a tiny bit farther afield, but I'll give you a book recommendation. And if your readers are interested, they can take a look at how to support the bees of the planet. Well, Mai-Lan, thank you very much. This was fascinating and very interesting to get a peek into this massive world of scale of data

Starting point is 01:16:24 and respecting the byte and treating it and making sure that it's durable. It was great talking to you. And thank you to both yourself. I know you're a fan of S3. And to all of your listeners who use S3, we quite literally wouldn't be able to do what we do without the feedback and the encouragement from everybody who uses S3 today. So thank you for that. Just wow.

Starting point is 01:16:49 I always suspected that there's a lot of complexity behind a system like S3, but I just did not realize the scale of it. Whenever I worked on systems with even hundreds of virtual machines, failure of one machine was a rare event and not something that we really counted on. During my conversation with Mai-Lan, she casually mentioned that several machines have failed during our conversation, which is something that the S3 team knows and prepares for and treats like an everyday event.
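
To see why failures during a single conversation are plausible at this scale, here is a back-of-the-envelope sketch. Every number in it is an assumption made for illustration: the episode only says "millions of servers", so the fleet size of 2,000,000, the 2% annual failure rate, and the 77-minute window are hypothetical, and the model assumes failures are independent and spread evenly through the year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def expected_failures(fleet_size, annual_failure_rate, window_minutes):
    """Expected number of failures in a time window, assuming a constant, independent failure rate."""
    return fleet_size * annual_failure_rate * window_minutes / MINUTES_PER_YEAR


fleet = 2_000_000  # hypothetical: the episode only says "millions of servers"
afr = 0.02         # hypothetical annual failure rate, not an AWS figure
window = 77        # approximate length of the conversation in minutes

print(round(expected_failures(fleet, afr, window), 1))
```

Under these assumed inputs the expectation comes out at roughly six failures over the length of the conversation. The same arithmetic for a fleet of a few hundred machines gives well under one failure per year-long stretch of such conversations, which is why a single machine failing feels like a rare event at small scale.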

Starting point is 01:17:13 I personally really liked how AWS has two conflicting tenets heavily used on the S3 team: respect what came before, and be technically fearless. For such a massive system, it would be easy to say, let's move conservatively because of how many companies depend on us, but if they did so, S3 would fall behind. Finally, I'm still in awe that AWS put strong consistency in place, rolled it out to all customers, and did not increase pricing, nor did they increase latency.

Starting point is 01:17:39 At S3 scale, this is an absolutely next-level engineering achievement. In fact, it was probably one of the lesser-known engineering feats of the decade. I hope you found the episode as fascinating as I did. If you'd like to learn more about Amazon and AWS, check out the exclusive deep dive I did with AWS's incident management team on how they handle outages in the show notes below. In The Pragmatic Engineer, I also did other deep dives about Amazon and AWS. They are also linked in the show notes.

Starting point is 01:18:04 If you enjoy this podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also leave a rating on the show. Thanks and see you in the next one.
