Software Huddle - NoSQL Transactions in DynamoDB with Akshat Vig & Somu Perianayagam from AWS

Episode Date: September 5, 2023

Amazon's DynamoDB serves some of the highest workloads on the planet with predictable, single-digit millisecond latency regardless of data size or concurrent operations. Like many NoSQL databases, DynamoDB did not offer support for transactions at first but added support for ACID transactions in 2018. Akshat Vig and Somu Perianayagam are two Senior Principal Engineers on the DynamoDB team and are here to talk about the team's USENIX research paper describing how they implemented support for transactions while maintaining the core performance characteristics of DynamoDB. In this show, we talk about DynamoDB transaction internals, performing user research to focus on core user needs, and staying on top of cutting-edge research as a Principal Engineer.

Transcript
Starting point is 00:00:00 One thing you need to learn, and this will go throughout your career, never trust anyone in the distributed system. That's the default rule. But I think a key point which Dynamo was emphasizing on and we wanted to do is that we want to build a protocol which is scalable and predictable. And what is the interface you want to provide to customers? Generally, transactions are considered at odds with scalability. One of the things we really actually considered and debated a lot was multi-version concurrency control. But supporting multi-version concurrency control in Dynamo would actually mean we have to change the storage engine. Hey folks, this is Alex DeBrie and
Starting point is 00:00:37 I just love today's episode. You know, I'm a huge DynamoDB fan and today we have Akshat and Somu. They are two of the senior principal engineers on the DynamoDB team. I have huge respect for them. They've both been there, you know, before DynamoDB was released. So it was a great conversation with them. The DynamoDB team has written some really great papers, one each in the last two years, just talking about some of the infrastructure behind Dynamo. And the one this year was about distributed transactions at scale in DynamoDB. So we talk about that paper here.
Starting point is 00:01:04 We talk about database internals. If you like to nerd out about this stuff, I think this is a really good talk. One thing I always love about these Amazon papers, especially the DynamoDB team, is just how well they talk about thinking about user needs and what users actually want. How can we simplify this down
Starting point is 00:01:18 and what's the technical implementation to make that happen for them? One thing after we got off the call, Akshan and Sunwe, they wanted to say, hey, make sure you shout out the other people on those papers. So thanks to all the other authors on that paper.
Starting point is 00:01:29 They especially called out Doug Terry, who helped with the paper, with their talks and presentations. And I think just with the ideation and implementation of transactions in DynamoDB. So if you like the show,
Starting point is 00:01:40 you know, make sure you like, subscribe, give us a review, whatever. Also feel free to reach out with suggestions, guests, anything like that. And with that, let's get to the show. Akshat Somu, welcome to the show. Thanks, Alex. Thanks, Alex.
Starting point is 00:01:51 Thanks for having me. Thanks for that. Yeah, absolutely. So yeah, you two are both senior principal engineers on the DynamoDB team at AWS, which is a pretty high position. Can you give everyone a little background on what it is you do on the Dynamo team, how long you've been there, things like that? Yeah, I can go first.
Starting point is 00:02:09 So I joined Amazon, I think 2010. And from there, I first was working in Amazon India. And then when I saw AWS getting built, I was like, hey, I want to work here because, you know, the problems are super fun. So I joined first SimpleDB team. And at the same time, DynamoDB was incepted. So I've been with DynamoDB right from its inception and have been able to contribute a lot of bugs and a lot of features
Starting point is 00:02:36 to DynamoDB over the years, like DynamoDB streams, point-in-time backup restore, transactions, global databases. And we're going to talk about transactions today. So like Akshat, I've been with Amazon for about 12 years now. I started in Dynamo, and I've been working in Dynamo. I've worked in all components of Dynamo, front and back and control plane. But my areas of focus right now are replication services, transactions.
Starting point is 00:03:10 So replication services is global secondary indexes, global tables, what we're doing for regional table replication and how we make it highly available. So much of my focus has been around this stuff, but around all the multi-region services we have as well at this point in time. Awesome. Great. Well, thanks for coming on because like, you know, I'm obviously a huge DynamoDB fan and big fans of you two. I'm excited to talk about
Starting point is 00:03:40 your new paper. You know, there's a really good history of papers sort of in this area, right? Like the original Amazon Dynamo paper, not DynamoDB, you know, in, in 06 or so really kicked off a lot in the NoSQL world. Last year, the, the Amazon DynamoDB paper that basically said, Hey, here's what we took some of those learnings, made it into this cloud service and what we learned and what we built with DynamoDB. And now this year, the, this new transactions paper that came out, which is distributed transactions at scale on DynamoDB, if people want to go look that up, just showing how you added that on
Starting point is 00:04:12 and how transactions can work at scale. So I'm excited to go deep on that today. Maybe just to get started, Akshat, do you want to tell us, like, what are transactions? And, you know, especially why are they, what are the uniqueness of transactions in NoSQL databases? Yeah, so I think if you look at NoSQL databases, a lot of NoSQL databases either do not support transactions
Starting point is 00:04:37 because NoSQL databases, they are, you know, generally the key characteristics that are considered good or that the reason people choose them is high availability, high scalability, and single digit millisecond performance. DynamoDB provides all three. So specifically, generally transactions are considered at odds with scalability. And scalability here I refer as two things one is predictable performance and second is unbounded growth like your table can be really tiny in the beginning and as you do more traffic it can scale it can partition so mostly i think previously we have seen like a lot of
Starting point is 00:05:18 no sql databases they shy away from implementing transactions so or some do implement but they implement it in a form which is like constraint where you can do transaction on a single partition all the items that that reside you know at a single machine so when we started hearing from our customers that hey we would like to have transactions in DynamoDB. So you're like, okay, first, let's just understand why do you actually need it? Because we have seen a lot of workloads that are running on DynamoDB without actual transactions. So what exactly are you looking for in transactions?
Starting point is 00:06:01 So I think we went through that journey and took the challenge that, hey, we really want to add transactions which provide the asset properties, atomicity, consistency, durability, and isolation for multi-item and multi-table writes
Starting point is 00:06:18 that you want to do, reads or writes that you want to do on your database table in DynamoDB or across tables in DynamoDB. And that's how we started. Absolutely. And so transactions were released at reInvent 2018. So this is six and a half years after Dynamo's been out. I guess how soon after Dynamo being out, were you starting to get requests for transactions? How long did that sort of user research period last?
Starting point is 00:06:46 Like you're saying, like, what do you need these for? What sort of constraints do you have here? Yeah, so before we actually added transactions, I think there was a transactions library that was built by Amazon, like one of the developers in our team, David Janicek. He built a transactions library that was essentially doing trying to provide the same experience of like asset properties
Starting point is 00:07:11 on your database so this was I don't remember exactly but this was I think 2016-ish time 2014-ish 2015-ish I think something around those times but I think the pattern that we were seeing
Starting point is 00:07:26 was a lot of, for example, control planes that are getting built or a lot of teams in Amazon who are using DynamoDB. And at that time, there was also a push that, hey, we want to move all the workloads to DynamoDB and get away from the relational databases
Starting point is 00:07:43 that we have seen have like scaling limitations so transactions became like really important for making that transfer from SQL databases to NoSQL databases and at that point like transactions library was one thing which we saw that okay the adoption of transactions library is increasing so that was one thing which we saw that, okay, the adoption of transactions library is increasing. So that was one signal. And second is people started telling us about, hey, the transactions library is great, but there are certain limitations that we are seeing with that, which is for every write, we have like 7x cost we have to pay because transactions library essentially was trying to maintain a ledger and the whole state machine of where the transaction is how far it has gone forward and
Starting point is 00:08:30 in case the transaction is not going to finish it has to do the rollbacks and things of that nature so all the complexity was actually encapsulated in this library as an abstraction given to the customers so overall i would say the signal of people adopting that library a lot more and direct conversations with the customers hearing about it, this is a specific use case we're building. And it would really simplify
Starting point is 00:08:55 if there was like acid properties, like full atomic transactions across multiple tables and multiple items in DynamoDB. Yeah. It's interesting to see that trade-off between the client-side solutions, like the transactions library or a few other ones that Dynamo provides,
Starting point is 00:09:13 and then the actual service solutions. Given that Dynamo sort of gives you all the low-level access to most of the stuff, you can perform that or be kind of like a query planner or a transaction coordinator client-side if you want. But then it's nice when that can move up into that server layer. I guess once you decided, hey, we're building transactions, how long does it take? You know, you already had tens of thousands of users, you know, probably millions of requests per second,
Starting point is 00:09:41 things like that. How long did that take to build and deliver that feature where it's available, you know, at reInvent in that November? I think once we kind of decided that we wanted to build transactions, we had a bunch of people go and figure out like, hey, what are the algorithms which is doable? So going back to your initial question of like, NoSQL databases usually shy away from
Starting point is 00:10:05 transactions, it's the scale and complexity of it, right? Like there are different algorithms you can do or implement. And then you have, once your transaction fails, how do you kind of recover transactions? And a lot that has to go into like, what is the algorithm you're going to choose and build? But I think a key point which Dynamo was emphasizing on and we wanted to do is that we want to build a protocol which is scalable and predictable and what is the interface we want to provide to customers because traditional transactions
Starting point is 00:10:31 have been like, hey, begin transaction, end transaction. And a lot of customers are used to that. Right? But that would take away a key tenet of Dynamo, which is like predictable performance because you now don't know how long your transaction is going to be, right? So how do we kind of balance
Starting point is 00:10:47 that trade-off? How do we kind of expose this to customers? What are the protocols we're going to choose? I think we spent a lot of time on that. And then when we were closer
Starting point is 00:10:54 to knowing what the protocol was and what the APIs were, I think it was roughly about a year, I would say, that it took us to kind of... Yeah, and I think a lot of time, I would say, goes into, kind of yeah and i think a lot of time i would say goes into as sumo saying a lot of time goes into like understanding state of the art like
Starting point is 00:11:11 what already exists and then doing trade-offs pocs to actually like actually you know figure out what how much time it took us to uh decide this is the right one because you know there is like dynamo as i was saying right acid like atomicity for single item was already there consistency like you have consistent reads and eventually consistent reads and you know when you do a right you preserve the correct state so consistency you you already get that isolation i think was the main thing and atomicity across multiple items was the was the thing that we wanted to add. So I think a lot of time, I would say,
Starting point is 00:11:48 goes into two phases. One is just figuring out what to do. And once you figure out, building, I think, is the fastest. That last part is actually proving what we have built is correct. So, yeah. You talked about different constraints
Starting point is 00:12:00 that different ones have on it. You talked about, you know, some of these only implemented on a single char or node or partition, whatever that is. I assume that wasn't really feasible for Dynamo just because that's sort of invisible to you and because those partitions are so small.
Starting point is 00:12:13 But that other constraint of, hey, it has to come in as a single request and all get executed together, as someone mentioned, like, was that something you narrowed in on pretty early of like, hey is this is what we're going to do and where you check with users like is that going to be okay will that still give you what you want or or is that something that came you know took a while to hash out and figure
Starting point is 00:12:33 out yeah so i think for for like that specific journey if i if i recall i think we did like a lot of i would say experiments and research on that. And it involved trying out like some of the workloads. So we actually went and talked to customers to understand, hey, why do they use this concept of begin and end transaction? And specifically, I think the reason we chose one of the biggest reason is that, you know, if you let someone do like begin transaction and then send a bunch of writes and reads and also like other operations maybe someone puts a sleep there so the resources are tied up for that long for that particular transaction and then when the resources are tied
Starting point is 00:13:18 up you also don't get predictable performance so i think a lot of these decisions went into defining the tenets for what transactions should look like. So we essentially defined goals for it that we want to execute a set of operations atomically and serializably for any items in any tables with predictable performance and also no impact to non-transactional workloads.
Starting point is 00:13:42 So a lot of like techniques, standard techniques, like two-phase locking and, you know, the begin and end transaction approach, like a lot of those just like did not make sense for us. And even, I think, for example, one of the things we really actually considered and debated a lot was multi-version concurrency control. If we could build something on that, you know, you get like read isolation. So your reads could be isolated from writes. But supporting multi-version concurrency control in Dynamo would actually mean we have to change the storage engine. And if you build MVCC, you need to track multiple versions,
Starting point is 00:14:24 which means the additional cost that comes with it of storing multiple items, then you have to pass that cost to the customers. So, you know, that particular also, we had to, all these basically standard approaches, we had to reject. And then we nailed it down to, okay, we want to do like a single request transactions
Starting point is 00:14:41 based on these goals or tenets that we have defined. So then we went to some teams in amazon.com and said that hey if we provided these two apis would this would you be able to convert your like existing transactional workloads into like a dynamo db transaction and we did similar exercise with some external customers as well to validate what we are building has you know that does not have like obvious adoption blockers and things of that nature and turns out all the use cases that we actually discussed with the customers they were you know able they were they were we were able to convert them into the two apis that we added two operations in the dynamo db apis that we added one is transact write items and second is transact get items.
Starting point is 00:15:26 And just to explain transact write items and transact get items a little bit, essentially with transact write items, you can do a bunch of writes, which could be update, delete, or put request. And you can also specify conditions. The conditions could be on these items which you're trying to update, the DynamoDB standard of like OCC, right, that you do. Or you can also do a check item, which is not an item that you're updating on a transaction. And similarly for transact gets a separate API, where you can do multiple gets in the same call, which you want to read in a serializable banner. Yep, absolutely. Yeah, I love that single request model.
Starting point is 00:16:07 And I think you're right that like almost anything that can be modeled into it and the ones that can't are probably the ones your DBA is going to advise you against doing anyway on your relational database, like where you're holding that transaction open for a while and maybe calling some other API or something like those can really get you into trouble.
Starting point is 00:16:23 You mentioned sort of like, hey, what's the state of the art in terms of protocols and patterns and things like that? Like, where do you go look for research on transaction protocols or just different things that's happening? Is that academia? Is that industry papers? Or where are you finding that stuff? Oh, right. I think there's been a lot of good work in academia
Starting point is 00:16:42 starting from the 60s about transactions. It is very interesting because the inspiration we took was from one of the papers published by Phil Bernstein. And this was in the 1970s when most of us were not even born. Right. So I think academia has a lot of the good research. And then there's been a lot of good research in the industry as well now, like there's been, industry's been doing a lot of research
Starting point is 00:17:09 and we've been publishing recently as well, so industry's also been doing a lot of research. So we look back at a lot of the papers which are published in standard computer science conferences like Usenet, Sigmar, OSDI, and then learn from what has worked in the past
Starting point is 00:17:27 and what has not worked in the past and what will work for us technically, right? Like in case of transactions, the timestamp ordering, why does it work for us? We will definitely go into details. And there's an element of that as well here as like what makes sense for us. Yeah.
Starting point is 00:17:44 What does that look like at amazon like is it mostly just informal like hey did you see this new paper or are there like you know scheduled reading groups or or different things like that to make sure everyone's up on the latest stuff what does that we have scheduled reading groups because we have people of varied interest and we want to kind of learn a lot about what's happening what's not happening and we may not get to do that in a on a day-to-day in a job basis, right? So we have people who have focused reading groups who read papers all the time
Starting point is 00:18:08 and talk about like, hey, pros and cons. What did we understand? What did we not understand? What did the paper do well? What did the paper not do well? Like we had them. And we talk a lot about
Starting point is 00:18:20 how to use the different things. Like, for example, a big thing within Amazon is like how do we use formal modeling tools like TLA Plus or P modeling, right? And we have scheduled groups which kind of go dive deep
Starting point is 00:18:33 into that stuff. So there are scheduled groups for everything like data structures, algorithms, distributed systems. And I know like I've seen a lot on TLA Plus at Amazon is that something that you know
Starting point is 00:18:47 both of you are doing or is that something like hey there's a group that's really good at that or a few people that are really good at that and they'll come help you through like how often are you
Starting point is 00:18:54 actually using those those sort of methods so there there are very few people who use TLA Plus partly because it's more complex but it's very helpful
Starting point is 00:19:04 like for example with the Plus style it's made a lot life a very helpful like for example with the plus style it's made life a lot easier for you and me to go write something back in the day the TLA plus specification was harder to write but with plus style
Starting point is 00:19:13 it's very easy when they convert it to TLA plus it's easy to write the P modeling is something which we kind of have all developers now kind of use because it's closer to
Starting point is 00:19:22 the code you would write and it is easier to kind of used because it's closer to the code you would write and it is easier to kind of prototype and p model and take a model in p and then run with that stuff i think that's that's something we have asked all developers to write um tla plus has usually been like a new set of developers we use this stuff for a really um very critical set of problems like dynamo when we started when we did dynamo first we we had a tla plus model of problems like Dynamo. When we did Dynamo first, we had a DLA plus model for all of Dynamo operations to ensure that everything is correct.
Starting point is 00:19:51 And that's still the foundation for Dynamo in some ways. And same for transactions. We did a similar thing for transactions as well to prove the correctness of the algorithm. And similar to that, we actually also have like a verifier, ACID verifier, which runs in production to, you know, since like whatever time has the transactions has been launched, we still run the ACID verifier on just to, you know, make sure that we have not like any gaps that we have any blind spots or anything, things like that to, to ensure protocol
Starting point is 00:20:22 is correct. Yeah, absolutely. Okay, one more thing before we get into internals of transactions. Like, you're both senior principal engineers. You've been at Dynamo for 12 years.
Starting point is 00:20:31 Like, obviously doing a lot of higher level stuff. I'm sure writing documents, writing these papers, giving talks. But Amazon is also known for being very, like, practical hands-on
Starting point is 00:20:41 for their advanced people. Like, how much during, how much time during the week do you still sit down and write code? So I think it varies, varies on like different phases of the project. Um, like overall, I would say like in terms of if, if I look at the like full year, um, a lot of time I think is spent in figuring out like what we are doing
Starting point is 00:21:03 and how we are doing and whether it is like you know correct or not and then second phase is I think where you write like the p modeling stuff that that someone was talking about I think a lot of time gets spent in that and third is I think POCs where you come up with an idea you write a POC to prove that hey this actually makes sense or this actually whatever we are claiming is going to be what we would, is what we are claiming is actually going to be achieved. So that's one. And then third, I would say the last part is, you know, reviewing and ensuring that operationally we are ready and ensuring that the testing that we are doing, we have like good coverage. So I would say like writing code,
Starting point is 00:21:47 testing, remodeling, writing docs, it's like equal split in terms of like the time spent. And if I am working on a project, I would usually take something no other developer wants to take or non-critical because I'm not blocking them
Starting point is 00:22:04 in any way or fashion because I'm doing a bunch of other things as well simultaneously. So I think, like Akshat said, it depends on the face of the project.
Starting point is 00:22:10 If it's something which is an ideation at this point in time, we would write a bunch of code to kind of prove it works, it doesn't work. Or we're doing some
Starting point is 00:22:17 modeling stuff at this point in time, right? So that's how we can ensure that we are up to date and hands-on on the stuff as well. And the other part is also code reviews,
Starting point is 00:22:27 which still keep you very close connected. So that because operationally, I think if you're not connected operationally, it's very hard to debug things when you get paged at night at 2 a.m. Yeah, yeah, exactly. Cool. Okay, let's get into transaction internals. First thing, two-phase commit, which is the pattern you use here on the transaction coordinator.
Starting point is 00:22:51 Do you want to explain how two-phase commit works? Yeah, so before that, let's just talk through a high-level DynamoDB normal put request that comes and flows through. And then I'll add the two-phase, how we implemented that. So first, any request that like a developer or an application sends to DynamoDB, it first hits like load balancer. From there, it goes to a request router, which is like stateless fleet. The request router has to figure out where to send this request. Like if it's a put request, it sends it to a leader replica
Starting point is 00:23:26 of a partition. Now DynamoDB table is partitioned for like scale and that number of partitions are identified based on the size or the read and write capacity units that you want for your table. So you might have a table
Starting point is 00:23:41 which has like 10 partitions and this item that you're trying to put will reside in a specific partition and that partition has three replicas and one of the replica is the leader replica. So the write request goes to that leader and it replicates it to two other replicas before two other followers which once it gets acknowledgement from at least one more so two copies are durably written We acknowledge it back to the client. To find out which storage node to route the request, there is a metadata system which we use.
Starting point is 00:24:12 Now for transactions, we introduced transaction coordinator, which has the responsibility of ensuring that a particular transaction that is accepted has to go through completely. And so a request that customer makes, like a transact write item request, it goes to the request router,
Starting point is 00:24:33 goes to the transaction coordinator. First thing the transaction coordinator does, it stores it in a ledger and ledger is like a DynamoDB table and we can come back to it. But the main point of Ledger is to ensure that we can, whatever request we execute it atomically, like either the full request succeeds or it does not succeed.
Starting point is 00:24:56 And second part is fault tolerance, that if a transaction coordinator, which is processing a request crashes, since the request is stored in the Ledger, any other transaction coordinator can pick it up and run with it, right? So transaction coordinator, once it stores it in the ledger, it is kind of doing like checkpointing and state management of where the transaction is.
Starting point is 00:25:17 So once it is stored in the ledger, it sends prepare messages to all the storage nodes involved. So let's say you are doing a 10 item transaction, which are for 10 different tables, and there could be 10 completely different partitions, all in the same account.
Starting point is 00:25:36 Now, once that request is sent for prepares, at that point all the check conditions, like if you're doing an OCC write with a put item or you're purely doing just a check item. And just to interrupt you what's OCC? Yeah so optimistic concurrency control so if you want to do a write saying that hey I want this write to succeed only if certain conditions are evaluated to true if that happens then only accept this right otherwise
Starting point is 00:26:02 you know reject this particular write request that we are saying sending to you so prepare messages are evaluating that and it also evaluates if there are any any of the validations that like item size 400 kb item things like that if any of those will be will not be met then you should just reply back saying i cannot accept the transaction but assuming that every storage node all the 10 storage nodes in the 10 item transaction will not be met, then you should just reply back saying, I cannot accept the transaction. But assuming that every storage node, all the 10 storage nodes in the 10 item transaction case, reply back saying that, yeah, this particular transaction prepare,
Starting point is 00:26:34 we can accept. The transaction moves on to the commit phase. And once it has passed the prepare phase, i.e. the transaction coordinator got acknowledgement from every storage node and it is also durably written in the ledger that the transaction has finished the prepare state, it moves to the commit state, which is making sure the actual write happening at that particular point. So the item is taken from the ledger and then sent to the specific storage node to finish the transaction. And once the commits are done, your full transaction actually is finished. So at high level, that's the two-phase protocol. Gotcha. Okay. So we have prepare and commit. Prepare is
Starting point is 00:27:17 just basically checking with every node saying, hey, is this good or not? If they all come back with that accept, thumbs up, then it comes back and says, okay, go ahead and commit. And once it's in that commit phase and then tells them all to execute, is there basically like no going back? Even like say one of those no's failed originally or something happens, like we're just going to keep trying until that, like we've already decided this transaction
Starting point is 00:27:37 is going through at this point. Yes. Once the transaction has reached the commit phase, then it's executed to completion. Failure of transaction coordinator or failure of a node which is hosting the partition was not going to stop it.
Starting point is 00:27:52 It's going to kind of finish it complete to completion. If a transaction coordinator fails, another one is going to pick it up, pick it up and say, hey, the transaction is in commit phase. I'm just going to send commit messages to all the items
Starting point is 00:28:03 which are involved in the transaction, no matter whether it knows whether a single item is sent, commit has been sent or not. If a storage node fails, it's the same thing. When nodes fail all the time, a new leader is elected and the new leader can complete the commit. It doesn't need any prior knowledge
Starting point is 00:28:21 of the transaction at this point in time. Okay. So tell me about that transaction coordinator failing. How does a new one pick up that stall transaction and make sure it gets executed? So all transaction coordinators run a small component of the recovery. So they keep scanning the ledger to say, are all transactions getting executed? And if they find a transaction which is not executed for a long period of time, then they would say, this transaction is not executed at this point in time. So either we kind of take it forward. So let's say there's a
Starting point is 00:28:54 transaction in prepare state. So transaction coordinator may say, you know, this transaction has not been executed for a long time. It's in prepare state. So I don't know what happened to all the prepares. What I'm going to just do is cancel this transaction, I'm not going to execute this transaction. So I'm going to move this into a canceled phase and then send cancel notifications to all the members involved in the transaction. Or it can decide, oh, the transaction is in commit phase, let me just take it to completion and send everybody a commit message at this point in time, right. So this is a small recovery component. There's a small piece
Starting point is 00:29:28 we missed, which is like when we do the prepares for an item, every storage node has a marker saying, well, this item has been prepared for this particular transaction. And let's say that for some reason that the transaction has not been acted upon for some period
Starting point is 00:29:44 of time and the storage node looks at the item and says, hey, this item is still in prepaid state for quite some time. It can also kick off a recovery and say, hey, can you please
Starting point is 00:29:51 somebody recover this transaction, recover this item for me because it's been like a long time since the transaction started. And when you say a long time, how are we talking here? Are we talking like seconds
Starting point is 00:30:01 or like a minute or what does that look like? We're talking like five minutes, seconds at this point in time, right? Yeah we talking like seconds or like a minute or what does that look like? We're talking like five minutes, seconds at this point in time, right? Yeah, seconds, seconds, yeah. And I think
Starting point is 00:30:10 the most interesting part out of this also is there is no rollbacks. That's why there are no rollbacks here, right? Like because the prepare phase is actually not writing anything.
Starting point is 00:30:19 It's just storing that marker that Somu pointed out. And hence, if any of the prepare fails or we identify that this transaction cannot be completed, we just send cancellation, which is basically not, yeah, aborting the transaction. Gotcha.
Starting point is 00:30:36 And if anything is in the, I guess, the prepare phase where a node has accepted it and then send back accept, but maybe the transaction is stalled for whatever reason. Are rights to that item effectively blocked at that point until it's recovered? Yes. So the rights to that particular item
Starting point is 00:30:52 cannot be now serialized. So you would have to have the transaction complete to have the rights serialized. So any other singleton right would be kind of rejected. Say, hey, there's a transaction conflict at this point in time.
Starting point is 00:31:02 We can't reject it. But we can talk a little bit more about this because we did talk in the paper about some optimizations we can do there and we know that we can do this optimization. But in reality, we have not seen this happen. Customers mixing traffic of transact rights with singleton rights. So we
Starting point is 00:31:18 kind of don't see this thing much in practice to kind of go and say, we have to go and implement this optimization where we can serialize these rights. Oh, that's interesting. So most items you see are sort of either involved in transaction rights
Starting point is 00:31:31 or singleton rights, but not both. That's interesting. Which is kind of like a recommendation, I think, from Cassandra. They're like lightweight transactions because I think you can get
Starting point is 00:31:41 some bad issues there with that. But it's interesting that customer patterns sort of work out that way anyway. Yeah, and I think you can get some bad issues there with that. But it's interesting that like customer patterns sort of work out that way anyway. Yeah. And I think the part
Starting point is 00:31:49 of like if there is like a transaction stuck, as someone pointed out, if there is a write request that comes to it and the transaction has been stuck for a
Starting point is 00:31:55 while, that also will go off like, you know, recovery automatically. Yeah. Plus, I think when we devised these algorithms, we
Starting point is 00:32:03 actually thought about, you know, we want to support for like contention as well. So that's why we chose timestamp ordering and where we can do some interesting tricks, which we talked about. And we actually, you know, also tried some of those implementations before we went ahead with this approach. Yeah. Okay. And for a transaction that's stuck, like what happened to the client there? Is that just hanging until, you know, it times out at like 30 seconds, whatever the client timeout is? Or if something picks it up, is it going to be able to respond back to that client?
Starting point is 00:32:33 Or is that basically just like, hey, we'll clean it up, but the client, you know, they're short on their own at that point. So the transact write item requests, they're actually item potent. actually idempotent so if let's say like a request that took longer than the client timeout clients can just retry using the same client token which is the idempotency token and that token is used to identify you know what based that token uniquely identify the transaction and based on that we can tell you that hey this transaction actually succeeded if you come back or this transaction failed. But again, most of these transactions are we are still talking millisecond.
Starting point is 00:33:08 We're not talking seconds to finish, right? Most of the transactions are still finishing in milliseconds and getting clients are getting an acknowledgement back. Yep. Should I, you mentioned the idempotency in the client request token on a transact, right? Should I always include a client request token? Like there's no I mean
Starting point is 00:33:26 not cost but even like there's no like latency cost on that or any sort of cost is there of just including that
Starting point is 00:33:34 that's a recommendation from DynamoDB if you're if you're using transact right item request use the client token so that you can
Starting point is 00:33:40 recover really easily and retry as many as many times. There is a time limit for which this client item potency token will work because you might be trying to do a different transaction. So there is a time limit after which it won't.
Starting point is 00:33:56 And so, yeah, it is recommended to use it. So the nice thing about client request token, Alex, is that let's say your client for some reason timed out, but the request was executed successfully on Dynamo side. You can come back with the same thing in Dynamo and say, hey, this transaction was successful. You don't have to kind of execute this stuff. I think that's a super nice thing about the client request token. And also the fact that, let's say that if for some reason if for some reason
Starting point is 00:34:25 you come back and the item potency token is expired I think it was that window was 10 minutes at this point in time we would try to re-execute the transaction, right?
Starting point is 00:34:34 But most of the transactions usually have conditions in them and the conditions will fail and then we will say okay, you know this transaction has a condition failure so we won't be able to execute this stuff.
Starting point is 00:34:43 Yeah, and this client token actually was not something we initially planned to add. This was, again, when we built it, we gave it to a few customers, they tried it out, and they were like, hey, this particular use case, you know, we don't know if this transaction succeeded or failed because we timed out. So this was, like, I would say
Starting point is 00:34:57 in the later part of the project, we designed it, implemented it, and launched it. So quite a flexible and iterative process. Yeah, cool. And, okay, so you mentioned that there's like the 10 minute window where that request is sort of guaranteed to be identifiable if you're including that token. So are you just keeping records in that transaction ledger for 10 minutes, like expiring at some point, but at least they're hanging around for 10 minutes is the point there. Okay. Okay. And then you mentioned like looking for stalled transactions. Is that
Starting point is 00:35:29 just like, you're just like sort of brute force scanning the table, like taking all the transaction coordinators, each one's taken a segment and just continually running scans against it? It's a parallel scan. So the letter is a DynamoDB table. I think we talked about this before. And I think it's very heavily sharded, to put it nicely. So you can do a lot of scans on this table. And it's
Starting point is 00:35:53 a paper request table, right? So it's, and we have all the transaction coordinates. They can pick a small segment of it and say, I need to scan a thousand items. So, and they all can scan it quite quickly
Starting point is 00:36:08 and figure out any transactions that are stalled. Yeah. Okay. Tell me about that DynamoDB table that's used
Starting point is 00:36:16 for the ledger. Like, is that, is there like a different Dynamo instance somewhere that's used for these internal type
Starting point is 00:36:24 things like the ledger or like, or is it just like a loop of writing back to itself? Like, what does that, you know, different dynamo instance somewhere that's used for these internal type things like the ledger or like or or is it just like a loop of writing back to itself like what is that you know dynamo as a service right is a multi-tenant service so all these customers across a region or a lot of customers within a region are using the exact same dynamo service you know they're um like i guess like how does that sort of foundational dynamo instance work? Is that a separate instance that's sort of different and special or anything like that? No, this is a normal user-level table. Like, the transaction coordinators are just another user and there's no normal user-level table at this point in time.
Starting point is 00:36:57 As you mentioned, there is a circular dependency here, so you can't use transactions on this table, but we don't have a need to use transactions on this table, right? So this is a Dynamo-level table. I normally use a level table, so we get all the other features of Dynamo, which we can use. Wow, that's pretty amazing. Okay. Alright, you mentioned
Starting point is 00:37:18 timestamp ordering a couple times. What, I guess, what is timestamp ordering? How does it, how do you use it in transactions? Yeah, so timestamp ordering. How does it, how do you use it in transactions? Yeah. So timestamp ordering. So we talked a lot about atomicity till now, the two phase protocol, right? Like for serializability, we decided to like borrow timestamp ordering technique,
Starting point is 00:37:36 which so- And hold on serializability, this is like a confusing topic, but just like high level, we'll spend hours on that. If you could do like what no one else has managed to do and describe that in like one or two sentences, like what's the high level idea of serializability? So I think it's mainly around concurrent access. If you have like concurrent access of like data in a database, right, you need to define an order in which these transactions are executed.
Starting point is 00:38:02 So timestamp ordering has this like very nice property that if you assign a timestamp to each transaction, the timestamp basically is the clock that is being used from the transaction coordinator. The assigned timestamp defines the serial order of all the transactions that are going to execute on a set of tables that you're doing. So that basically defines the serial order of the transaction, even if you have like concurrent access from multiple users trying to do like transactions on the same set of items,
Starting point is 00:38:35 timestamp ordering give this nice property where we can serialize or define a serial order of these transactions. It's like kids are coming and asking us something, right? And then you say, hey, hold on, your brother asked me something first. I'm going to kind of execute his request first because we need one parent at this point. Right.
Starting point is 00:38:50 So that's exactly what timestamp ordering allows us to do is to have concurrency control to say, hey, which transactions get how, what is the order in which transactions will get executed. Awesome. Awesome. awesome awesome and then I love that example because that helps bring it up what again sort of like two-phase like what other options were there
Starting point is 00:39:09 in terms of ordering and serialization that were considered over or single tax two-phase locking is one
Starting point is 00:39:17 where you like lock the items on which you're executing the transaction and then you finish the transaction
Starting point is 00:39:23 then move on to the next one but locks means deadlocks. Locks means like a lot of things that you have to take care. So we didn't want that. That's why timestamp ordering, which gives you this nice property of,
Starting point is 00:39:34 like if you assign timestamp, as I said, then and then the transaction executes or appear to execute at their assigned time, serializability is achieved. And if you have like, the nice property is, if you have the timestamp assigned, you can accept like multiple transactions.
Starting point is 00:39:52 Even if let's say one transaction is prepared, I accept it on a particular storage node. If you send another transaction with a timestamp, you can like put it in the specific order and execute them because there is a timestamp associated with it. There are certain rules which you have to evaluate whether this particular second transaction
Starting point is 00:40:11 you should accept when there is already a prepared transaction or not. But yeah, that's the key thing with timestamp ordering. It's also simple in the sense that let's say that I accepted a transaction with timestamp 10 and I get a transaction with timestamp 9. I'm going to say, you know what, I accepted a transaction with timestamp 10 and I get a transaction with timestamp 9 I'm going to say you know what I accepted already something with 10 I'm not going to execute 9 anymore please go away and come back with a new timestamp right it's it's in the order it's like in anywhere else like a dmv where maybe they kind of accept 9 but you know they don't accept something which is very old still yep yep yeah i thought that was one
Starting point is 00:40:45 of the most interesting parts of the paper just talking about like the different interactions and sort of optimizations on top of that you know interacting with you know singleton operations rights or reads and how that can interact with conflicting transaction they're you know conflicting uh or conflict conflicts among transactions and things like that. I thought that was really interesting. I guess like one question I had on there, it mentions that the transaction coordinator, that's what assigns the timestamp, I believe, right? The coordinator node.
Starting point is 00:41:16 Okay, so that's using the AWS TimeSync service. So that should be within like a couple microseconds or something like that. But it also says like, hey, synchronized clocks are actually not necessary across these, and there's going to be a little bit of discrepancy. So I guess, why aren't they necessary?
Starting point is 00:41:32 And why is it useful? You know, why aren't they necessary? And then why is it helpful, I guess, to have them as synchronized as possible? I think, so for the correctness of the protocol, synchronized clocks are not necessary, right? Because the clocks just act as like a number at this point in time, just a token number.
Starting point is 00:41:47 And if there's two transaction coordinators which pick different numbers, then it gets automatically resolved and who comes out first, right? So clocks don't have to be synchronized. It's for correctness. From an availability perspective, you want to have clocks
Starting point is 00:42:05 as closely as possible, synchronized as closely as possible. So the same example I just gave you a couple of minutes ago is that let's say there's a transaction coordinator whose clock is off by a couple of seconds, right?
Starting point is 00:42:17 It's behind by a couple of seconds. Then always its transactions are going to get rejected for the same items, which another transaction coordinator assigns timestamp because its time is behind. So it's always going to get rejected for the same items, which another transaction coordinator assigns timestamp because its time is behind. So it's always going to get a,
Starting point is 00:42:27 I already executed a transaction of timestamp X and your timestamp is less. So I'm not going to execute your transaction. So from an availability perspective, it's nice to have clocks closely in sync. And that's exactly why we have, we use timestamp because we have some guarantees around like how much a clock drift is going to be there and uh we can control the precision of the clocks yeah so it's
Starting point is 00:42:51 to avoid like unnecessary like cancellations because of these variable time stamps and for load we have different transaction coordinators so time stamps could vary but we also have like guardrails in the system where we identify a particular coordinator has has the time drifting we just like excommunicate node that node out from the fleet or you know if uh if storage node also has checks in place where if a transaction coordinator sends a request which is like way out in future it will say that hey dude what are you doing i'm not going to accept this transaction. So we have guardrails across like, you know, different levels of guardrails in place to
Starting point is 00:43:29 ensure that we keep high availability for these transactions. I was just going to ask that because it seems like everywhere in Dynamo, it's sort of like everyone's checking on each other all the time. And it's just like, hey, if I get something goofy, I'm going to like send that back and I'll have to tell them to get rid of that one this was like when I joined the SimpleDB team
Starting point is 00:43:48 I was working with like a guy David Lutz and he was like I asked him I had not built distributed system he's like you know one thing you need to learn
Starting point is 00:43:54 and this will go throughout your career never trust anyone in the distributed system that's the default rule that's amazing yeah I want to see it
Starting point is 00:44:04 okay we talked about serializability and I know like one thing that comes up I'm so really yeah I want to see it okay we talked about serializability and I know like one thing that comes up
Starting point is 00:44:09 a lot around this is like isolation levels which again is like
Starting point is 00:44:15 a whole other level of depth in terms of that but tell me
Starting point is 00:44:20 a little bit about I guess like the the isolation levels we'll get
Starting point is 00:44:24 especially across like different levels of operations in Dynamo. Yeah. So I think like if you think about it, like transact get item, transact write item. And there is actually a documented page as well on this. But transact get item, transact write items, they are like serialized. For get items, if you do like a consistent get request you are like essentially getting like read committed data so you always get read committed data there is nothing
Starting point is 00:44:51 which you're getting which is not committed right and if you are doing let's say a non-transactional read on a item which is which has already transactions going on, as Somu pointed out, those requests will be serialized with that transaction. So if you have a transactional workload and you do like a normal get item, those will also be serialized. But they also are giving you like a read committed data.
Starting point is 00:45:23 So your get request won't actually be rejected you will get the answer back with the whatever is the committed state of that item at that particular time um and then i think with batch rights and i would say for batch rights and transact right items you have like similar at level, the same serializability. I think that's a key part is that it's very hard to define these in some ways, because there are certain Dynamo APIs like batch writes that can span different items, which are provided just as a convenience for customers, right? Like customers don't have to come back and go back. Then how do you define serializability
Starting point is 00:46:05 of a single batch write across a transactional write? And it's hard to do that because each of these individual writes are serializable by itself, but the entire batch write operation is probably not serializable with the transact write item. And helping customers understand that nuance is very, very tricky.
Starting point is 00:46:23 And it's where we kind of have this whole documentation, lengthy documentation based thing that, yes, each individual write within the batch write is serializable, but the entire operation is not serializable against a single transact write item. So I think the nuance is there for batch write
Starting point is 00:46:41 and likewise, even for scan, right? Like when you're doing a scan in a query, you're always going to get the read committed data so if a transaction is executing across the same items in the um the scan then you're going to get the latest committed data um always yep yep absolutely so yeah um and just so i understand it and maybe put into practical terms like if i do a batch get item in dynamo, let's just say I'm retrieving two items. And at the same time, there's a transaction that's acting on those two items. Each one of those get operations within batch got will be serializable with respect to that transaction. But it's possible that my batch get result has, you know, one item before the
Starting point is 00:47:20 transaction and one item after the transaction. Yes. Okay. Yep. And then there's the issue, I guess, potentially of, I guess, read committed. Okay, so read commit. I always get tied up on this stuff. I think some people see read committed, especially like in like the query respect or also the basket respect. Like, hey, I'm getting read committed.
Starting point is 00:47:44 It's not serializable here. And then I think of like, okay, what are the isolation levels and what sort of anomalies can I get if I think of like the sort of database literature? And the thing that comes out to me is like, yes, it's sort of, that is true, but like you don't see the anomalies that you might,
Starting point is 00:48:03 from my point of view in a relational database where you have a long-running transaction like if you talk if you look at like the read committed isolation level now you can have what like phantom reads and and non-repeatable reads but that's within the context of a transaction but that's not going to happen in dynamo because you have like that single shot single request transaction you don't have like to begin run a bunch of stuff and then yes whatever um type of thing so you don't have like the begin, run a bunch of stuff and then whatever type of thing. So you don't see those type of anonymity just because you can't do that type of operation.
Starting point is 00:48:32 Am I saying that right? Yeah. Yeah. Okay. And I think as you, as you pointed out, like just to reiterate, I think the, between any write operations, serializable isolation is there. Between like a standard read operation, you also have serialized transact write item
Starting point is 00:48:48 and transact get items. Like if you care about what you were saying, where, you know, I did a transactional write and then I want to get a fully serializable, like the transaction should not give me an answer back on a bunch of items because I read them as a unit. Transact get item is what you should use to ensure that you're getting like isolation as a unit as well.
Starting point is 00:49:09 But if you do batch write and batch get, you get an individual item level serializable isolation, but not as a unit. Yeah, gotcha. Okay, on that same note, transact get items. What like, I almost never tell people to use it. What do you see people using it for? Like, what's the core needs there around? I think it's a use, I'm not saying it's not a useful thing,
Starting point is 00:49:30 but like, I think it's one of those things, like sort of the strongly consistent read on a Dynamo leader, where maybe you need it less than you think you do. Is that, I guess, where are you seeing the TransactGetItems use cases? I think a lot of the ones I've seen, and like where you,
Starting point is 00:49:50 I agree with you that most of the cases you can actually model with just like consistent reads or eventually consistent reads. But there are certain use cases where you really want, as I said, as a unit,
Starting point is 00:50:02 like you did an operation as a unit. Let's say you are moving state in a state machine in a control plane that you're building, where you have like three items which together define the final thing that you want to show to the customer, right? And you don't want to read any of those items in an individual manner and show something to the user. So that's where I think it makes sense to use TransactGetItems: where if even
Starting point is 00:50:29 one of the items that you read, you cannot accept that to be just read committed, that's when you use TransactGetItems. But the space is very narrow, I agree with you. The classic example would be, Alex, like this happened like a couple of days ago, is like you're transferring money between your two accounts, right? And then you want to view both the balances
Starting point is 00:50:47 together. If you land up doing a batch get, you may be in a temporary state of euphoria or, like, surprise. So you want to use TransactGetItems to say, okay, I did the transfer, I need to know what happened, so you use TransactGetItems there, right? Control planes have such use cases, banks have such use cases, where you kind of finally want to display this stuff. So those are cases where TransactGetItems is super useful.
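(As a rough sketch of that difference, reusing the hypothetical Accounts table from the earlier example: BatchGetItem reads each key on its own, while TransactGetItems reads the set as one serializable unit. Names and shapes here are illustrative, not from the episode.)

import boto3

dynamodb = boto3.client("dynamodb")

# BatchGetItem: each key is read with serializable isolation on its own,
# so the two balances can straddle a concurrent transfer.
batch = dynamodb.batch_get_item(
    RequestItems={
        "Accounts": {
            "Keys": [
                {"AccountId": {"S": "checking"}},
                {"AccountId": {"S": "savings"}},
            ]
        }
    }
)
# Real code would also retry anything reported under UnprocessedKeys.
possibly_skewed = batch["Responses"]["Accounts"]

# TransactGetItems: both reads are isolated as a unit, so the pair reflects
# a state entirely before or entirely after any concurrent TransactWriteItems.
txn = dynamodb.transact_get_items(
    TransactItems=[
        {"Get": {"TableName": "Accounts", "Key": {"AccountId": {"S": "checking"}}}},
        {"Get": {"TableName": "Accounts", "Key": {"AccountId": {"S": "savings"}}}},
    ]
)
consistent_view = [r.get("Item") for r in txn["Responses"]]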
Starting point is 00:51:09 It's almost like preventing end user or, like, user-facing confusion rather than, you know, your application and some of the business logic. Like if it's like a background process,
Starting point is 00:51:19 you almost don't need to use TransactGetItems. Yes, but if you're depending on one of the, if you're depending on both of them to be consistent in the database, right? That's the key word, right? Like let's say that I see that
Starting point is 00:51:32 the order status has gone from in-warehouse to shipped, then I expect something else to have been done. That consistency, you will not get with a batch get. And if you want a consistent read, then you want to do the transact get to kind of read both items together. Yep.
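(A hedged sketch of the write side of that order-and-shipment invariant, not something walked through in the episode; the Orders and Shipments tables, keys, and attributes below are made up. A ConditionCheck inside the same single-request transaction only lets the order move to shipped if the shipment record is already in the expected state, and a TransactGetItems over the same two keys, as above, then gives readers a view where the two items never appear out of step.)

import boto3

dynamodb = boto3.client("dynamodb")

# The status change and the shipment check succeed or fail together;
# if the shipment item has no tracking number yet, nothing is written.
dynamodb.transact_write_items(
    TransactItems=[
        {
            "ConditionCheck": {
                "TableName": "Shipments",  # hypothetical table
                "Key": {"ShipmentId": {"S": "ship-123"}},
                "ConditionExpression": "attribute_exists(TrackingNumber)",
            }
        },
        {
            "Update": {
                "TableName": "Orders",  # hypothetical table
                "Key": {"OrderId": {"S": "order-123"}},
                "UpdateExpression": "SET OrderStatus = :next",
                "ConditionExpression": "OrderStatus = :prev",
                "ExpressionAttributeValues": {
                    ":next": {"S": "SHIPPED"},
                    ":prev": {"S": "IN_WAREHOUSE"},
                },
            }
        },
    ]
)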
Starting point is 00:51:49 Okay. All right, cool. I want to stir some stuff up a little bit because there was some, like, consternation on Twitter. So at the end of this DynamoDB transactions paper and also the DynamoDB paper last year, there are some just, like, charts showing different benchmarks and things like that that I think are really useful, and, you know, showing,
Starting point is 00:52:11 I guess, how does latency change as the number of operations you have against your table increases, the number of transactions you're running against your table increases, or, like, more items in your transactions, or more contention on it, all those things. And some of those charts, all those charts, don't have labels on the Y axis showing, you know,
Starting point is 00:52:31 how many milliseconds it takes at all these different levels. Why, like, why no labels? We just forgot it. No, I'm kidding. But I think... Akshat forgot to put them in.
Starting point is 00:52:42 Akshat forgot to put them in, even in the last check. No, I think partly, I think we could them in, even in the last check. No, I think partly, I think we could have done a little bit better job there. The point was not to show the numbers as such, right? I mean, the numbers, I think anybody can grok. It's a very simple test to go run and everybody can run the test and grok the numbers.
Starting point is 00:52:58 The point was to show the relative difference between, like, for example, singleton write versus a transactional write, what's the cost, right? Latency cost and it's X amount or more. I think that was the whole point. And we didn't want to kind of give absolute numbers, which doesn't make sense, right?
Starting point is 00:53:17 One of the lessons was we could have done a little bit better job of normalizing the numbers and presenting the normalized number on Y-axis. But I think that's a lesson for us to kind of take away next time. Yeah, I like it. I agree. Like last year when I first read that Dynamo paper, I was like, where are the numbers on
Starting point is 00:53:33 it? Like, why wouldn't they show that? But then the more you think about it, like Dynamo's whole point is consistent performance no matter what, right? It doesn't matter how many items you have in your table, how many concurrent requests you're making, you know, all those different things.
Starting point is 00:53:49 And I think these benchmarks are trying to show that at different levels. Like, hey, it's still the same whether you're doing one item, whether you're doing
Starting point is 00:53:56 a million transactions per second. It's still... And we keep making all these, like, optimization in the stack to improve performance across the board as well.
Starting point is 00:54:04 So I think it's just, like, again, as someone pointed out, these numbers would be more of a distraction than actually helpful, because you might run an experiment like 10 years later and the performance will be even better, right? So what's the point? And the key point is that you get consistent performance as you are scaling your operations. That's the key message we wanted people to actually take away from that, not that, hey, this transaction operated at like five milliseconds or 10 milliseconds or 20 milliseconds or whatever that is. Yep, exactly. Yeah, because a lot of those benchmarks can be gamed, or who knows what's going on, and are they even representative of things? But I think, yeah, showing, like you're saying, it doesn't really matter what those other factors are; they're mostly unimportant to the scale you're going to get there. I guess, like, you know, consistent performance with Dynamo is just
Starting point is 00:54:56 so interesting and such a key tenet on just everything, in terms of the APIs and features that are developing and all that stuff. I guess, like, how far does that go? And if you had, I don't know if this is even easy to think about, but if you had some sort of change that would reduce latency for, you know, your P50, your P90, something like that, but would maybe increase your P99 by 10, 20%, something like that. Is that something that's like, no, hey, we don't want to increase
Starting point is 00:55:27 that spread, we don't want to degrade our P99 at any cost? Like, is that, maybe that sort of thing just never comes up, but, I guess, how front of mind is that consistent performance for Dynamo? I think it is, like, as I said, one of the core tenets from the beginning. That's like one of the core tenets. Whenever we do a new thing in Dynamo, we have to ensure that. So whenever we look at it through the lens of improving latencies,
Starting point is 00:55:51 I think we start from entitlements. Like, if we have to do this operation, for each hop in the overall stack, how much latency is attributed, or allowed, for that hop to actually take out of the full request? And we go from there. So if there is network distance between two hops, that's one of the entitlements, right? So it varies. When you're looking at a problem, if you find an opportunity to improve the latency at P50, I think the goal is to make sure the variance between P50 and P99 is also not too high,
Starting point is 00:56:28 because consistent performance is about giving you, at any time when you make a call, you get the same performance on the read and write operations that you're doing. Very cool. Okay, one thing on the latency
Starting point is 00:56:43 I wanted to look at, it was just on one of the charts, especially showing how latency changes as you increase the number of transactions you're running. There was a spike at the tail end of P99 for very high request rates. So if you're doing lots of transactions per second, there was a little bit of a spike at P99 compared to, you know, even slightly lower request rates, and you mentioned it was like a Java garbage collection issue. I guess, is that something where, when you see that, you're like, hey, we need to change that, because that tail latency is so unacceptable? I know you are doing some stuff in Rust. Or is it also like, you know, it always shows up. I think it was at like a million ops per second, and they're doing three ops per transaction, so around 333,000 transactions per second. Do you not have that many users doing that to where it's a big issue, and that is okay at that point? Or is that something you're actively thinking about?
Starting point is 00:57:21 I think so. That one was a very interesting one, because I know we went back and forth on those numbers on what are the issues with that stuff. And that was specifically with the 100 item transactions.
Starting point is 00:57:53 So when you're doing a 100 item transaction, a transaction coordinator is holding onto those objects for a longer period of time, ensuring that they're kind of talking to a hundred different nodes. And so the P99 there has been higher. We do kind of want to address the P99 issue there. But the number of customers
Starting point is 00:58:09 using 100 item transactions is also very low, the number of applications using 100 item transactions is also far lower, right? So we would address that. If those customers, those applications, are using 100 item transactions,
Starting point is 00:58:21 they're already paying the latency penalty at this point in time with 100 item transactions. So as long as it's consistent, we are okay. We will address it, but maybe not as soon, but we will kind of
Starting point is 00:58:36 definitely address it. But we don't want that to regress, right? We want to keep it where it is at this point in time and measure it and see what happens. And we actually run canaries across all the different AZs, all the different endpoints
Starting point is 00:58:49 that we expose, to actually find issues in latency before our customers do. So we have canaries running all the time, acting like customers, doing these variable size transactions to identify if there is any issue in a particular stack, in a particular region, or anywhere in the stack. We get paged, figure out what the issue is, and resolve it as well.
Starting point is 00:59:11 So yeah, we have like, we don't take this lightly. Yeah, very cool. I remember that from last year's paper, about how you do monitoring and sort of those performance regression tests, and you have all those canaries,
Starting point is 00:59:24 like you were saying, but also I think some of your, like, high, high traffic amazon.com tables, right? You sort of get direct access to their monitoring and are able to pick up some of the latency degradation there, if any. Yeah, pretty cool to see. So that's it. Cool.
Starting point is 00:59:40 Okay. Transactions, that's great stuff. I want to just sort of close here. Like, you know, you've both been working on Dynamo since it was released now. What does that look like to, I guess, not do new feature development, but maintain or update the foundations of Dynamo? And how much has some of that stuff changed? And I mean, you know, you would know this better than me, but just like, I don't know, as we've seen changes from like hard disk drives to SSDs to like NVMe, like is that something that is like a regular change,
Starting point is 01:00:11 or even like the storage engines you're using, or like how much of that foundational work, how often does that change? Is that something that gets updated every couple of years, or is it constant maintenance, or what does that look like?
Starting point is 01:00:21 So our architecture is constantly evolving. We're finding new things, right? And the best part about Dynamo is customers don't have to worry about the stuff. Like that's the best thing. There's a lot of things in the back changing all the time. And our key tenet is like customer availability
Starting point is 01:00:36 or latency should not regress because we're doing something in the behind. And we do a lot of things. A classic example would be like when I worked on encryption at rest back in 2018, I would say. I keep forgetting these numbers, but anyway, it's 2018, right?
Starting point is 01:00:51 There was a whole thing where we kind of totally integrated everybody under the covers with KMS and this was a whole sweep and customers never saw a blip. So yes, there are things constantly
Starting point is 01:01:01 changing in the background. We're trying to improve latencies. We're trying to kind of make things more efficient. And all of this, customers don't get to see, and that's the best part of being a fully managed service. And to answer your question, it's constantly happening, but nobody gets to know about this stuff. Yeah, and I think a lot of developers who are interviewing at our team also ask me this question, that, hey, you have been here for that long.
Starting point is 01:01:26 Like, are you not bored? I'm like, no, every year there is like some fun problem that we have to launch. And the best part is as soon as you launch, you don't get one customer. You get like so many customers
Starting point is 01:01:37 who want to use your feature and traffic also, you don't get one request or two requests. You get like, you know, millions of requests. So you have this like fun challenge that you have to solve,
Starting point is 01:01:45 which has all, like, Dynamo has so many fun problems that still keep us excited. Yep. Yep. Do you get the same thrill of releasing a
Starting point is 01:01:55 like public feature, a very visible feature like transactions as when you're releasing something like, you know, adaptive capacity, which,
Starting point is 01:02:04 you know, for those listening, it was more like just how Dynamo is splitting your provisioned throughput across the different partitions in your table. And it was something that was mostly under the hood. You didn't even know about it until you all published like a really good blog post on it, and then further improvements, including on-demand mode and stuff like that. But like, do you still get the same thrill when those sorts of releases come out and you're like, man,
Starting point is 01:02:25 we just solved a huge problem for a lot of people and they might not even know for a little while. Like, what's that like? That one specifically, yes. Because a lot of the customers were complaining about it as well.
Starting point is 01:02:34 Like, you know, they do it right away. And I think I was super excited about it. Yeah, I think everything we do in Dynamo kind of is very exciting
Starting point is 01:02:42 at the end of the day, right? Because you have direct customer impact one way or the other. It just boils down to what the impact is. I remember once, I don't know which year it was, but I think me and Somu
Starting point is 01:02:53 actually worked on a problem which reduced the number of operational tickets we get, like, really a big dent, like a 10x improvement on that. So yeah, I think we get the same thrill. It's where you want to put your mind and solve the problem. And as I said, Dynamo has so many fun problems to solve.
Starting point is 01:03:11 Yeah. Okay, cool. Okay. So last two years, you've written some really great papers, the DynamoDB paper last year, transactions this year. What are we getting next year? What's the next paper coming down the pike? Do you have one?
Starting point is 01:03:22 We have to think about it. First, you have to build something and then we, you know. Yeah, there is definitely a lot more we are thinking and evaluating what we should do. We have also started doing like a lot of talks and at different venues and different conferences and like, yeah, getting like feedback from customers. Like transactions paper, actually, the way we decided
Starting point is 01:03:43 was also, I would say, customer driven. We wrote the paper on DynamoDB and I was just looking at like how the response has been on different blogs. And a lot of blogs had this theme where people were asking like, oh, I wish there was like details about how transactions were implemented in DynamoDB.
Starting point is 01:04:02 It was like a bunch of people had left that comment. So that's when we picked it up and we wrote this paper. So we'll see how the response to this paper is, figure out what customers want and write back. I think it's, yeah, that's,
Starting point is 01:04:13 like Akshat said, it's mostly what's going to be the next takeaway message here, right? Like, for example, with Dynamo, we said, these are our learnings from the past 10 years. With transactions, we said,
Starting point is 01:04:22 you know what, you don't always need like a long-running transaction on a NoSQL database. You can build a fast, scalable transaction with a single-request transaction. So the next one is going to be, what's the next takeaway message from us to the community in general? And that's what we'll be focusing on, hopefully soon. Yep, I agree, I hope we see it soon. And that point you're making about, like, you know, what you can take away, like long-running transactions: I think both papers are very good at just really thinking about user needs from first principles and being like, okay, you know what, other things might have all these features, but you cut off like this 5% of features and you actually eliminate a whole host of problems. And as long as you're fine with that constraint, you can get a lot of other benefits as well. So I think just like the framing
Starting point is 01:05:06 of user needs upfront in both papers is so good and helpful in understanding how this is working. So I love that. Hey, Akshat, Somu, thank you for coming on. I respect you both so much. I love Dynamo and I'm really grateful for you
Starting point is 01:05:20 coming on to talk today. Alex, super thanks for having us, by the way. And you're one of the biggest DynamoDB proponents. Your book is probably referenced a lot. So super thanks for having us. And it's like a privilege to talk to you about the transactions paper. Yeah, same here.
Starting point is 01:05:40 I think you have been doing amazing work and I've been following you for like a long time. Thanks for all the great work that you do. Cool. Thank you. I'll link to the paper, but everyone be sure to check out the paper because there's a lot of great stuff
Starting point is 01:05:53 we didn't even get into here. So make sure you check that out.
