Disseminate: The Computer Science Research Podcast - Pat Helland | Scalable OLTP in the Cloud: What’s the BIG DEAL? | #50
Episode Date: April 15, 2024

In this thought-provoking podcast episode, we dive into the world of scalable OLTP (OnLine Transaction Processing) systems with the insightful Pat Helland. As a seasoned expert in the field, Pat shares his insights on the critical role of isolation semantics in the scalability of OLTP systems, emphasizing its significance as the "BIG DEAL." By examining the interface between OLTP databases and applications, particularly through the lens of RCSI (READ COMMITTED SNAPSHOT ISOLATION) SQL databases, Pat talks about the limitations imposed by current database architectures and application patterns on scalability.

Through a compelling thought experiment, Pat explores the asymptotic limits to scale for OLTP systems, challenging the status quo and envisioning a reimagined approach to building both databases and applications that empowers scalability while adhering to RCSI. By shedding light on how today's popular databases and common app patterns may unnecessarily hinder scalability, Pat sparks discussions within the database community, paving the way for new opportunities and advancements in OLTP systems. Join us as we delve into this conversation with Pat Helland, where every insight shared could potentially catalyze significant transformations in the realm of OLTP scalability.

Papers mentioned during the episode:
- Scalable OLTP in the Cloud: What's the BIG DEAL?
- Autonomous Computing
- Decoupled Transactions
- Don't Get Stuck in the "Con" Game
- The Best Place to Build a Subway
- Building on Quicksand
- Side effects, front and center
- Immutability changes everything
- Is Scalable OLTP in the Cloud a solved problem?

You can find Pat on:
- Twitter/X
- LinkedIn
- Scattered Thoughts on Distributed Systems

Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast. I'm your host, Jack Wardby.
The usual reminder that if you do enjoy the show, please consider supporting us through Buy Me A Coffee.
It really helps us to keep making the show and keep the lights on, so to speak.
So yeah, go and do that if you'd like.
And we also have a listener survey out at the moment.
So if you are interested in shaping the future of the podcast, please go and fill it out and give us some feedback.
On today's episode, it gives me great pleasure to say I'm joined by Pat Helland, who will be talking about his CIDR 2024 paper, amongst other things I'm sure: Scalable OLTP in the Cloud: What's the BIG DEAL? Pat works at Salesforce. He's previously been at Tandem, Microsoft, Amazon, and HAL Computers.
He's been building databases and distributed systems since the late 70s. Pat also blogs
at Scattered Thoughts on Distributed Systems, and he also writes for ACMQ. Pat, welcome to the show.
Well, thank you, Jack. It's nice to be here. And my blog, you can find it at
pathelland.substack.com. Fantastic. We'll put a link to that in the show notes as well,
so the listener can go and find that nice and easy.
So let's get started then.
So I like to ask my guests at the very start
kind of to tell them more about themselves
and tell us about their journey
and how they became interested in research
and databases and distributed systems.
Oh my, now you're going to go way back.
I went to UC Irvine from 1973 to 1976
when I married into three kids and had a fourth when I was 20.
So I figured I'd get a job.
So I never got a bachelor's degree.
And it was not the best job for a while.
And then I kind of worked my way forward.
And my friend from college said, come on up here and program in Sunnyvale, Silicon Valley.
And what did we build? A database.
We didn't know what a database was.
There were no classes in the early 70s on databases.
We had to read papers and read papers. And, you know, I read a bunch of papers by this fellow named Jim Gray, and we kind of figured it out. One of the questions at first in the, you know, 1978, 79 timeframe is, well, do you do a network or a hierarchical or relational? This relational stuff is kind of interesting ideas, but seems impractical. And so we were trying to build and learn and stuff. And then when that little startup became a shutdown,
I ended up deciding I wanted to go to work where, you know, Jim was. And I went to Tandem and inherited the transaction processing engine for the nonstop system, which you may or may not
remember that stuff, but it was a multi-processor message-based system with all the peripherals,
had dual port access, and it was designed to be fault tolerant and designed to run transactions
for, you know, OLTP transactions.
So, oh man, was I learning stuff like crazy, figuring things out, trying to keep the system
from crashing, keep it correct. How do you make sure the database is perfectly there? And that's
when I really fell in love with what, you know, you might call enterprise computing, which is
make it perfect, make it fast, never screw up, never lose the data and make it as highly available as at all possible. You know,
how many nines can you get to four or five nines and, you know, handle failures and do the right
thing. And that always seemed to me to be the fascinating, fascinating journey to take. And,
you know, eventually I ended up going into Microsoft, which is in the early nineties,
built these little desktop things that were like these toys that were nice to use as a one person access, but getting them into running businesses and making things rock solid was just another
passion. And then the cloud starts coming along and how can you do this reliable stuff on top of
this kind of sea of resources that fail all the time because it's supposed to fail all the time
that turned it inside out. So that's been a really really fun journey for me making things very robust very solid you know keep the business running super
high availability never lose the data and yet run it on top of things that are constantly failing
underneath is a really hard problem.
That's awesome. So you started off being in academia and then you kind of went down the industry path?
I was never in academia. I backed my way into it. Jim showed me this stuff, dragged me to conferences. I love academia. I've always been in industry.
Yeah. I mean, I don't have a degree, not even a bachelor's degree. I'm certainly not, you know, a knighted PhD. But I have respect for a PhD. I have total respect for
the work it takes for people to do that. And I believe in that, you know, academia
is to push the envelope of knowledge and to upbring, you know, uplift the new folks coming
into the field and give them their training. And I'm passionate about, you know, the future of
all the people in the industry and the, you know, folks who want to do it. To me, the hallmark of a
wonderful person is someone who just can't get that passion for whatever they've got their passion for.
And they just want to work and work and work to do the best and push the envelope as much as they can.
Whenever I see that for anyone, no matter what age they are, all I want to do is figure out what can I do to help.
And so that is a thing for me is that growth track.
And, you know, I did not make it easy on myself, dropping out of college.
I had so many more hurdles to do. But then again, you know, I had a family and the whole growth and
I've got great grandchildren now. And, you know, I'm 68 and I'm not dead yet. They're going to take
me out feet first. This stuff is too darn fun. And I want to continue doing it as much as I can.
And, you know, but there's a part of me which wants to see the rock solid engineering projects
done and deliver business value. There's a part of me that wants to see the, you know, ideas pursued so we can understand what really
makes things happen and makes things work. And so I kind of split between those two. And I think it
helps in that way. You know, I see professors, I see people graduate with a PhD and then they want
to go into academia, but if they haven't gone into industry also there's a
piece of it that they don't completely understand. But that's true in both directions.
Personally, I feel the same on that. Like, I went straight into academia and did the PhD straight away with no industry experience, and sometimes I kind of regret that. I think I would maybe have got more out of my PhD if I'd had a stint in industry first as well.
like i don't know i think it makes sense to have both because I don't know.
It makes you more aware of both, more well-rounded.
I don't know.
Maybe it allows you to generate better ideas or better insights.
I'm not too sure.
But yeah, I mean, we all walk.
Yeah.
Everybody's going to have their own passion, right?
You know, people get a PhD.
I don't know.
What do you want to do?
If you want to teach, you kind of need it, right?
And being an academic, you kind of need it.
It's hard to not have that.
If you want to go into industry, the question is how much do you want to have your experiences grow and what is it going to do? So there's no one answer for anyone. I mean, I needed money when I was in my early 20s. I had four kids to feed, so there you go.
There you go. Yeah, cool. Let's talk about the paper then. So, Scalable OLTP in the Cloud. Let's kick things off with some background for the listener who might not necessarily be too familiar with this space.
So what is OLTP?
And even with the transaction, right,
let's go back to that and start with a proper background.
Well, the word transaction has gone through some history.
In the 70s, it actually meant a business operation,
a human interaction.
And so the word transaction in the IBM products and the competing products in the seventies was, I'm going to do a particular kind
of work. I'm going to do an airline reservation transaction. I'm going to do a debit credit
transaction. But the theme was there's humans talking to computers as opposed to big batch jobs.
And so how can I get stuff in, get things back to the human before they're frustrated and want to walk away.
Okay. And so that whole keeping that timing vibrant for the humans was a piece of the puzzle for that. But transaction processing typically meant human interaction. Now at IBM in that
window of time, wrapping around an atomic bunch of changes to the database was called a logical
unit of work, not a transaction. As the academic
paper started to emerge, defining, you know, what is transactions and the whole ACID concept came
along and so forth, the word transaction morphed and it started to be applied to the collection of
changes that, you know, stuck atomically and didn't see weird things from other things, right? And
that's the whole ACID things that we've come to view as a transaction. And, you know, at the same time, the word database morphed.
It used to be the big database system did your app processing. It ran the code that was the app
code. It talked to the block mode terminals where it would fill a screen out and the human would
fill some fields and go enter and then get the next answer back. And so all of that was integrated
into one system that was kind of thought of as the database, and it became disentangled over time, for better and worse.
And so now when we talk about a database,
we talk about an abstraction of something
that does large changes to data.
And there was this transition too
where stored procedures started running in the database,
right?
And that kind of was, this wasn't the database,
it was the app environment at some level.
And so the boundaries of that have just emerged culturally to gravitate around SQL, which did win and has brought a ton of value.
SQL is one of the rare huge success cases where, you know, creating that standard, you know, 40 something years ago, almost 50 years ago, allowed innovation below and innovation above.
And the set oriented stuff is clearly a win.
And but it also kind of works in a system that's less distributed. And so the tension as it becomes more
distributed is stuff I like to write about. That's why the sequence of columns I write for ACMQ is
called escaping the singularity. It's not your grandmother's database anymore. Because to make
SQL relational stuff work, you have to kind of suspend time with
transactions and suspend space by bringing everything close, which is comfortable for
humans. And so it works until it doesn't, if that makes sense. And so there's that question
of that abstraction. And by the way, when you go back to OLTP, the original benchmarks tried to put
pressure on concurrent updates. TPC-A and TPC-B, the original standardized efforts to do benchmarking to talk about whose database does what, deliberately added a concurrent update to the branch balance in the debit-credit workload. That was to force high-performance concurrent updates to data. As things have gotten bigger
and stretched, that's not quite so common because it doesn't work
so dang well.
Yeah, I guess we're going to touch on that a little bit more in this podcast as well, that whole idea of concurrent updates. So we've got a nice solid base there on what a transaction is and what OLTP is and these OLTP systems. And the thing that I liked when I was reading this paper, and I don't know, this is something I'm personally just guilty of: when I've viewed databases, my journey with them stops at the interface of SQL. I very rarely think about the application on top of it and treating them as one whole system. So reading this paper, I had a few moments of, why have I been thinking like that the whole time? I should be thinking about them as a collective. But that's just an observation I personally had when I was reading the paper.
But anyway, Pat, what's the big deal with scalable OLTP in the cloud?
Well, the big deal was my way of framing a problem.
You know, and so the – let me give you my personal story behind this paper. First of all, I've been working on
scalable transaction processing for decades and decades. It's been the thing I've focused in on.
And I'm still trying to figure out how to push the envelope. And I always felt, always felt,
there would be a limit where the database would say, I ain't going to do anymore.
You know, it's too much concurrency. I can't do that. And so I work on this in my day
job. I work on how to make things scale in databases. And personally, just personally,
in 2022, you know, my back went out, my vertebrae slipped. I was in agony. I saw you, Jack, at the
conference where I took wheelchairs to get there and I was like, not a good time. And so I ended
up having back surgery December 8th and then coming home from the hospital a couple of days
later, I said, these drugs suck.
And I stopped taking them a day or so after that.
And I was going to watch movies in my new adjusto bed.
But then all of a sudden I saw this paper just dropped from CIDR 2023, which was Scalable OLTP in the Cloud: Is It a Solved Problem?
And I read the paper and I just kind of couldn't figure out what was wrong.
For me, it asked a question that I'd never asked. Hat tip to Toby Ziegler, because he said, what's the
asymptotic limit to scale for OLTP systems? And I just was like, I never asked myself, when does it,
when does it hit the wall? Now let's define asymptotic limit. Asymptotic limit means you
can add more resources and it ain't
going to help. You just, it's not a question of getting more resources. If you add more, you're
not going to go farther because you hit that asymptotic limit. And so when is it not about
resources and when is it about something fundamental? Okay. And, you know, and Toby did a good job,
you know, sketching the architectures for database systems and saying, here's the things that cause you to hit the wall. And it made me understand back and
say, well, why is that the architectures? Why are we limited to that? And I'd always, you know,
I've been doing 40 something years in this world of building databases for this stuff. And what is
it that left me unhappy? And I kind of went into a rabbit hole of trying to figure it out, reading
papers and completely annihilated my holiday in 2022, which was wiped out anyway,
because recovery from surgery. I spent like 12 hours a day for three weeks, you know,
reading papers and wondering what was bugging me. And what was bugging me was that I, like everyone,
and Toby, everyone figured data lived somewhere. Every record was somewhere. So if you wanted to
change it, you went and found the latest one, and then you worked on that. Okay. And that was fine. So that was a thing that kind
of was this existential change. Like just as you mentioned, Jack, my paper, which talked about the
scalable app and the scalable database, you're just like, oh, I hadn't thought about it that way, right? Well, this is a thing for me. I hadn't thought about it that way. It kind of rocked me to my core. And then I had to take it apart and put
it back together. And so when I wanted to constructively riff off of that cool paper,
which sent me into such a spin, right? Because it was great. It caused me to do these things.
I was trying to figure out what is OLTP? What's the boundary? What do I want to do? How do I define that? And I knew the space was ginormous. I know when you do CRDTs, you do all of these other
things. I actually love that area. I think it's fascinating, but you don't typically call that a
database, right? What's a database, right? And so I kind of said, well, let's go to the conventional
term for a database. You know, it's a SQL system with an isolation model. What isolation model
should I pick? You could pick serializability, or you could pick the one that is used most often,
which is, you know, snapshot isolation, or not snapshot isolation, but MVCC, right? And you see
variants of that where you have statement level rollback, but you still read the database with
a snapshot. And to me that, you know, the practical side in me said, I'm not going to do the
academic theory thing. I'm going to pick up the one that seems to matter more to people.
And so I had to disentangle in my head, what was it that people were using? Well,
realistically, the vast majority of the applications out there, at least more than 50% for sure, have gravitated towards using multiversion concurrency control.
Because rather than being pure, the way the academics argue one should, they simply say, hey, does that get the job done?
Now, I've known for 40 years there's a phenomenon, which I say sometimes: you date your operating system vendor, you marry your database
vendor. Because if your database has some weird old behavior and your app does weird old things
when you run it on your database, you fix your app. Define fix. It could be, you know, render
immutable or spay or neuter, but you do something to your app so it solves the business problem
in spite of what the database did to you.
And so all the idiosyncrasies of a database end up permeating an app.
As the app gets bigger, they're kind of there.
If they're just there, it's hard to deal with it.
And it's a major challenge to take a significant large app and move it to a new database
because the app has picked up on these idiosyncrasies and modified it.
The app gets modified so it succeeds in that environment.
And so people have gravitated towards using multi-version concurrency control because
it scales more.
And the scaling matters more than the purity.
And they tend to stomp out the anomalies that you see with the impurities of the snapshot.
And that's kind of an iterative process.
And no one's getting religious when you're building your business solution.
They're just kind of doing whatever the heck it takes to get the business solution.
And now you're there.
It's like, let's get this thing working.
I always want to make some money, right?
Yeah.
So I started saying, well, okay, what do I want to focus in on in this paper?
What do I want to say now that I've got this epiphany that maybe there's a new way of building a database which doesn't have a specific
place for a value for the record, because with multi-version concurrency control I'm reading
as of the past. If I'm reading the past, if you look at a multi-version concurrency control database,
and by the way, I mean,
Oracle does this, Postgres does this, SQL Server does this, you know, DB2 doesn't, but most of the big databases do this thing where you say, I want a snapshot I'm going to read and then nothing moves
till you get the next snapshot. And that allows more concurrency. So I stood back and asked a
whole bunch of questions in my head. Like, how did this drift?
Back when I was seeing this in the 70s, 80s, and 90s, you know, it was like there was a purity of the isolation models.
And serializability was viewed to be great, but then it wasn't as easy to make the throughput.
And so people continued to kind of walk their way back on the guarantees of the isolation model in order to get the scale.
And I remember there being this consternation about, well, that's just not right. That's not ethical. It allows these
funky things. And then, you know, the older I get, the less religious I get and the more pragmatic
I get. It's like, well, you know, the business ran. Yeah, we made some money, so it's fine, right?
We did. We made some money and all, kind of, you know. And so you can stand back and ask, how do you feel about that? But at the same time, it seems like it's the way things are going. And so pretty much all of the database vendors have gravitated towards that. I'm not trying to be religious. I'm trying to be kind of pragmatic about it. But that's the reality. So then I said, well, it feels like the relationship between the database and the application is a thing to think about, a thing to focus in on. If I were fresh in the field, I would probably go into things like CRDTs and interactions that are not that way.
But the reality is people map all that stuff on top of SQL because it's so darn successful.
So let's just say that's what we're doing.
Okay.
And so I said, let's talk about that as a big deal.
It's a contract.
It's an interaction between the application and the database, and it is that contract. What can I understand from the contract that allows me to reason about scale? And so
that's why the paper got titled, What's the Big Deal? Plus, I got to have silly titles because
that's who I am. I love the titles of all your work; they always make me smile. So yeah,
keep doing it. I try to, it's more fun to me if I have a good title for it.
And the big deal was this duality between it's an important thing and it's a contract between the two.
And so focusing in on the contract.
Now, one of the sections of the paper talks about how we've eroded or improved, depends, evolved, devolved, that interface for the isolation semantic to allow
a weakening while we allow more scale. And it continues to increment down as we do that.
And, you know, one of the tricks I found fascinating as I was thinking about this,
and I'd seen this in big scalable apps, is the use of LOCK NOWAIT or SKIP LOCKED. They're kind of a duality. SKIP LOCKED is a little bit more
convenient, which basically says, yeah, I want you to go get me things out of the database as
of the snapshot. But if somebody else has got it locked, just leave it out. And that's incredibly
useful for queues, for example. And so that was one of the steps of how the isolation semantic
was relaxed, right?
You go from serializability to read committed, I'm not going to see things.
And what do I see as I do repeatable read?
Well, repeatable read means I'm in a transaction and I read record X and I come back and I read record X and it hasn't changed.
If you don't have repeatable reads, you can read record X and get a committed operation.
You come back later and you read it and somebody else committed something.
I see that.
So all of those relax isolation and increase concurrency. And so
there's a trade-off that makes it a little bit harder to reason about the application,
but it also means that I can get more scale through the system, if that made sense.
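To make the SKIP LOCKED trick concrete, here is a toy in-memory sketch of its semantics, not a real database API (the `ToyQueueTable` name and methods are made up for illustration): a worker scans in order, skips any row another worker currently holds locked instead of blocking on it, and claims the first free one.

```python
import threading

class ToyQueueTable:
    """In-memory sketch of SELECT ... FOR UPDATE SKIP LOCKED semantics."""
    def __init__(self, rows):
        self.rows = list(rows)                            # committed rows, in order
        self.locks = {r: threading.Lock() for r in self.rows}

    def claim_next(self):
        # Scan in order; skip any row another worker holds locked,
        # rather than waiting on it (this is the SKIP LOCKED behavior).
        for row in self.rows:
            if self.locks[row].acquire(blocking=False):
                return row                                # caller now "owns" this row
        return None                                       # nothing available right now

table = ToyQueueTable(["job-1", "job-2", "job-3"])
first = table.claim_next()    # one worker claims job-1
second = table.claim_next()   # the next worker skips locked job-1 and claims job-2
```

This is why the pattern works so well for queues: many consumers can drain work concurrently without ever serializing on the head of the queue.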
Yeah. So I tried to take all of that apart in my head and say, what have people, what have people pragmatically gravitated towards? And then I started thinking about, okay, scalable applications.
And I just, just stood back and thought about it. And I concluded that no one in their right mind
builds an application for scale at the beginning. And I'd written about that a few years ago. I
wrote a paper called The Best Place to Build a Subway, which basically said, look, if you want to build a subway, you better figure out how to route around the cathedral and the, you know, the
ancient, you know, buildings over here, because it's going to be a reality you deal with.
And when you think about a scalable OLTP application where, you know, you see applications
with tens of millions of lines of code, right? Those are huge investments over decades and
decades. And you don't necessarily stand back and think about scale at the time.
You're trying to get some little tiny thing started and get people to give you money for it.
And you want to add functionality and you want to add features.
You don't think about scale.
And so then scale is something you bump your shins on and you whack it into submission over and over.
And that's the pattern I've always seen in scale is,
you know, oh, I'm going to fix it when it's broken.
And so that iterative process is also interesting.
And then I tried to stand back and say,
what is it that impedes scale?
So if you say the big deal is the big deal,
if you say it's that contract,
one of the things when you look at snapshot isolation
or, you know, that they're,
you know, consistent reads and where I can have a statement and within the statement, it's stable.
And then I can do a rollback and, you know, Oracle does this and SQL server does this and
Postgres does this. The behavior is we promise if you update the same record concurrently,
we promise you will have trouble. That's the guarantee. You know, if you try to
change record X and somebody else is changing record X, you're going to get blown up or you'll
get blown up, right? That guarantee allows you to reason about not doing the wrong stuff.
And so that's a promise from the database. So that means if you want your app to scale,
you better not pound on record X too much, which, by the way, is the opposite of how I was raised in database implementations.
I was raised to worship hotspots.
I vividly remember a couple of years after, you know, like 82 or 83, I saw the 1981 paper on optimistic concurrency control.
And I said to myself, wow, natural selection will take care of folks who use that.
Because to me, the database was about hotspot access, rapid access to the same record.
That's what we were trying to accomplish then. At scale, that becomes not what you do, because it hurts. And so now you see lots of interesting uses for optimistic concurrency control.
It works great as long as you don't bump into the next guy very often.
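The "we promise you will have trouble" guarantee is essentially first-committer-wins optimistic concurrency. A minimal sketch of the idea, assuming a hypothetical `VersionedStore` where every update carries the version it read:

```python
class VersionedStore:
    """Sketch of first-committer-wins: an update carries the version it
    read; if the record has moved since, the update is rejected."""
    def __init__(self):
        self.data = {}                         # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def try_update(self, key, read_version, new_value):
        current_version, _ = self.data.get(key, (0, None))
        if current_version != read_version:
            return False                       # someone else got there first: blown up
        self.data[key] = (current_version + 1, new_value)
        return True

store = VersionedStore()
v, _ = store.read("X")
ok_a = store.try_update("X", v, "from-A")      # first committer succeeds
ok_b = store.try_update("X", v, "from-B")      # stale version: rejected
```

As Pat says, this works great as long as collisions are rare; under a hotspot, the second writer is guaranteed to lose and must retry.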
Yeah, on that really quick there, this focus on making write contention and these highly contended situations more performant rather than avoiding the contention altogether.
This is another thing that I took from your paper and sort of like moving that up to being
the responsibility of the app almost and the way you design your app and sort of don't do things.
Don't both try and write to the same record at the same time.
If you can, if you can in your application, just wait a bit or do it slightly different.
And that for me was really nice.
Pushing the coordination away almost up into the business logic.
I don't know if that kind of makes sense.
Like kind of, yeah.
Unless you change the big deal, unless you say, I'm not working against a database with reads and writes against records with that database semantic.
If I'm not doing updates, I can do a CRDT style thing.
I can do a reservation.
I can say, please reserve this hotel room
and somebody else handles that concurrency
against the hotel room.
I actually believe that's a strategic area.
If I wasn't worried about my day job and doing what I'm doing, I'd go do research on that because it's cool too, right?
Linearizability has got that abstraction you could do, right?
When you think about, you know, the linearizability papers is, you know, arbitrary operations against an abstract data type.
You know, gee, reserve hotel room works that way.
And now the concurrency could be behind the curtain.
But that's not what we do when we build it on top of SQL.
You can emulate it, though.
You can update a record in SQL, which is, please submit something to someone else who figures out how to reserve me a hotel room.
And so I'm dropping a record in the database and somebody else notices the record in the database and figures out if I can have a hotel room.
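The reservation pattern Pat describes can be sketched in plain Python (the `ReservationDesk` class is hypothetical): callers never race to update the contended rooms-left record directly; they drop request records, and a single owner applies them serially behind the curtain.

```python
from collections import deque

class ReservationDesk:
    """Sketch: callers append reservation *requests* instead of updating
    the contended rooms-left record; one logical owner drains the queue
    and resolves the concurrency behind the curtain."""
    def __init__(self, rooms_left):
        self.rooms_left = rooms_left
        self.requests = deque()                # the "record dropped in the database"
        self.results = {}

    def submit(self, guest):
        self.requests.append(guest)            # no contention on rooms_left here

    def process_all(self):
        # Single owner of the hotel-room record; requests are applied
        # one at a time, so no two updates ever collide on it.
        while self.requests:
            guest = self.requests.popleft()
            if self.rooms_left > 0:
                self.rooms_left -= 1
                self.results[guest] = "confirmed"
            else:
                self.results[guest] = "sold out"

desk = ReservationDesk(rooms_left=1)
desk.submit("alice")
desk.submit("bob")
desk.process_all()
```

The design choice is exactly the one in the conversation: the contended update moves out of the application's transaction and behind an abstract operation, so the app only ever appends.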
And so it ends up kind of being, you can layer these concepts on top of the database, but the database itself says, you whack on record X and this other transaction
whacks on record X, one of you is going to lose. It's the rules. So that was the big deal. And so
it caused me to think about what is the implications on the app? What are the implications on the database? The next thing I tried to say is, okay, let's assume I got this database and this app
interface. I got this big deal. Why, what stops a database from scaling? This was all thought
experiment, all me staring at the ceiling, annoying my wife. Okay. She's like, why didn't
you answer my question? I don't know. I was staring at the ceiling thinking, well, that's not going to be good for our marriage. And so you
have to like manage that. But that's kind of what was going on in my head for months is how do I,
what is it? Well, to me, it meant, wow. Okay. If I'm going to have a scalable database and the
reads are by snapshot, they're an hour old. They're a millisecond old.
You cannot read a multi-version
concurrency controlled database as of now.
It's not in the interface.
You can read the past.
You can propose a change
that will, to modify a record,
update a record,
that will stick as long as no one else fiddled with it
since the past you saw as of the snapshot you saw. Okay. So that's a tentative, please update it if
I don't bump into anybody. Okay. But I'm reading the past and big chunks of the reads are not
things I'm going to update. They're things I'm just going to read. And so now you're in this
world of how do I make sure the reads scale?
Well, the reads can't scale if they're fighting with writes.
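A minimal sketch of the MVCC behavior being described here, under the assumption of a toy `MVCCStore` (not any real engine's implementation): reads are always against a snapshot of the past and never block on writers, and a proposed update only sticks if nobody touched the record since that snapshot.

```python
class MVCCStore:
    """Minimal multi-version sketch: snapshot reads of the past, plus
    tentative updates that stick only if the record hasn't moved."""
    def __init__(self):
        self.versions = {}                     # key -> list of (commit_ts, value)
        self.clock = 0

    def commit(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot(self):
        return self.clock                      # an "as of" timestamp: always the past

    def read(self, key, snap_ts):
        # Walk versions newest-first and return the last one visible at snap_ts.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snap_ts:
                return value
        return None

    def try_update(self, key, snap_ts, value):
        history = self.versions.get(key, [])
        if history and history[-1][0] > snap_ts:
            return False                       # someone fiddled with it since our snapshot
        self.commit(key, value)
        return True

db = MVCCStore()
db.commit("X", "v1")
snap = db.snapshot()                           # snapshot taken here
db.commit("X", "v2")                           # a concurrent writer moves X forward
old = db.read("X", snap)                       # the snapshot read still sees "v1"
stuck = db.try_update("X", snap, "v3")         # rejected: X changed since the snapshot
```

Note that the snapshot read succeeds without touching the concurrent writer at all, which is exactly the "reads must not fight with writes" property under discussion.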
Okay, and then you start looking at the implementations people do in databases for multiversion concurrency control.
Well, Oracle has these undo blocks.
So if you want to update a record on the block, you go to the block and you get the block and you look in the block.
I want to update it. Oh, no, no, no, it's this. I want to read it as of a snapshot. Oh, no, no, no, I got to go read it over there.
So I have to get the one being updated before I can read the one from the past.
OK, and the similar story in Microsoft SQL Server. You've got to go to the current home and then you say, well, and then I can go to the past to read the past. Same thing in Postgres. I got to go to the current home and if it's being changed, then I can read the past. Well, now that's,
that does add a ton of scale over not being able to read because somebody's got it locked,
but there's still friction in going to the home for the record. Now, in that paper that Toby Ziegler did, that sent me on this mind-bending excursion, right?
You know, he said there's kind of
three archetypes of databases.
There's a single writer,
which has got its own log
and it can only scale till it scales.
There's a partitioned writer,
which works great until the partitions
don't match the write skew patterns, right?
Or there's the multi-writer
where I'm fighting over the data itself,
similar to Oracle Rack, you know, is one example of how that works.
And he did a really nice prototype in his paper and his thesis to use RDMA to implement shared blocks.
OK, but there's still a place you go. There's still fighting to read with writes.
And so if I'm pushing the asymptotic limit, which is I'm going to go forever until I can't, okay, anytime the reads are fighting with the writes, in my brain that impedes the asymptotic limit.
Realize it's just a thought experiment, just a question of where does it top out?
And so if the reads and the writes fight, my argument is that means you're having a problem. And it turns out,
you know, I'm quite a fan of immutable data. I wrote this paper, Immutability Changes Everything.
And in the cloud, you basically can't have changeable data. You can read only data in the
cloud, and you can have a log of changes that you're appending to. And so, you know, the whole cloud depends on append.
That's all it does, right?
And then you read through that crap to get to the other crap.
That's the only thing you get in the cloud because everything's gushing around everywhere.
And compute and storage keep changing.
And so the storage has to be findable no matter what.
So it dang well better have a unique ID and be read-only, right?
And so all that comes around to: changing now is the hard part.
Right?
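The append-only, unique-ID model of cloud storage Pat is describing can be sketched as a tiny content-addressed store. This is an illustrative toy, not any real cloud API; the class name and choice of SHA-256 as the unique ID are assumptions for the example.

```python
# Sketch of the cloud storage model described above: you can only append
# immutable chunks, each with a unique ID, and read them back by ID.
# A chunk never changes, so it can be replicated and moved freely.
import hashlib

class AppendOnlyStore:
    def __init__(self):
        self.chunks = {}  # unique ID -> immutable bytes

    def append(self, payload: bytes) -> str:
        # Content-addressed ID: the same bytes always get the same ID,
        # so a chunk is findable no matter where it physically lands.
        chunk_id = hashlib.sha256(payload).hexdigest()
        self.chunks.setdefault(chunk_id, payload)
        return chunk_id

    def read(self, chunk_id: str) -> bytes:
        return self.chunks[chunk_id]

store = AppendOnlyStore()
cid = store.append(b"committed change #1")
assert store.read(cid) == b"committed change #1"
# Appending the same bytes yields the same ID: immutability makes copies safe.
assert store.append(b"committed change #1") == cid
```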
But you see this in OLTP and OLAP and HTAP and all that kind of stuff.
It's not hard to read the past.
It's hard to fight for the now.
And the tighter the now, the tighter that turbulence of now, the harder it is to reason about the intersection between changes and reading the past.
So if you're just willing to wait a tiny bit, it's OK.
But you see this endemically in scalable systems. I'm pretty sure most of the listeners have had the experience of going to a major e-commerce site like Amazon or something.
And it says usually ships in 24 hours.
That tells you nothing.
And it's incredibly useful.
Okay.
If something isn't going to usually ship in 24 hours and I'm waiting the last second to send a grandkid a birthday present.
Oh, crap.
Let's go find something else.
Right.
Because it's not the one I want to put in the cart and miss the birthday time. So I'll go for the usually ships in
24 hours, even though it's a guess. Right. And so whenever you're dealing with scale, things that are
happening right this moment are hard to get perfect. You can get it approximately. So as I started
trying to explain what that meant to an app, I realized the only choice you have in the app is to not update the actual facts of now.
Because then you're fighting like crazy.
But what I can do is I can record that, oh, gee, I've scheduled a shipment for the inventory.
And so the inventory balance in the warehouse is going to
change. And so now I'm in this thing where I can say, well, do I think it's probably going to be
fine and I can fix it later if it's not, right? And that takes me back to a paper I wrote in 09
called Building on Quicksand, which was a thought exercise of asynchronous checkpointing. And, you
know, what do I know if I don't have the truth? I can get a pretty good guess. I can do probabilities. I can figure it out, which is
not terribly different than if you have a joint checking account back in the day when you wrote
paper checks, right? When you wrote paper checks, it's like, I sure hope the wife didn't write
anything. You know, and you kind of are guessing on what's flying out there because you only know
approximately, right? And then heaven forbid you deposited your
brother-in-law's check and then that's going to bounce, right? You think it's there and it's not,
it's that kind of a thing. And all of this stuff has been an artifact of reality in distributed
systems with businesses. Whenever I think about a distributed system, I pull back and I say,
how did it happen with my grandparents' days in the early 20th century?
Well, you know, these pieces of paper floated around and they got reconciled.
And how did all that work?
Well, there'd better be an overdraft charge, right?
So people don't abuse it.
And it's a guess and it's an apology and it's a pretty good guess.
And you get better at guessing well, because it's drag when you don't.
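The joint-checking-account story is the guess-and-apologize pattern in miniature: each party acts on a stale local view, and reconciliation later discovers the overdraft and charges for it. A minimal sketch, with all amounts and the fee purely illustrative:

```python
# Sketch of "guess, then apologize": two people write paper checks
# against one account based on approximate knowledge; reconciliation
# later detects the overdraft and applies the apology (a fee).

def reconcile(opening_balance, checks, overdraft_fee=35):
    balance = opening_balance
    fees = 0
    for amount in checks:          # checks clear in arrival order
        balance -= amount
        if balance < 0:
            fees += overdraft_fee  # the apology: an overdraft charge
    return balance, fees

# Both spouses guessed the money was there; together they overdrew.
balance, fees = reconcile(100, [60, 60])
assert balance == -20
assert fees == 35
```

The fee is what makes the guessing self-correcting: it's the drag that teaches everyone to guess well.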
And that's the complete antithesis of, you know... One of the craziest things: I stand back and I think about distributed systems and computing, and I worry and stew about what I'd do over.
Well, a mulligan is a do-over.
And mulligan stew is this thing where you keep adding crap to the pot every day and it keeps changing.
Is it rabbit stew or chicken stew?
I don't know.
I've added that all in every day.
And so I'm thinking about that.
Well, there's two I wish I could do over, and one just jumped to the top, which is the word consistency and how it's interpreted by different communities.
It means a different thing here and a different thing there.
And it's inconsistent, and I hate it.
It drives me nuts.
The word is great.
And I wrote a paper called don't get stuck in the con game,
which tries to disentangle that discussion.
That's a thing I wish we could do over and just have different words and it'd
be okay. You know, like lock and latch. Most days I'm saying, which kind of lock do you mean? Right? The word lock means different things, right? And so those are things. And the
other one I wish I could disentangle is the notion that we should distribute a database as opposed to invoke an app on the other side.
I mean, databases don't work across trust boundaries. Databases don't work great across
a distance unless you're using them in a stylized fashion, right? I mean, it's one thing to get a
snapshot of the product catalog before I do an order for an online system. And I wrote about
this in my autonomous computing paper, right?
You know, I'm going to read last month's catalog to figure out what to order.
You know, when I was a kid, the department store catalog,
the Sears catalog would land on the kitchen table.
We'd all go around it.
Mom would tear the little piece of paper form out,
and we'd figure out how to order toys and dresses for her and all that kind of stuff.
And you'd put it in a piece of envelope, you send it off and the stuff would come after a while
with a piece of paper check in it. And that was stale data that you could use to order. Very useful.
In that case, the department store catalog was updated twice a year. When you're shopping online
in an e-commerce site, you're looking at stale data. That's great. It's fine. Okay. So you can
share stale data. You can't share
current data. Furthermore, you don't share data across trust boundaries. I don't know what it's
like in the UK, Jack, but I know here in the United States, when I walk up to an automatic
teller machine on the side of a bank, no matter what buttons I push, they will not give me a
JDBC connection to their backend database. Damn it. They just don't do it. I don't
understand. How am I going to work with them if I can't whack on their database? Okay. So I get to
do what? Like five things: deposit, withdraw, transfer, ask for the balance. And I only can
do a bounded set of things that they've decided to do for trust boundaries. Similarly, when you
try to stretch data at a distance, it gets wonky.
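The ATM point is really about exposing a bounded set of operations instead of the raw database. A minimal sketch of such a trust-boundary facade, with the class name, accounts, and balances all invented for illustration:

```python
# Sketch of a trust boundary: the bank exposes a bounded set of
# operations (deposit, withdraw, transfer, balance) rather than a raw
# connection to its backend database.

class BankATMFacade:
    def __init__(self):
        self._ledger = {"alice": 500, "bob": 200}  # private: no JDBC for you

    def deposit(self, acct, amount):
        self._ledger[acct] += amount

    def withdraw(self, acct, amount):
        if amount > self._ledger[acct]:
            raise ValueError("insufficient funds")
        self._ledger[acct] -= amount

    def transfer(self, src, dst, amount):
        self.withdraw(src, amount)
        self.deposit(dst, amount)

    def balance(self, acct):
        return self._ledger[acct]

atm = BankATMFacade()
atm.transfer("alice", "bob", 100)
assert atm.balance("alice") == 400
assert atm.balance("bob") == 300
```

The underscore on `_ledger` is the whole argument in one character: callers across the trust boundary get the bounded verbs, never the table.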
So now I'm in this paper
on scalable OLTP, what's the big deal?
And I'm kind of saying,
well, you know, let's stretch it anyway.
Let's push the envelope
and things that are not the most practical,
but what works and doesn't work.
Is that making sense when I say that?
Yeah, yeah, yeah.
There's still time to go
and do this again, Pat.
Do you think there's time left
to kind of redo the full thing?
No, I don't know.
I mean, you know, we still, in the U.S., we still use English measures.
And so, you know, I would do that over, too.
And I was raised here, so I think in Fahrenheit and have to translate to Celsius.
And it's like, I still think Celsius is smarter, but that's where I am as an American.
I'm, you know, hobbled by that.
And so you kind of get into this world
of which things are worth fighting over.
That's true, yeah, which you want to die on, right?
Right, and, you know,
and the SQL abstraction of set-oriented
has gone so far
because it allowed you to innovate
on top of the interface
and allowed you to innovate below the interface.
And I was part and parcel of inverting the way we manage data to get these set-oriented
operations.
And so it gave us a ton.
Does it have its weaknesses and limitations?
Hell yeah.
But I mean, you know, computing is fascinating.
Humans think with Newtonian universe.
I don't know about you, but to me, time moves forward.
Everything's got its trajectory.
Newton had it all, he had it buttoned up. He just did. And until you get to a certain scale,
it all works, right? And then, you know, Einstein kind of said, let's push it past that scale
and had this crazy stuff, which is, oh, no, no, no, no, no. Speed of light's constant,
it's time that isn't. It's like, huh? What? Really? And so distributed computing is similar to that: grasping that I'm here and you're there, right? And I don't know anything about what's going on for you right now. You might, you know, you might be gone. But I know what I heard from you, so I'm trying to compose decisions based upon limited knowledge of participants' pasts. And that's kind of like,
you know, special relativity, which is not Newtonian. And it's kind of a mind mess.
And I've spent my career trying to do cooler things for folks who do not want to think weird,
like some of my colleagues and I do. Like you do, Jack, you think weird. Okay, you were trying to
build systems for folks who don't want to think weird. They want to build an insurance application.
They don't want to think about that crap. They just want to build an insurance application,
right? And so we're trying to model that, you know, and then early in my career, it was like,
how do I hide failures? How do I hide the fact that the disk got corrupted? How do I hide that
the server crashed? How do I keep it going when all these things go awry? Okay, so you're
trying to hide crap from somebody who just wants to get the job done. Now we're in this world, which
is lovely, fascinating, and weird. Okay, let's go to the cloud. Well, what's that mean? We're separating
compute from storage. Huh? So who is the endpoint? Who am I talking to?
Am I really talking to the same computer?
Nope.
I'm talking to the computer that got the memories from the other computer.
I can take this VM and I can squirt it over to that physical machine.
Well, is it the same one?
I don't know.
What's its identity?
And is it answering like it knew what it knew before or not?
You know, you watch Star Trek and Captain Kirk goes to the transporter and all of a sudden he's not in the ship.
He's on the planet. Is it the same Captain Kirk? Well, I don't know. He remembered what he
remembered. He does what he does and it seems like it. And they don't seem, they never really
made, that I remember, made two Captain Kirks. Could do. I don't know.
Right? Maybe they did. I don't remember. But that's an open question
here. So now the cloud is really weird.
The cloud is just really weird because you can't put anything
in one place. You got to put it in lots of places. And so you can't change it without going through a
window of time where it might be changed. Yeah. Might be changed based upon what? Based upon,
well, if it failed, somebody looks, which do they decide it based upon how they looked,
if it changed or not.
So when you're changing something, it goes through a superposition of states where it's simultaneously changed and not changed. Huh, that's kind of funky. And you end up in these
things where you, you know, oh, you saw the change, but the change didn't stick. So that's okay.
You didn't stick. You can look at Schrodinger's cat inside the box. And if you see the cat's
alive and it turns out the cat isn't alive, that's okay. We'll just remove you. And so all of this compounds as you start
building your solution. And fundamentally though, you still want the person writing the insurance
app to not give a rip. They just, it just works. Yeah. Yeah. And I'm totally in favor of the cloud.
It's impractical to have dedicated resources rather than shared resources. Love that.
30 years ago, I used to tell my late wife, I'd say, yeah, I could be home for dinner.
And I promised what time I'll be home for dinner if I only had a dedicated freeway lane.
No one thought it was practical for me to have a dedicated freeway lane. So I knew I had to accept some uncertainty in when I would get home.
And all this weaves in and out of what is the scalable OLTP system. So I had to pick a thing.
I picked a thing. It's like, oh, database is an application. Let's just go down that path.
Cool. So I guess kind of with that then, what's the rethink then? So how would we do this? How
would we build this then from if we had a clean slate and we didn't have any sort of restrictions,
what would it look like? How can we do this better?
Yeah, well, to me it's pretty obvious that you're going away from update-in-place of database blocks. Okay? I mean, you see some really innovative stuff, by the way. Hat tip to Aurora, and, you know, AlloyDB does this, and you see it in Socrates in Azure, where you're emulating the block update behavior, and all of the storage underneath just magically can take a licking and keep on ticking in the cloud.
That is taking the database software, decapitating it,
and making the bottom part different so that it runs better in the cloud.
The server that is doing that update to the block,
the actual database server itself, isn't protected.
But that's okay.
You made progress.
It was a very clever thing.
But it preserves the model that a record lives in a block and the block is going to be handled and that's what you do.
So that preserves the model, and the abstraction between the database and the app says, no, I have isolation on a record key. Each key is the thing that the app must not concurrently update, because those were the rules in the big deal. Those are the rules in the isolation semantics. So we started with that and said, okay, what can I do? So now
when you conflate the keys to the block, inevitably you will have false conflicts.
Inevitably, there will be problems that were not in the actual contract you had to the app
that you've introduced in your implementation of the database because you conflated the record to the
block. When I was a kid and I had hair and I started working on databases, you know, wow,
it seemed obvious to do page locking. It seemed obvious to do shadow blocks, you know, because
we hadn't figured out, I hadn't figured out write-ahead logging. Some folks at IBM had, but I hadn't learned about it yet. I was young. I was, like, 22, trying to figure this crap out, trying to feed the family, hold down a job. Okay. And so page locking seemed like a good idea, but it ended up causing false conflicts
across concurrent transactions. So now all these other databases only do page locking for a tiny window of time: get in, do the record update against the page, and get out.
Now you use the write ahead log and the isolation
at a record semantic to protect the transaction. So you've up-leveled that abstraction.
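That up-leveling, a page latch held only for an instant while record-level isolation protects the whole transaction, can be sketched as follows. This is a deliberately simplified illustration, not a real engine: the lock table, page dict, and function are all invented for the example.

```python
# Sketch of the up-leveled abstraction: a page latch is held only for
# the instant it takes to poke the record into the block, while
# isolation is enforced per record key for the whole transaction.
import threading

page_latch = threading.Lock()   # protects the physical block layout
record_locks = {}               # key -> owning transaction id
page = {}                       # the "block": key -> value

def update_record(txn_id, key, value):
    # Record-level isolation: only one transaction may own a key.
    owner = record_locks.setdefault(key, txn_id)
    if owner != txn_id:
        return False            # conflicts are now per key, not per page
    with page_latch:            # latch for a tiny window: get in, get out
        page[key] = value
    return True

assert update_record("t1", "k1", "v1") is True
assert update_record("t2", "k1", "v2") is False   # record conflict
assert update_record("t2", "k2", "v2") is True    # same page, no false conflict
```

The last assertion is the payoff: two transactions touching different keys on the same page no longer collide, which is exactly the false conflict page locking used to create.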
And all of this is, I wrote this paper called Side Effects Front and Center, which said a side
effect is what the outsiders didn't kind of expect to happen. And so if you, you know, when you update
a B-tree in most modern databases and
you update a record and you abort the transaction, it undoes the update of the record,
but it doesn't undo a block split of the B-tree. From the outside, you don't care. Okay. If I'm
talking about a microservice and it causes, you know, its memory to fragment,
or it causes a monitor logging record to spew out, I don't care. I didn't see that because my level of abstraction didn't see it.
It's still a real artifact.
And so all of this weaves in and out of these systems.
What are the side effects that you have and what are the consequences on scale?
So if you're going to push the asymptotic limit, you assume I can mask those consequences.
Now, I personally believe that you want to do things like log structured merge systems where I'm logging the change.
I'm organizing for whatever read patterns I have.
And I'm writing immutable data into the cloud because the cloud knows how to deal with immutable data.
And if you want to think of an HDFS file, you got it.
That's an immutable thing.
Those things are copied around and squirted around to preserve the same byte stream.
Right.
And that's how the cloud works.
I'm constantly copying forward and reorganizing,
but I'm giving you a byte stream image.
So at some point, you're providing something that doesn't change in a way where the underpinnings can gush around and die and stuff like that.
I mean, most folks don't think about the fact
that it's just pretty darn common in the cloud
to just walk through and reboot every machine every week or so.
You do this for security patching. And, you know, whether you give the thing a warning or not is
another question. But fundamentally, the abstraction is you're just going to take something, you're
just going to kill it all the dang time. So how do you do that? How do you make things keep going?
Well, if I have multiple replicas of the read-only data, I can make sure that I don't kill all the replicas at the same time. So now I can find it, right?
I learned that when doing the big data style stuff,
working at Microsoft to do a project called Cosmos under the plumbing for Bing,
you know, and we would keep triple replicas.
And if you got down to two, it was like, ooh, make a third.
If you got down to one, it was like, oh my God, oh my God, the house is on fire.
Make a second one.
And you tried to make sure.
And the model there was the data needed to live indefinitely,
even when things are failing with particular statistics.
And of course, you're scrubbing the things.
You're reading everything off of every disk because bit rot happens.
If you can't read it, you better go get one of the other replicas and make it.
So that's not terribly difficult to reason about
when the data you're preserving is read-only and immutable.
And that's kind of the thing we build upon.
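The scrub-and-repair loop Pat describes for Cosmos-style triple replication is simple enough to sketch directly. The function, target count, and disk names below are illustrative stand-ins, not anything from that system:

```python
# Sketch of keeping immutable chunks alive: scrub every replica, drop
# the unreadable ones (bit rot happens), and re-replicate from a
# surviving copy until the target count is restored.

TARGET_REPLICAS = 3

def scrub_and_repair(replicas, is_readable):
    # Keep only replicas that still pass a read check.
    healthy = [r for r in replicas if is_readable(r)]
    # Re-replicate until we're back to target; appending healthy[0] is a
    # stand-in for copying the bytes onto a fresh disk.
    while healthy and len(healthy) < TARGET_REPLICAS:
        healthy.append(healthy[0])
    return healthy

replicas = ["disk-a", "disk-b", "disk-c"]
repaired = scrub_and_repair(replicas, is_readable=lambda r: r != "disk-b")
assert len(repaired) == 3           # down to two: ooh, make a third
assert "disk-b" not in repaired
```

This is only tractable to reason about because, as the passage says, the data being preserved is read-only: any surviving replica is as good as any other.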
And so now this is all weaving in and out of what is OLTP.
How do I build a database, right?
And I want to make the past be rock solid.
And I want to make sure that that future up to the point, if I commit a change to a
transaction, then I want to make sure I never lose it. That's the rules for enterprise grade OLTP.
Once it's committed, it's committed. I'm not going to lose that dang thing.
Right. But if I have two transactions fighting over a record, typically you want one to win, but if zero win, that's okay too, because I didn't commit to it. Those are the rules. Transactions give you a latitude. They give you the grace of being able to do things. And so that was part of what I wrote about in the paper too: I can take advantage of those, if you will, weaknesses in the guarantee. I didn't guarantee it's going to commit. I guaranteed that if it does, I keep it.
Yeah. I think it'd be nice if we ran through the strawman architecture of this system as well, to give the listener maybe a flavor of how this would actually look, to paint a clearer picture. So, yeah, let's talk about the strawman architecture.
Well, again, this is an imaginary, hypothetical architecture. Okay, when you make a change to a record, you're going to want to keep track
of it, you know, pretty quickly. And I realize there's a window of time there that's a little
nebulous here. Pretty quickly, you want to push it out into a log structured merge system.
A log structured merge system takes recent changes and puts them out organized by time first and then
key second. So let's say every minute I'm going to grab stuff and grab
everything that's changed and put it out in a file sorted by key. So if I want to read something
that includes that, then I go to that first. Now, if I'm dumping these things out every minute or so,
it gets pretty intractable to go back years. So what you're constantly doing is reading the
recent changes and combining them with somewhat older changes and consolidating them so that you're covering a bigger time window.
And you're taking the keys and you're reorganizing it by key.
So I can have a logarithmic set of places to look to find out the value for a key. And I need to make sure that if something got changed,
the new value is the one that survives if it's below my snapshot,
or if something was deleted, the key's got a tombstone,
and I accurately read through that and see the things missing.
And so I'm reading by key, and I'm climbing down this thing.
But I may have, you know, three, four, ten levels of this log structured merge thing, which are increasing
in exponential size as you go down, and you have to go through a logarithmic set in order to read
it. So the reads can be painful. But the fascinating thing about the cloud is I've got just a ton of
resources to process. And I could choose, if I wanted to, and I said I'm reading it a lot,
to go smash it down as of a point in time.
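The log-structured merge behavior just described, time-ordered runs sorted by key, newest-first reads, tombstones for deletes, and compaction that smashes the levels down, can be captured in a toy. This is a minimal sketch for illustration only; the class and its two-level simplification are invented, not the paper's design:

```python
# Minimal sketch of the LSM idea described above: recent changes land in
# runs (newest first), reads walk the runs until a key is found,
# tombstones hide deletes, and compaction consolidates the runs.
TOMBSTONE = object()

class TinyLSM:
    def __init__(self):
        self.runs = []  # newest first; each run is a key-sorted dict

    def flush(self, changes):
        # "Every minute, grab everything that's changed" into a new run.
        self.runs.insert(0, dict(sorted(changes.items())))

    def read(self, key):
        # Climb down the logarithmic set of places; newest value wins.
        for run in self.runs:
            if key in run:
                value = run[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Consolidate all runs into one, dropping dead tombstones.
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newest overwrites
            merged.update(run)
        self.runs = [{k: v for k, v in sorted(merged.items())
                      if v is not TOMBSTONE}]

db = TinyLSM()
db.flush({"a": 1, "b": 2})
db.flush({"b": TOMBSTONE, "c": 3})   # b was deleted in a later run
assert db.read("a") == 1
assert db.read("b") is None          # the tombstone is accurately seen
db.compact()
assert db.read("c") == 3 and len(db.runs) == 1
```

Real systems keep the exponentially-sized levels rather than one merged run, but the read-through-the-levels and newest-wins logic is the same shape.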
That question is more and more tractable, the older my reads are.
But we see this in real life.
If you want to know the financial status of a big corporation and you want to see its quarterly results, well, they took three weeks to wrap up the quarterly results.
And then they published them. And every quarter they say, yeah, the one I gave you three months back for the previous quarter, I lied a tiny bit. You know, here's the corrections to that lie that I gave you, because I still couldn't figure it out in three weeks. But, you know, if it's within bounds, everybody does it. That's kind of how it rolls. So you can never really know now at their scale.
And so this is not any different than that. It's like, how do I give you the correct
database answer? But we're spending cycles to put the read together to get that correct database
answer in that window. That's the first piece of the puzzle is the past is read only and organizing
it for scale. I can make an LSM that just scales and scales and scales. If you want to have a thousand times the readers, I just make a thousand copies of this read-only stuff, right? I can scale that like crazy if I've got resources. Remember, the goal was to think about the asymptotic limit, which is defined as: if you can have more resources, can you do more scale? So I can get more read-only things till the cows come home. I just get more if I want them.
Okay.
But that recent change, that recent time becomes more and more complex.
How do I make that narrower and narrower?
If you want your snapshot to be a millisecond old, wow, it's not quite in that LSM.
It's not quite off in the read-only things.
How do I reason about that working okay? And that is the pain point is that turbulence of now, which is what you see in every scalable
system, no matter what. And so when you want to do scale, you got to take time to roll it up.
So one of the examples in the paper is a presidential election in the United States.
Okay. So I've got hundreds of millions of voters
spread around the United States,
and they go one day to their polling booth,
and they send it in the mail,
and they say whatever they want to vote for.
Okay, and it's a presidential election.
I'm not getting into politics, but this is the machinery.
It's a good example.
Yeah, a really good example.
Keep going, sorry.
Yes, so the way it breaks down is you've got states,
and the states roll up.
There's some subtlety there, and I'm not talking about the electoral college.
Let's assume you're just adding up the total count and things.
Okay.
And so the states are going to roll it up and the states have got counties and the counties
are going to roll it up.
The counties have got these precincts, which are, you go down to the, you know, the school
next to you and you vote there and then they roll it up.
And when the polls close, what happens is everybody's getting a pretty good,
they're getting a partial count.
And they're constantly sending the partial count off to the next level of the hierarchy.
So the precincts are telling the county, and they're continually updating their partial count.
And these inaccurate results are rolling up the hierarchy.
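The roll-up of partial counts can be sketched directly: each level just sums whatever its children have reported so far, and a fuller report simply replaces the partial one. Precinct names and vote numbers below are made up for the example:

```python
# Sketch of the roll-up hierarchy: precincts report partial counts to
# the county, which sums whatever it has heard so far. Later, fuller
# reports replace the partial ones and the sum is recomputed.
from collections import Counter

def roll_up(reports):
    # reports: dict mapping child name -> partial Counter of votes.
    total = Counter()
    for partial in reports.values():
        total += partial
    return total

precinct_reports = {
    "precinct-1": Counter({"A": 70, "B": 30}),   # partial, still counting
    "precinct-2": Counter({"A": 40, "B": 55}),
}
county_total = roll_up(precinct_reports)
assert county_total == Counter({"A": 110, "B": 85})

# A later, fuller report replaces the partial one; just roll up again.
precinct_reports["precinct-1"] = Counter({"A": 80, "B": 45})
assert roll_up(precinct_reports)["A"] == 120
```

Counties roll up to states and states to the national total with exactly the same operation, which is why the whole hierarchy can keep publishing continuously updated, admittedly inaccurate, results.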
You know, and if the vote is 70% this way and 30% that way on television, they just pretty quickly, they say it's going this way.
Yeah.
Right.
And sometimes it takes weeks to kind of count the last few.
And you're counting and you're counting and you're counting and you're counting.
And the closer it is to a, you know, a really close election, the longer it takes to gain the truth.
This is the way it is inside of a scalable OLTP application on top
of a scalable database. You can say that's close enough, right? And we do this all the time is,
yeah, I'll ship you the book. I got lots of them. I'll promise to ship it to you. It's going to be
there. And you got lots of them and it's not, I mean, if a few of them get run over by the
forklift, whatever, we got lots of them. If you're down to the last Gutenberg Bible, it gets special.
The Gutenberg Bible is not handled that way because there's not that many of them.
Right. And so you got to like take special care of each copy of it.
And it's a special thing.
Question on that case, where some things need to be tracked slightly differently. Do you think it's possible for the same scalable OLTP system, the same application, to cater for both types, both types of books in that example, or do you think they need a completely different way of being managed?
It's an application decision. When you go into the bank and you want to deposit your brother-in-law's check for 100 bucks, and you have a decent bank account balance and you've been with the bank a lot, they put it in there.
If it bounces, they'll talk to you about it later.
Yeah.
If you've been floating $500 to $300 for a long time and you walk in with a $50,000 check and want to put it in your bank account, they're going to call the bank at the other side.
They'll talk to a human being that will say, yeah, I've got that 50,000 bucks, and yeah, you're the only one who's going to get that check to clear. And then they'll
put it in. They'll probably still put a hold on it if that's a big discrepancy between your normal
thing and the deposit. And so there are business rules based upon the probability of loss. And
that's not something you could just capture in the database, which understands reads and writes to record X. It's just not what you do because it depends upon
the app. I was fascinated when the first cash dispensers, these ATM machines showed up on the
side of the bank. I was like, wow, that's interesting. Because then, you know, you lost
the connectivity between that cash dispenser and the mothership a lot. So now you have this problem,
a customer comes up and says, give me some money. And the ATM can't go ask the mainframe across town
or across the country how much money there is. And so do they piss off the customer or do they
risk the money? And so kind of they quickly settled into an upper limit. I'll give you 200
bucks without being able to check with a mothership because they knew from the card you were probably you unless it was a thief.
But that's another discussion.
And so as long as they didn't think you were able to fiddle with the connectivity to the mothership and then hit every ATM to get 200 bucks, they would just do it and then occasionally deal with an overdraft.
This is not a serializable database transaction.
It's a business logic and a business risk choice.
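That disconnected-ATM decision, risk a bounded amount rather than refuse the customer, is a one-function business rule. A hedged sketch, where the $200 limit and function shape are illustrative rather than any real bank's policy:

```python
# Sketch of the ATM's business-risk rule: when the mothership is
# unreachable, dispense up to a fixed limit and reconcile later,
# rather than refusing the customer.

OFFLINE_LIMIT = 200

def approve_withdrawal(amount, mainframe_reachable, known_balance=None):
    if mainframe_reachable:
        # Normal path: check the real balance across town.
        return amount <= known_balance
    # Disconnected path: risk the money instead of angering the
    # customer, and occasionally deal with an overdraft afterwards.
    return amount <= OFFLINE_LIMIT

assert approve_withdrawal(150, mainframe_reachable=False) is True
assert approve_withdrawal(500, mainframe_reachable=False) is False
assert approve_withdrawal(500, mainframe_reachable=True, known_balance=600) is True
```

Note what's absent: no lock, no transaction, no coordination. The rule encodes a probability-of-loss judgment, which is exactly why it lives in the app and not in the database.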
And so the problem I have is that when you're raised as a database person
and you don't stand back and look at the world,
you actually think this shit the database professor taught you is useful.
It's not.
It's an abstraction upon which you build, that people then tolerate and manage, which is why distributed databases are wonky: because they're not always going to be available.
Yeah. I was looking for a quote in the paper. I don't know that this is the right one, but it was about the ambiguity of the computer and the computer's ambiguous understanding of the real world, and I just loved that line. And I was thinking along the lines of "all models are wrong, but some are useful," which felt like the wonderful one to go with it. That was what I was thinking about as well: the real world is messy, deal with it, work around it.
Right.
And we make this imaginary pristine thing and people use it in a practical way.
And part of the problem is we think we can push that limit and rather than expose our limitations.
If you have a distributed system, they can be disconnected.
Now, what do you show?
Right.
And so, you know, and people are not used to kind of involving the application in that. But then again, we set out as our mission
to pretend it's a single server with the single behavior and say, write your insurance app and
don't worry your pretty little head about it. It's like, that's not the right thing to do at some
level too, but it's kind of this lowering of the friction to people who want to solve business
problems. And the reality is depending upon the complexity of the business problem, they kind of
have to know. And I wish systems developers spent more time
with application developers.
I wish to see how they cope with our crazy systems.
This was sort of what I was alluding to earlier on
when I said that kind of in my,
I wish I'd had some industry experience
before doing my PhD,
because that sort of awareness of the applications, what people actually do with them and what people want to do with them, kind of wasn't there. That's why I got stuck at that boundary of just thinking about the database, thinking in terms of reads and writes, and making serializability as fast as possible, because surely that's the best thing in the world. But eventually there's a limit to that, right? That's not what people want to use it for, or how they use it.
But indeed, I will point out,
the mission I set out on with this paper
was indeed going down that path.
Right, okay.
Oh, what am I going to pick?
Well, let's pick the app and the database
because the space is so complex and nuanced.
I had to pick an invariant and say,
now what can I do?
Put some stakes in the ground to then actually build off, right? Otherwise it's just...
Correct. Scalable OLTP is in fact ill-defined. It's kind of how do we make this stuff work,
right? And I tried to put a stake in the ground and then iterate off of it. And it actually
opened up a whole bunch of understanding from my side too. So I believe it was a well-spent mental exercise. And I'm looking forward to seeing who comes out and reads the paper and gives me the what-for because I didn't think about this properly, and challenges me to the next level. Because that's the glory of our world: when people tell you and help you realize you didn't understand this new thing, you learn and you grow, and then you help others learn and grow. That's the fun.
Yeah. One of my questions, Pat, was going to be kind of where do you go next with this sort of research agenda. Is it a case of now put it out in the world and wait and see what comes back, or is it something you're still actively working on?
Well, I have business problems at work and this has some relevance, but it takes time for some of the ideas to come in. You know, two years earlier I'd done a paper called Decoupled Transactions, which was a different mental exercise. It was not about scale; it was about low latency and jitter, tolerating jitter. How can I make a database that gives you an answer when a bounded number of the computers that build it are, you know, not even answering me? I don't know if they're dead or not. That's the fun thing in the cloud. You can't tell something's dead. You can tell you haven't heard from it. What does that mean? I don't know, right?
And so it becomes, we didn't used to worry about that stuff because we had our dedicated networks
and you figured, you knew the answer was coming in a window of time with high probability, excruciatingly high probability.
So if you missed three of them, well, that guy must be dead.
And off you went. And that's not the case in our gushy cloud.
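Pat's point that in the cloud you can never tell something is dead, only that you haven't heard from it, is the idea behind timeout-based failure detection. A minimal sketch of the "missed three heartbeats, that guy must be dead" heuristic (the class and parameter names here are hypothetical, purely for illustration) might look like:

```python
import time

class HeartbeatMonitor:
    """Suspect a peer after N missed heartbeats. We can never know a peer
    is dead, only that we haven't heard from it recently (toy sketch)."""

    def __init__(self, interval_s=1.0, missed_limit=3):
        self.interval_s = interval_s      # expected heartbeat period
        self.missed_limit = missed_limit  # "if you missed three of them..."
        self.last_heard = {}              # peer -> timestamp of last heartbeat

    def heartbeat(self, peer, now=None):
        """Record that we just heard from a peer."""
        self.last_heard[peer] = time.monotonic() if now is None else now

    def suspected(self, peer, now=None):
        """True if we'd suspect this peer. In a dedicated network this was
        near-certain death; in the cloud it is only ever a suspicion."""
        now = time.monotonic() if now is None else now
        last = self.last_heard.get(peer)
        if last is None:
            return True  # never heard from it at all
        return (now - last) > self.missed_limit * self.interval_s
```

Note the method is called `suspected`, not `dead`: the monitor can only ever report silence, which is exactly the ambiguity Pat describes.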
Heck, you got these funky things like differential observability.
You got three availability zones.
Well, AZ1 can see AZ2 and AZ2 can see AZ3, but AZ1 can't directly see AZ3.
And you say, hey, what computers are up in this world?
Well, you have different answers.
Now what do I do?
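The differential observability scenario Pat describes, where each zone answers "who's up?" from its own partial view of the network, can be sketched in a few lines (the reachability table below is a hypothetical example matching his AZ1/AZ2/AZ3 story):

```python
# Hypothetical link state: the AZ1<->AZ2 and AZ2<->AZ3 links work,
# but the AZ1<->AZ3 link is down.
LINKS = {
    ("AZ1", "AZ2"), ("AZ2", "AZ1"),
    ("AZ2", "AZ3"), ("AZ3", "AZ2"),
}
ZONES = ["AZ1", "AZ2", "AZ3"]

def visible_from(zone):
    """The set of zones this zone believes are up: itself plus
    everything it can reach over a working link."""
    return {zone} | {dst for (src, dst) in LINKS if src == zone}

# Each zone's answer to "what computers are up in this world?"
views = {z: visible_from(z) for z in ZONES}
# AZ2 sees all three zones; AZ1 and AZ3 each see only two.
# Three observers, three different answers: differential observability.
```

The point of the sketch is that no single view is wrong; they are all locally consistent, which is why "now what do I do?" is the hard question.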
And it's an increasing challenge that's fun to deal with.
Yeah.
And so right now, I'm going to probably kind of work on other fun problems I have in the meantime. But, you know, and one of the things I like to do is just think of these things
and then throw them out in the community
and see how it causes, hopefully, good impact.
Yeah, I'm sure.
That is a very good answer.
I had a question as well about the impact you hope this kind of work can have.
I guess we kind of covered it off there with that.
But I guess this next set of questions are mostly general, high-level questions. This first one's my favorite question; I love how people answer this one. It's about people's creative process: how you approach idea generation, and then selecting ideas to dedicate 12 hours a day for three weeks on, or the next six months of your life. What is your creative process?
You want to imagine something might be different.
And it's pretty common to say, you know, hey, I imagine this might be different,
but then there's five, ten things that have to be different.
And then each of those can cause you to say, hmm, that's too hard.
I'm going to go home and watch TV, right?
You know, rather than you kind of give up
and you kind of have to do this,
you know, it's an optimization thing
of imagining a big change
and then imagining that the pieces
that are required to change
could be worked through.
And then you try to do an optimization
for the impact of the big change
and the unsolved pieces for it.
And then typically my brain's kind of taking it apart
and putting it back together over and over and over
to try to see what might be different.
Yeah.
And so you want to imagine an arch. "It ain't going to work; the first stone's going to fall over."
Right.
Well, no, you want to give yourself some slack and figure out how you can say maybe it won't fall over, and how do I make all of that work?
Right.
And you're going to follow it through, and how can you then do it?
And in the real world that I tend to be living in, it's this whole question of how can I make incremental evolution? So now you have this other stuff: you've got to keep the system going, you've got to make sure it works. In that paper I mentioned earlier, The Best Place to Build a Subway, I point out that it's fascinating to me when I'm flying in an airplane and I happen to have a window seat and look down: I love seeing when an overpass over a freeway has a bend in it.
I know they widened the freeway.
It used to be a straight overpass because nobody makes a bend just for no reason.
And then they needed to make the four-lane freeway become a 12-lane freeway. So they had to take the overpass
and build a new one. Well, you don't want to tear the old one down while you're building the new
one. So you build the new one next to the old one. And so you can just look down and see the
artifacts of evolution. And so not only do you want to visualize where you can be, you've got to ask: how can I get from hither to yon? How do I get from here to there? How does it happen? And I've had the fortune in various commutes. A commute's not always
to there? How does it happen? And I've had the fortune in various commutes. Commute's not always
a fortunate thing, but I've had the fortune of commuting down freeways, which are being widened
and watching the stages and saying, ah, this is going to do that. And that's going to do that
is just very evocative to me about how software will evolve because they're just very strong parallels. So not only do you
want to imagine a new world, you have to imagine the pieces of that new world. Then you have to
imagine the stages it takes to get from this world to that world. And each of those is strongly
dependent on the details of the world you're in.
Yeah, it's funny, we had a very similar conversation at work today about one of our components. It needs a bit of loving; we basically need to change it massively towards some new architecture. And it was that same sort of thing: the end goal seems so far away, and getting there, laying out those steps, like, okay, we want to get from A to B, but it's not obvious how to do that. We need to take it step by step and do this sort of rolling process and incrementally move away from the old state to this new state, because you just can't go straight from A to B. You've got to go through these steps. You can't make that one-hop jump, but yeah.
You know, I tend to be irreverent.
I'm fond of saying,
if it's not incremental, it's excremental.
Just me.
I'm just crazy. But that's one of the pieces of how I think about it.
Cool.
Yeah, closely related to this, I guess, in some way, not necessarily directly, is inspiration and motivation. What inspires you to keep working on this sort of stuff, and what has inspired you in the past? We touched on it maybe at the top of the show a little bit, but what work, or which papers, have had the most impact on your career? The one by Toby, for example, has obviously had an impact, but which ones stand out in your mind when you look back over your career?
I kind of feel like there's just a parade of them that constantly happen. It was like three or four years ago that I decided I never really understood the linearizability paper, never really understood, you know, the distributed systems thing. You've got to remember, I'm literally uneducated. I didn't go to enough college. I did two and a half years at Irvine, which was great, and I got graduate studies and stuff, but there were things I just didn't turn over. Right? I didn't dig in and look. Hell, you know, this weekend I was bored and I was watching YouTube videos on, you know, how can you do sketches, which include Bloom filters, and how do you carry some of this stuff? Because I just didn't have that education.
I bumbled into it.
I was like, oh, this is interesting.
And I'm watching these things and watching a class and being presented and lectured.
And it's like, boom, boom, boom.
And I'm fast forwarding through it and stopping and looking and just inhaling knowledge because
I'm curious.
And so these things, you never quite know what you're going to bump into.
But you asked two questions; the paper thing got thrown in.
Sure.
The first question is what keeps me going.
And there's the high order bit is watching people grow.
I mean, I've been running this conference for, you know,
helping to run it since 1985,
the High Performance Transaction Systems Conference.
I met you there a couple of years ago.
And, you know, you see people and you meet them when they're like, you know, a terrified, you know, grad student
coming with a professor, right? And then you turn around and they're like some of the most
significant people in the industry 20 years later. And you've known them all along. You've watched
them grow. You've watched them progress and move up to become so accomplished, right? And to me, I see that at my work,
I see that in academia,
I see these, you know, varying groups of folks
that I spend varying amounts of time with.
And to see them grow is the thing that keeps me cooking.
Now, I mentioned I married into a family really young. Okay, well, my son-in-law is 48 now, okay? And I met him when my daughter started dating him at age 14. I vividly remember thinking, when he was like 16 and I saw his shoulders: those were not a kid's shoulders, those were a man's shoulders, right? And so now I'm always giving him crap about being 48 and having a pot belly. He's middle-aged. And he's just wonderful; I love my son-in-law. And that progression as a human being echoes to me when I see progressions of colleagues at work, or progressions of folks I meet in and out of academia. And I may not see them for a couple of years, and then I see them again, and I can kind of watch that growth. And then what can you do to poke them to be able to maybe move a little farther, if you're lucky to have that?
If you're not, that's okay.
You still care about them as people.
And so that's a piece of what keeps me going.
The other piece of what keeps me going is I think the innovation and the discovery in the industry is just accelerating.
It's fascinating the computer science algorithms are changing.
And the hardware and the data centers are changing.
And there's an explosion of change.
And then now you get to sit and play, you know, either through work or on the weekends when you're bored with the combination of the ideas and the technology and the ideas in the academia and the algorithms and what might make a difference.
And it ends up being fun.
And how can you deploy that in a way which helps the people grow and the industry grow?
And then if you have time, can you like throw some little cute,
fun summary of it that might provoke thought and move the industry forward?
And so you're trying to help people in the microcosm with individual mentoring.
You're trying to help the industry by writing papers.
You're trying to think through the abstractions and make products that make a difference.
And so, you know, at some point, I'm never going to be as good at anything as I am at this after almost, you know, 45, 46 years.
You know, it's the thing that I've grown and I love.
And, you know, I'm not going to go take up some, to me, some other thing is not going to be as much fun.
Now, maybe I'll get to a world where I want to go to fewer staff meetings.
Okay, those are mind-numbing sometimes.
But, you know, it's just a question of playing to the stuff that you love and brings you joy.
And so I don't know if that answered your question.
Yeah, that answers it perfectly.
I wanted to ask, when you said that about this explosion at the moment: how does this compare to previous explosions in technology, advancements in the past? Obviously I didn't live through them, but when you read about them, do you think this is unprecedented, along with the possible transformations because of it?
I wouldn't say unprecedented, but there's an acceleration.
Okay, okay.
So, like, a couple of months ago I was just gobsmacked by the prediction that in three or four years we're going to be investing ballpark $400 billion, with a B, a year in large language model training. So that is, you know, you look at the trends; even then, that's like 1% of the United States gross domestic product going to training large language models.
So set aside AI and all that.
It's just, oh, we're finding the value to spend the money there.
Okay, well, what is that?
Well, that's all this GPU processing vectors.
Okay, cool.
Now, you know, and how utilized are the GPUs, which cost a metric ton of money?
They're expensive and they're kind of like starving for data. So one of my observations is, yeah, the networking people are going to figure out how to shovel the data faster.
And they are, right? Bandwidth in data centers has increased by a factor of eight in the last
three years. My prediction is it will accelerate. Okay. So, you know, now you can have 50,000 or 100,000 computers in a room.
And how quickly can you squirt data from one to the other?
Hmm.
Hundreds of nanoseconds, under a microsecond, for some decent amount of data to squirt across.
And if you don't believe that, go look at the leaf spine interconnects that are being deployed in every modern data center.
And the fall-through to first byte from the network interface card on one server to another is, you know, typically sub a couple hundred nanoseconds to get the first byte from here to there. And then the trailer of the payload of the data goes down as the bandwidth goes up. And
so, wow, it's pretty interesting. The data is intimate. But I still don't believe in coherency, like cache coherence, coherency at a distance. A paper I've not yet written is titled I Never Wanted to Be Coherent, because I kind of feel like that ends up causing friction if you have to make sure that your updates and my updates are there, which is not any different than that turbulence in the OLTP database we discussed a bit earlier.
Did I answer your question?
You did indeed, Pat.
Yeah, that's awesome.
Sorry, go ahead.
I was going to say,
the problem I have
is that my attention deficit,
you know, I keep running around
and looking at cool things
and figuring out
which ones to dig deep into
is not always an easy thing
when I'm a kid in a candy store
with all the fun stuff changing.
Yeah, that ties back to the creative process question from a second ago: how do you stop bouncing around off everything?
You assume I do stop bouncing around? You know, finishing something is a challenge, and I work on it.
Yeah, cool. I've just got a few more questions while I've got you.
And I kind of,
I always like to
ask people this one as well.
And it's how they deal
with setbacks
in their work,
in their life,
I guess, as well in general.
Because obviously,
academia is rife with rejection,
right?
It's part of the game.
It's a common part
of proceedings.
So how do you deal with that personally?
It's hard. You take a breath, you pout sometimes, you know, and you try to stand back. And then you realize that there are so many more fun things to do. You dust yourself off and keep trying, right? And, you know, like everybody, I've had ups and downs in life, and loss and joys and successes. And, you know, I have a track record sometimes of snatching defeat from the jaws of victory when I mess stuff up. And I don't know, you just keep cranking and you remember all the fun, good things. You know, I tell everybody I'm hobbled by optimism, and so it stops me from stopping, because there's going to be something else that's fun. The next wave is always coming. There are just so many fun things to do. And if I can't do some of the fun things I want to do here, then I'll find other fun things to do there.
Does that make sense?
That does make sense, yeah. I like that approach; that's a nice way to approach it and a nice outlook on life.
And I'm not perfect, but I try.
Nobody is. I mean, that's it, I wouldn't want to be perfect.
No, that would be imperfect.
Exactly, yeah. Cool. So, I guess, two more questions and then I'll let you go, Pat. This last one I've got written down as "bridging the gap". Obviously the goal of the podcast is to help bridge the gap between research and industry, between industry and academia. And I want to get your opinion on how we do that as a community, what you think it's like at the moment, and how it could be made better, or what's bad about it. What's your take on how we do at the minute on the interactions between academia and industry?
You know, both environments have their challenges. Academia is, you know, how many papers can I publish, right? And so that tends to dissolve the quality on the whole, but it's not a hundred percent. And so there's this churn of how are
you recognized? Industry tends to do this, you know, what are your quarterly results?
And so investing for the longterm is hard when you're being measured on immediate responses in
three months. And even as an employee going to work, it's like, what'd you do for me this week?
Right. And these things are hard. Okay. And so I don't think either system's perfect, right? They,
they tend to have their trade-offs, you know, in academia, you're both trying to teach and you're
trying to innovate, but you innovate without following through on all of the
requirements to make something rock solid and that's good because now you can do something
more speculative and it doesn't need to have the friction of doing everything buttoned up
and perfectly. And in industry, you can't mess it up, so you've got to do everything. You've got to have the backups and the archives and the recoveries and the 48 protection layers, which is actually wonderfully challenging and good.
And so that is a huge thing that I've grabbed.
I mean, I got when I dropped out of college, I was in industry and I didn't have the chance to go back to academia.
But there's a huge value to having those high requirements on what you're doing.
Right. That ends up making things. You can't just be flighty with stuff.
And so there's a wonderful interaction in between the two.
And you've come to HPTS, and I try to throw that mixture in the mix.
And I try to, you know, put everybody together and stir and cause trouble so that we can cross-pollinate with each other.
And I think that's incredibly valuable.
It's a thing that I love.
It's my, the highlight for me is this going away for three days
with a bunch of crazy folks thinking about scalable systems
and letting them try to find out who they are as people, right?
Because fundamentally, I think the connections with people
are the backbone for how technology and innovation
and academia and ideas foster. You know, when you know someone, the first thing you say is, how's the family, how are the kids, what's going on? And you figure out how they are as a human being. And then you can kind of work
in and out of what are the ideas and technology. And I think that's the right way to make stuff
work. I mean, and you came to HPTS a couple of years ago and you heard me stand up and say,
look, the purpose of this is the breaks, right? Getting to know each other, meeting everybody, you know, the talks are to punctuate the breaks. And that's not the common conference, but I still think it's a strong foundation for making change happen. And it's deliberate to combine industry and academia. I don't know if that was the relevant answer or not.
No, that's a great answer.
Yeah, I'd just like to echo that, what you said there.
And that was the best conference I've ever been to for that exact reason.
You should get out more.
But other than that.
Cool.
But no, that was a great answer to that question.
And I guess, yeah, last one now, Pat.
What's the one takeaway you want the listener
to get from this podcast today?
Be passionate.
Get an idea.
Fight hard.
When you get a fire in your belly, don't let go.
Keep working it.
Keep asking yourself, driving it forward.
And, of course, you have to timeshare between that and your day job and your family and everything.
But you got something, do it.
Grab hold. Push it forward. That's what makes the difference in life. Personally, I kind of made it hard on myself when I got a family at age 20, four kids at age 20, and I got a job, and, you know, I didn't get the help that academia could have given me. But at the same time, I never lost that fire. I never lost that desire to push and learn and grow, and then to help others do that. So that's the thing I would give you as a takeaway: whatever the heck it is. And if you have kids and they're interested in something you're not interested in, as long as they're excited about it, that's great. Let them do that on their own. But same to you. The listeners should find that thing they are passionate about and drive it. Don't let go, and just keep fighting to do the
best they can with it.
That's a lovely message to end on. Thank you very much, Pat. It's been an excellent chat; I've absolutely loved it, and I'm sure the listener will as well. If the listener does want to find any of the papers we spoke about today, we'll put links to everything in the show notes. And I just want to mention as well, we mentioned him a few times in the podcast, that Toby was also on the podcast, on episode 23, so you should go and check that out. Wonderful guy.
Yeah, he is. By the way, I want to really thank you for taking the time to talk to me. It's been a joy to talk to you, Jack. You've been great to get to know.
Thank you very much, Pat. It's been awesome. And I guess we'll see you all next time for some more awesome computer science research.