The Data Stack Show - 122: Why Accounting Needs Its Own Database with Joran Greef of Tiger Beetle

Episode Date: January 18, 2023

Highlights from this week’s conversation include:

- Joran’s background leading him from accounting to coding (3:10)
- What is Tiger Beetle? (5:53)
- Double-entry accounting and why it is important for a database (12:28)
- The need for low latency and high throughput (26:27)
- Why financial database software needs a laser focus (29:01)
- What are people using to implement a double-entry system? (36:09)
- Safety in financial software and addressing storage faults (40:26)
- Final thoughts and takeaways (55:52)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Welcome to the Data Stack Show. Today, we are talking with Joran from TigerBeetle, which is an absolutely fascinating technology, Costas. I have so many questions. It's a database that is built for extremely fast double entry accounting, and we've been talking about databases a lot recently.
Starting point is 00:00:49 And so to have such a specific use case is super exciting. And this sounds so simple, but I want a refresher on double entry accounting because it's been a long time since I've taken accounting in school. And I think revisiting the fundamental principles will help me and hopefully our listeners understand why Tiger Beetle needed to be built to solve specific problems around that, as opposed to using any number of existing database technologies. So that's what I want to ask you. Yeah. Again, first of all, I'm very excited that we have this
Starting point is 00:01:33 conversation with Joran today. Tiger Beetle is a database system that has managed to create a lot of interesting noise lately. Not necessarily about the problem and the use case that they are going after, which we are going to talk about, but also because of the very unique approach that they have with technology, how obsessed they are with things like safety and performance. And us, like people will hear from Jørgen, like they will see like a very unique perspective on approaching problems from a very engineering perspective. Jørgen L perspective. Yeah.
Starting point is 00:02:26 Talking about type systems when describing, for example, what like a double entry system is, right? So there are going to be, it's going to be a very interesting conversation. And we're going to talk about a lot of things. So people should definitely check on this one. Henry Suryawirawan, Yeah. Are you going to have enough time to ask all your questions
Starting point is 00:02:50 about distributed systems? No. Well, definitely. Hopefully we will have him back. Henry Suryawirawan, Okay. Well, let's stop wasting time and dig in. You're on welcome to the Data Sack Show. We're so excited to chat.
Starting point is 00:03:04 Well, thanks, Eric. It cost us just such a huge privilege to be with you. So excited. Well, give us your background. How did you actually didn't start in data, which is really interesting. So take us way back to the beginning. Well, I guess data in a form, but yeah. So I started with double entry data on paper, general ledgers, the T accounts.
Starting point is 00:03:27 You weren't allowed to use pencil. You had to use pen, had to be blue ink. And I remember writing my university exams with my major in accounting, financial accounting. And yeah, I always wanted to get into startups, get into business. And I understood that from people that, you know, accounting was a great way to see the world, to travel the world of business and see, you know, all kinds of industries and sectors. And just, it's the way that any kind of business can be represented. It's the schema for business. So I got excited, excited about accounting. Um, yeah, so I, I'm kind of old school data compared to yourself.
Starting point is 00:04:12 Yeah, I love it. The schema for business. What a wonderful concept, um, you know, for the financial component. Okay. So from accounting, did you get, you know, what was your entry into startups and especially software, right? Because you, I mean, obviously you're building some and kind of then shelved that for a little while and around university days got back into coding as well. And what I loved about coding was just that you could, you could, it's like you're doing engineering, but you could build things and you're, it's like the movie
Starting point is 00:05:02 inception, you can build these incredible creations and you don't have to pay for raw materials. So if you don't have a lot of money, you can build something, incredible creations. There's no limit to what you can build. My father's an architect in the real world and I kind of saw that with software, you could be an architect in the invisible world.
Starting point is 00:05:26 And I always loved, you know, I love music. I love things that are invisible. Music, stories, coding. Yeah. So I kind of, I was doing accounting, always wanting to do a startup and the full of coding was pulling me back. And yeah, so I guess my final year of university, I had two majors.
Starting point is 00:05:50 It was accounting and coding on the side. Very cool. And let's start talking about Tiger Beetle a little bit. So where did Tiger Beetle, well, first of all, tell us what Tiger Beetle is and then rewind to the beginning and tell us where it came from because it's a, you know, sort of a very specific tool in many ways. Yes. So I never would have expected that I would be working on a financial accounting database.
Starting point is 00:06:19 I didn't, you know, because you must understand it was some, I mean, it was more than a decade ago that I was in university studying, you know, accounting as my major, loving coding. And yeah, the Target Beedle is a financial accounting database. It's a purpose-built database. It just solves the problem of double entry accounting, but it does it at scale. We have a particular focus on high performance, mission critical performance, mission critical safety. And someone said to me when I was in my twenties, you know, trying to do lots of different startups, they said, you know, well, maybe one day the thing that
Starting point is 00:06:59 you really do will be in your thirties. So at the time it wasn't what I wanted to hear, but I'm really glad that I get to work on TigerBeal now because it brings everything together, you know, brings the accounting together, brings the love of coding and I think, yeah, just to tell the story in between to connect the dots. So I've always been interested in this, you know this software trifecta of speed, storage, and security. So speed was kind of the first one. I used to read Martin Thompson's blog for many years on mechanical sympathy.
Starting point is 00:07:41 How do you write fast code? And my perspective into speed was top down. So, you know, some coders start with systems and they go bottoms up. You know, they work from the bare metal and they do systems coding and then they get into higher level languages. I started, you know, with basic and then was PHP and then Ruby and then JavaScript, running JavaScript on the server and then was PHP and then Ruby and then JavaScript, running JavaScript on the server and then Node came along.
Starting point is 00:08:11 And then always trying to go faster. How do you do this at a high level? And then I was eventually doing stuff like, you know, optimizing JavaScript for branch mispredicts or cache misses. And then eventually you're drilling down now, you know, you're working down to systems. But so many of the ideas that you could write high-performance JavaScript, which actually you can, it's quite a fast scripting language, but they translate well into systems languages.
Starting point is 00:08:37 So I won a competition for fastest search algorithm in Africa, on the speed side. And I just learned, that was a great learning experience because often, you know, the textbook computer science algorithm that's supposed to have the best performance, again, you know, with modern quantum and mechanical sympathy, when you look at the hardware, there's a bit of a disconnect. So the way to really go fast is to work with the grain of the machine. You know, so on the one hand, you can build these incredible, invisible creations with software.
Starting point is 00:09:16 But if you want to make them really efficient and fast, you've got to think of the physical world and the physical machine, and so I kind of learned that from that competition that you can come up with simpler algorithms. They're not textbook optimal, but they're real world optimal. Yeah. And then I got into storage systems as well, also about seven, eight years ago and getting into storage flock research and looking at what Dropbox were doing as they moved
Starting point is 00:09:45 off of the cloud, they moved out of S3 into your own storage systems. Mm-hmm. Work that James Carling was doing, and he's quite a hero of mine just from storage, and I guess he'll reappear later in our conversation, and then, yeah, I was getting into security work as well, doing static analysis tools to detect zero days and learning a lot about what is really secure security, you know, it's a bigger concept only than, you know, than memory safety. There's a lot more to the field.
Starting point is 00:10:20 You know, how does a hacker think? And it's kind of different to how we think as coders. Those were my three interests. And they kind of came together into Tiger Beetle, I think. Long story short. Yeah, absolutely. One question on the speed side of things. You know, I'm just interested to know in terms of your motivation, how much of your motivation in trying to, you know, build
Starting point is 00:10:47 things that ran really fast, right. Or write code that runs as fast as possible. How much of that was just your own personal desire to see how fast you could make it versus, you know, tackling problems that required speed? I think it was both. So because you, I started to realize that if, you know, when I was coding in Ruby, if you had something that took a second, it really impacted people. You know, they had to wait. I also saw at the time, because I was in Cape Town, I had an interesting learning experience because maybe 10 years ago, latencies here were not great.
Starting point is 00:11:27 So if you could really think about speed, you could have a much better user experience. You know, if you were writing a single page web app that had to run in the wine land somewhere, you know, where they had, they were practically still on dial-up. So you had that first-time experience of nobody wants software to go slower. Like they want it faster and faster. But you can actually do things that make a real difference to people. But I guess I also just like to go fast. Yeah, yeah, I love it. I mean, it's always fascinating to me, you know, sort of where discoveries are
Starting point is 00:12:05 made, because many times, you know, discoveries are made in response to a problem. And it's a novel, you know, response to a problem, but also achieving something for the sake of achieving it, because you enjoy it. I think it's really interesting when discoveries are made that way as well. Well, I would love to dig into the technical details of Tiger Beetle. I'm going to let Kostas do that. But I think for our users, one thing that would be really helpful is, you know, maybe just a quick summary of, you know, what is double entry accounting? And why is that important? You know, know for a database just for our listeners who like me who you know maybe they took an accounting class a really long time ago but
Starting point is 00:12:50 i think understanding you know just the basic you know nature of double entry accounting will will give us all a really good foundation for understanding some of the technical details and decisions okay sure so i'll do my best if If I remember my lessons correctly. I know what I don't, what I don't know. You know? Um, but so, so double entry accounting, I think maybe this would be. The developer's guide to double entry accounting. Perfect. That's exactly what we want.
Starting point is 00:13:19 And I would say maybe the way to think of it is like Newton's third law. So for every, third law so for every you know for every action there's an equal and opposite reaction or laws thermodynamics energy cannot be created nor destroyed but merely changes from one form of energy to another i hope I got those laws right. But basically, that's what double entry is. So money cannot be created nor destroyed, but it mainly moves around from one party to another.
Starting point is 00:14:01 That's very important because if money can be created or destroyed, well, there's a problem because, you know, greenback is being lost and that's illegal. So, so the, the, if you look at the system, money is moving between entities. It shouldn't just disappear or fall through, you know, fall through the gutter from awareness. And then it's, yeah, for every action is equal, but opposite reaction. So if I give you a hundred dollars, you know, I should lose that
Starting point is 00:14:34 and you should receive it. It needs to go somewhere and there should be a paper trail or, you know, log that it did or didn't. So basically double entry is a way of not only counting things, because you can use it to count anything. It doesn't have to be money. But what's interesting is, you know, we might take like a 64-bit integer counter and just count, you know. You could have a counter and I've got a counter.
Starting point is 00:15:06 This isn't double entry by the way, but this is maybe how we might do this development. So, so if I send you money, your counter goes up and my counter goes down. That, that isn't double entry because you don't really have a log of why your counter went up and why mine went down. And if you make mistakes, it's possible that someone might make your counter go up twice, but mine only goes down once or three times, and then there's an error in the system and how do you detect that, like money has been counted
Starting point is 00:15:39 twice, you know, it, you've lost this principle of an equal but opposite action. So really what I think double entry is kind of like a vector. It's not just a count, not just speed. It's a vector where money is moving, but it's also moving from somewhere to somewhere. So it's that relation of this is an entity that has received money, which in accounting terms, you would say it's been debited or credited. And this is an entity that has transferred it away, which could be a debit or credit. And I think this is also something, while we're at it, developers often want to simplify accounting. Say, well, let's just use a negative balance or a positive balance. Why do we need debits and credits?
Starting point is 00:16:30 And the answer is that double entry is almost like a type system. So you have different types of accounts. An account would be a person or entity. Yourself, Eric, and I would have an account, so I can debit my account or X units and credit yours. Now you've got equal and opposite. You've got double entry. Entry in one account, entry in another.
Starting point is 00:16:57 You've got that vector. But I think the other thing that we need to understand with double entry is that it's a type system. So you get different types of accounts because in the real world, you've got different types of transactions, different types of money. So, for example, you can have a loan or a debt, and that is called like a liability in accounting. It's a liability account. It's a type of account. You can also have an asset, which is like cash or bank account, or someone owes you money. That's an asset. Or it
Starting point is 00:17:32 could be a Tesla that you have on your books. It could be something intangible, like a brand. Those are all different kinds of assets, which is another type of account. If we see accounting as a type system, so you've got a liability, you've got an asset. Those are sort of mirror images of each other. So if the bank owes me, right, then I have an asset and the bank has a liability. And then another type of account is equity, which is, you know, if you own a piece of a business and that equity balances those two out. So you've got assets minus liabilities and that's equity. That's now what the business is. But then you have to ask, well, how do assets increase?
Starting point is 00:18:30 How do liabilities increase? And the answer is, well, there's income and there's expense. Incomes and expenses, those are also two different types of accounts. So now we've got five, income, expense, assets, liabilitiesabilities and equity and if you increase the way it works with this type system i'm coming into land but if you if you want a positive balance it's just the way it is right if you have an asset account a positive balance will be a debit balance. So think of it as a T-account. An account has two counters that increase. It's basically each side of a T-account is a pinned-only log. It's immutable.
Starting point is 00:19:21 You never change anything because that would violate the law of money being conserved. But you can always add things. So you add transactions to either side of the debit or credit of an account. And if you sum up the two, the debit column and you sum up the credits and subtract them, you're either going to end up with a net debit balance or net credit. And an asset account as part of the type system is always increasing with a debit balance
Starting point is 00:19:48 liabilities increase with the credit balance you can then work out what equity accounts do because
Starting point is 00:19:56 it's the mirror of those and then income accounts and expense accounts is kind of similar so
Starting point is 00:20:03 yeah so expense accounts is kind of similar. So, yeah. So expense accounts increase with a debit and income accounts with a credit. I mean, that's the type system. So that's why it's kind of bad practice to not just use negative numbers because accounting, you never subtract. You always add. And that way you don't lose information. So I don't know if that helps anybody out there. That was an incredible double entry accounting 101 for developers. I have to say here on that.
Starting point is 00:20:40 That was amazing. I actually wish I could go back and take my accounting test at university. Okay. I lied about how many more questions. So one more question. This will be a great lead into the technical stuff. So very helpful, I think, building a foundation for why we would need specific functionality in a database for a system that has those types of requirements and, you know, additional related requirements. Why does it need to be fast? Yeah. So that's a great question. Like it's not just Top Gun Maverick, you know, I feel the need for speed on the
Starting point is 00:21:17 t-shirt, although it's fun, you know, but I think it needs to be fast because we kind of want to get back to this world where I think what's happened is hardware got so fast and software thought, well, we'll be okay. Hardware's getting fast. Now we're in the world where software is really, really slow. Like hardware's so fast and software's so slow that we've gone backwards. I don't know, the moon landing, you know, and we think, well, we used to be able to do this stuff, why can't we anymore, like, why is everything so slow? You know? And I guess I just feel like, well, you know, if this is one corner of the world
Starting point is 00:22:02 where we work, which is a database to track financial transactions, well, at least it should be fast because we don't want people to wait. But more concretely, the reason why it should be fast is because there's a big problem out there is that the world is becoming more transactional, more financial transactions or business events. One business event leads to many double entry journal entries. So you can have a business that only does 10 ride-sharing business events a second. That can translate into maybe a hundred or maybe a thousand double entry journal entries a second, that can translate into maybe a hundred or maybe a thousand double entry journal entries a second. And, you know, so the Visa network will do a few thousand business events, but if you work up the journal entries for that and all the fees and all the
Starting point is 00:22:58 partners out of this, like for everybody else, you know, it adds up and, but that's the old world of shintech. And the new world, like on the internet, the scale of things is just increasing. The world's becoming more transactional. So we saw it in the cloud. You used to pay for a server once a month, or EFS used to buy them.
Starting point is 00:23:19 Then you pay for them monthly, then you pay for them hourly, then you pay per minute, now you pay per second, now you pay per function millisecond. It's becoming more and more transactional. So there's a real problem in that the existing database that people use to track this, they can't keep up. We're reaching a point where there's too much RoLock contention. Every time you have to debit an account or credit, and often it's a small
Starting point is 00:23:49 number of accounts, you know, like your Fiat account, that becomes a hot account and then RoLoc serialize everything. And it's a real problem. Everybody we chat to in FinTech has this problem. So you, if you can go fast, you can actually just solve that problem for people. You know, the Black Friday issue. And also it means that, you know, they might be paying a few thousands of dollars a month. Cloud hardware, if you go fast, you can change that, you know, cut that.
Starting point is 00:24:20 Obviously, that's not always important. But for some people, it's very important because that translates into cheaper processing fees for payments. So there's work going on in some parts of the world to limit billions of people above the critical poverty line. And one way to do that is just give them access to fintech that has cheaper payment processing fees, which, you know, a database like Tiger Beetle. So that was actually where Tiger Beetle came out. We were analyzing an open source payment switch and it had these problems that they needed more performance, more cost efficiency, but I guess the final reason
Starting point is 00:25:00 is that, you know, for example, in India, if I understand correctly, there's the famous IPMS switch there does, I think, on the order of 10,000 transactions a second. It runs so hot. The only way to keep that system running is, I think, Redis in memory, you know, so account balances are all volatile memory. And that's fine because they've designed a system that if they lose the Redis mode, they can, you know, restore the data from banking partners. But that's where you go and say, well, what if we could do a hundred times
Starting point is 00:25:40 more performance? Because if you can do that, well, maybe we can start to use stable storage and this, and then they don't have that problem anymore. So you actually performance buys you like a better operator experience. It's just nice. You know, you don't have to worry, you know, Black Friday, you can trade performance for safety, you know, better, that all great experience. Alexi Vandenbroeker Yaron, I have a question about performance. And it's like a question that I get many times from engineers, actually, when I have a conversation with them about performance and I'm saying, oh, I tried
Starting point is 00:26:19 like this database system and it was really fast, right? And then they're like, oh, are you talking about throughput or latency? So my question is, I see like the numbers that Tiger people can achieve and they are like amazing, right? Like they are huge numbers there. But what is important, like more important in these systems, like these financial transaction systems, right? Is it latency that matters the most or is it throughput that matters the most?
Starting point is 00:26:50 And is there a trade-off between or you can have at the end, like both? So I think that's what makes a double entry or a ledger database. That's what makes it so hard because both are important. You need low latency and you need high throughput. You need high throughput because the world is becoming more transactional. You know, there's just more and more volume and that unlocks the use cases. But you also need low latency because often these databases are tracking business-critical events.
Starting point is 00:27:27 So there, and remember again, you know, for one business event, there might be 10 to 20 general entries. So if one general entry takes 100 milliseconds because of contention and row locks, now the business event is taking, I don't know, one second or two seconds. And then that's a problem because now, you know, ordering a cab is going to take me one second. It shouldn't be, you know, we need to get that. It needs to be fast. So there's real business pressure on latency as well.
Starting point is 00:28:02 There's business pressure on latency as well. There's business pressure on throughput and latency. Also, because again, often, you know, for example, in FinTech, they deal with these nightly batches that arise. If you don't have enough ingest throughput, sometimes these systems, they don't, the data gets delivered at night and the morning when they're supposed to be open for trading again, they still haven't finished the nightly import, you know, and that's where we really want fast ingest because it can save you like that. But you've kind of got nowhere to hide. You need low latency, average throughput.
Starting point is 00:28:37 Okay. We'll get more into this later on how it can be achieved because I think there are probably like many different things that need to happen in order to like to improve both of them. But before we go there, let's talk about Tiger Beetle as a database system. It's so, Tiger Beetle is very, it's almost like laser focused in one use case, right? It's almost taking, let's say, something that someone would build with a schema over a relational database and putting like all the logic around like this schema
Starting point is 00:29:16 inside the database in turn on its own, right? So we have, let's say, a purpose-built database. Why is this needed? Why do we need something so laser-focused on the use case itself? And we cannot, let's say, just keep scaling Postgres or, I don't know, some other database system, ClickHouse, to do the work that we do with Tiger Middle. Yeah, thanks, Kostas. Great question.
Starting point is 00:29:50 So I think firstly, the domain is so valuable, so a really valuable form of data. So it's the kind of domain where you don't want to use Postgres. You don't want to use Redis because you can't afford a single node system. So you need durability, you need high availability, you need replication, but the replication really needs to be part of the database. It can't be an add-on, you know? And the other thing is, I mean, you want open source. So obviously you can get higher reliability in the cloud,
Starting point is 00:30:25 but why can't we have this as open source for our databases? Why can't we have strict serializability? Why can't we have automated leader election? These kinds of things should be in an open source database, I think. And kind of for the domain, you need that. Listening to Martin Thompson just convinced me that it's no, you know, you, these days you, you, for a ledger, you need to have just always on mission critical.
Starting point is 00:30:56 And that's kind of part of providing a great developer experience that you can give people a single binary. They can spin up a cluster and the database just runs. And that was what we wanted to do. So that was why we didn't do a Postgres extension. We didn't do Redis. I mean, those would have been options, you know, but we wanted to, I guess what
Starting point is 00:31:23 we realized is that here's an interesting domain, so we had a real problem looking at a real payment switch. I was working for Coil. Coil are a startup in San Francisco and a lot of payments experience. I've seen these systems in, you know, again and again, and we kind of just saw, well, everybody is reinventing, they're reinventing a ledger database. And eventually it just got to the point
Starting point is 00:31:53 where we figured, well, let's go do something about it. Let's give people a ledger database because everybody is taking SQL and then 10,000 lines of code. And eventually they've got a ledger database, but they don't know it. And we thought, well, let's go do it properly.
Starting point is 00:32:07 And because that's not a thing, you know, you can take another database that isn't managed for the domain and you can make it into double entry. The problem is you start with the raw material of throughput and latency. But when you look at what you've actually ended up with, your finished product, you get much less. You get about a thousand transactions a second, and your latency is maybe lumpy. You've got data risk. So we thought, well, because we've actually got such a simple, tightly focused domain
Starting point is 00:32:41 of double entry, let's go then deep on the technology and deliver a great experience in a single binary. And why do you think that's because, okay, like double entry is not like the new concept, obviously, right? As you said, when you were at college, you had to use the Wink, everything. Well, not enough, right? Yeah. It's been around for a while. And okay, the truth is that in tech, like finance and accounting, it's not like a driving force for innovation, right?
Starting point is 00:33:15 So why 2022 is the time that we can afford such a specialized piece of software that is not just the database, but is a database for a very specific data model, right? And why wouldn't it do that like before? Yeah, so I think I, I love that Henry Ford quote, you know, you can have any car as long as it's painted black. And I think for a long time we've had, maybe, you know, we've been in the situation where you can use any database and there can be as many databases in the world as you like. So long as they are Postgres or MySQL.
Starting point is 00:33:56 And I think it's because the world could only afford like that many databases. They're so hard to build or they used to be, you know, they take years and years of, I don't know how many people, you know, have put into those systems to get them to where they are today because they're incredible systems, they took 30 years, but I kind of believe that we're at the point like since 2018, where there've been like five big things that have changed and that means that it's not just tiger beetle i think we're going to see an explosion of new purpose-built databases because databases are like car engines if you have a great
Starting point is 00:34:39 engine you just go really fast you know if you want a great solution, if you want to simplify the application layer, just have a great database that is good for the domain. So for example, if you want to store lots of user photos, have a great database like S3, that's the right tool for the job, you know, don't use Postgres for your blobs, you know. If you want a greater queuing system, use Kafka or Redcanda even better, you know. Cause that's the right tooth for the job. And then when we looked at double entry, well, we were still in the old world where we were asking, well, where's the double entry database and could, you know, I mean, there's lots of ledgers, but there's nothing that is like a database that is really high performance, because
Starting point is 00:35:27 that's what databases are, you know. They give you all the invariance of the domain are protected by the database. You get the invariance enforced, you know, like data consistency, isolation, transactions, all these great things. The database has solved the whole problems for you and they give you performance. Though we looked at ledgers and saw it didn't seem to be there yet. But I do think we're kind of, you know, we can go into it. These five reasons why we're going to see that, you know, the world is ready for more
Starting point is 00:36:04 kinds of databases because they're just a great way to solve lots of problems. And what people are using for implementing like a double entry system or a ledger. So from what we saw and we looked around a lot, it's typically a SQL database. Then they wrap 10,000 lines of ledger code to create double entry. The reason, it's a very deceptive problem. And what we hear from people is you don't get it right the first time. So it sounds simple, but to do it well is really hard because you're solving
Starting point is 00:36:42 latency, throughput, you're solving strict serviceability, you're solving strict serializability, you're solving consensuals. There's so many hard problems. And on top of that, you kind of end up with a low, again, low throughput. But typically, it's SQL, or people will reach for Redis, and then you're taking shortcuts for safety.
Starting point is 00:37:01 Or they're using cloud databases, and there you're then paying a fortune. Yeah. And, and what it really means is that new like digital fintech, you know, use cases, you get these interesting, this, for example, at Quail, we were focused on the open interledger protocol which is a way it's like internet but for
Starting point is 00:37:27 the payments world connect all the payments networks the old networks, the new networks banks mobile networks let everybody send money like you can send an email or send a data packet, those kinds of
Starting point is 00:37:43 applications need a high performance database. You can't build these future, you know, future applications on the old, you know, old system of SQL plus 10,000 minded code because you hit that, again, you just hit that problem of RoLox and contention. You need a different design to unlock your use cases. Stas Miliuszakowskyi, Yeah, makes a lot of sense. And okay.
Starting point is 00:38:10 I think everyone's like pretty much aware of what are the primitives for interacting like with a SQL database. You have the real model there and the algebra, like tables, the types, blah, blah, blah, like all that stuff. So in a system like Tiger Beetle, what are the primitives that the user is interacting with? Yeah. So I think like Andy Bavlo or Justin Jaffray would say, well, be very careful if you're building a database and the query language is not SQL.
Starting point is 00:38:41 Be very careful. So we were very careful and we figured, well, if you ever wanted a query language for database that wasn't SQL, double entry is a good schema to have because it's tried and tested 500 years old or more, way more, a thousand, I think, thousand years old, more even. It'll probably be around after SQL. We'll still have double entry, you know? That makes you sleep well at night.
Starting point is 00:39:11 Okay. So, yeah. So what is the interface to Tiger Beetle? It's very simple. It's, you have accounts, all these T accounts, debit and credit balances, and you have transfers between accounts or journal entries. So you have two data types in Tiger Beetle accounts and transfers between accounts where it transfers like a vector, you know, debit this account,
Starting point is 00:39:35 credit this account, this is the amount, this is the time. So Tiger Beetle isn't SQL. It's double entry accounting, accounts and transfers. It's very, it just gives you these nice primitives out of the box. And it's kind of what you want to, because for financial data, you don't want to mix it up with your SQL data. You don't want to put it in a general purpose database because often you have very different compliance concerns for financial data. So you want separation of concerns. So, you know, same reason you want S3 object storage.
Starting point is 00:40:12 It's different data, different performance characteristics, retention characteristics, it's all different. So, so that's the interface for target needle. Stas Mouziskelaarhkosmikovic- Yeah, makes a lot of sense. Okay. So let's talk a little bit more about the technology now. You mentioned like some things that you are like very interested personally, and also like the foundational, let's say, parts of the Tiger Beetle design. One of them is safety, right?
Starting point is 00:40:41 And when I was going through the documentation on Tiger Griddle, I saw that you've done a lot of work on ensuring that you take care of storage faults. And I was like, storage faults? Like, I thought that, you know, we found the operating system for that stuff. Like the park system. Like, what are you talking about? I thought that what something is just committed on disk is, you know, just, I can't trust it.
Starting point is 00:41:04 So, and usually like in distributed systems, we talk more about like the network and like the splits that might happen there that's like the most common, let's say, topic of discussion when it comes to fault tolerance and availability. So tell me more about the storage faults and how important they are and how common they are also. Yeah. So this is kind of coming out of like just my love of storage
Starting point is 00:41:31 systems before TidyVisual. And what was interesting is that this was getting into how can we have new databases today? And the reason is kind of, well, we have to, because the existing databases are so tried and tested that on the one hand, okay, well, they're pretty reliable, but on the other hand, we know exactly where they've broken, like where they have latent correctness bugs. So if you're building a new database today, there's a lot of research on the
Starting point is 00:42:04 issues, you know, where you can lose data in Postgres. There was FsyncGate in 2018. That took, I think, at least two years for Postgres to fix. They had to switch to DirectIO, you know, because the Linux kernel page cache is not trustworthy. If you ask the page cache for data, it can actually misrepresent what's on disk, the Linux kernel, you know? And the reason is because disks are just faulty, you know, they're the real world. And the storage fault research out there is that disks do fail. And, and I've got ZFS to thank for this, you know, for opening my eyes, but just the
Starting point is 00:42:45 way that they cared so much that a file system shouldn't allow bit drops. You know, cosmic rays can flip bits in the disk. So many reasons that a disk can fail. I just recently, you know, the Hacker News, they lost two, I think two SSDs simultaneously. Yeah. And it happens and I've been running like a MySQL database that got corrupted because of a disk fault.
Starting point is 00:43:12 I've had, I've replaced several disks in Red Arrays where the disks went into read-only mode because of sector failures, you know, and so you get all kinds of storage faults, you know, disks can corrupt on the read or write path. They can, just a bit of firmware bug that will read from the wrong sector. It's very rare, but the thing is with probability theory, the more of these clusters you operate, if you operate 10 clusters, your risk has gone up by a factor of 10. Yep. So, and I mean, even a single disk in a 32-month period is, I think it's on the order of 3% chance of latent sector error. A latent sector error can wreak havoc, you know, with the
Starting point is 00:43:57 database running on top of it. So, so kind of with Tiger Beetle, we thought, well, there's a lot of storage fork research. A lot of it was coming from, you know, University of Wisconsin Madison, Renzi and Andrea Apacci de Sou. They were also wrote the OSTEP book, which a lot of developers love. But basically lots of good reasons, you know, why databases need to start changing. On the distributed side with like Paxos and Raft, what was interesting too is that if you want to write a set consensus protocol, you actually have to think of the disk. So there's this famous quote of Leslie Lamport where he kind of said view-standard replication was conjecture.
Starting point is 00:44:39 He said, you know, if you don't have a formal proof, it's just conjecture, right? That was what he used. But what's interesting is that Paxos, if you look at the formal proof, the fault model is the network fault model. But yet Paxos assumes stable storage. And that begs the question, well, how do you get stable storage? Where's the disk fault model because for paxos to be correct it relies on stable storage otherwise it's not correct and that's what i always loved about the eastern replication is especially the 2012 revision was they just had it
Starting point is 00:45:19 naturally you know this intuition that a consensus protocol should be able to levitate like a holograph. It should run, it should be able to run only in memory if you wanted to. And I love that because it shows, you know, the disks do fail in the real world. And if you wanna do a real consensus interpretation, you have to think about checksums and it's much harder imagine. You have to think about what if the disk doesn't F-sync, if that happens, you know? And all our Paxos and Raft formal proofs just break down if you introduce that storage fault model.
Starting point is 00:45:56 So yeah, again, Wisconsin Madison had this great paper called Protocol-Aware Recovery for Conferences-Based Storage. And they said, well, even the way we design distributed databases must change because up until now, you had the global consensus protocol like Paxos Raft, and it assumed, you know, that you had stable storage, you had the local storage engine, but the two were never integrated, so they couldn't query each other, it's kind of like you're running a database on top of ZFS. But actually, if you want to do this correctly for very high availability,
Starting point is 00:46:30 you need to be able to ask ZFS, did you have a storage fault? Because if you can tell me, I can maybe help you out because I can use distributed recovery. Otherwise, ZFS can't recover, so your cluster could get lost much sooner. So if you want to build very highly available databases, you need to start looking to new consensus protocols beyond Paxos or Raft, or if you use them, you at least want to integrate them with your local storage, that they work together and that paper show how to do it.
Starting point is 00:47:06 So that was also 2018. Yeah, I think we probably need like a full episode just to talk about storage. Yeah, yeah. I'm rabbit, rabbit hollering. Yeah, yeah. Maybe we should do that. But we're running out of time here and I have one last question before I give it back to Eric.
Starting point is 00:47:28 You mentioned use-tab replication and Paxos and Raft, right? And Paxos and Raft are like the two protocols that have pretty much dominated consensus when it comes back to distributed systems. Why it took the time that it took for used number application to start being implemented and like people start talking more about it and using it. And why you decided to go with that instead of Paxos or Raft, right? Yeah, so it comes back to, I was following the work that Dropbox were doing on Magic Pocket, their S3 solution, which was amazing, and James Cowling
Starting point is 00:48:09 was one of the engineers on that. And basically Heidi Howard was speaking about VStandard replication having better latency when it has to elect a new leader compared to Raft. And the reason is that Raft and Paxos have a random leader election algorithm. So, you know, it's a selfish algorithm. So if you think the leader is down, propose yourself as a candidate. And ViewStand replication, the 2012 edition is sublime because what it doesn't do that it's the unselfish protocol it's let's work together as a team kind of leader election
Starting point is 00:48:55 protocol so what happens there instead is that if you think the leader is down, then imagine all the replicas in array. Just vote for the next replica. And this way you can predict who the next replica will be. In Paxos or Raft, you can't, you don't have that foreknowledge. So you can do things with view stamp replication. And here it's important for people to understand also that Raft is view stand replication. It was influenced by Brian Oakey's thesis, Brian Oakey sort of pioneered consensus a year ahead of
Starting point is 00:49:31 Paxos in 88. And you'll see it in the Raft paper that they mentioned, you know, it's most similar to VSR, but it essentially is VSR. The only difference is the paper presentation. It has RPC. It also has the random election algorithm. But viewstamp replication, you get better latency because the cluster is working together as a team. If the leader is down, they almost always going to switch over to the next one in the ring. Whereas Rost, you have a problem of dueling leaders. So because everybody's putting themselves forward, you can get this
Starting point is 00:50:09 thing called dueling leader where you maybe have a split vote, then you have to redo the whole leader election, which is not great in production because now you're having, you're adding, you know, tens of milliseconds to hundreds. And so Raft will mitigate that with, you know, randomized padding, but you don't need any of that in view stamp replication. Basically, long and the short is, Raft is view stamp replication.
Starting point is 00:50:35 BSR is actually a better name for it because it is the original name. It's true to computer science history. And the newer view change or leader election algorithm for BSR 2012 is more advanced than what is in Rausch or Paxos, I think. That's why we picked it. It's a surprise that James Carling, doing all the storage work, he's one of the authors of the 2012 BSR paper. So I saw Heidi Lauer speaking about it. Martin Thompson was speaking about it.
Starting point is 00:51:08 Mark Brooker was asking the question, why is nobody using Eastern replication? So we thought, well, okay, all these people are heroes. Let's go try it. That's awesome. Yeah, there's a lot more that we can talk about, but I wanted to like to respect the time here and also the microphone to Eric. But we'll record at least one more episode. There's like plenty of wisdom there that I think everyone would be interested to hear from you.
Starting point is 00:51:43 So Eric, all yours. Yeah, absolutely. Well, we're close to the buzzer here. So just one more question. There's so many fascinating underlying principles in what you're doing at Tiger Beetle that are interesting outside of the world of double entry accounting. If you weren't working on Tiger Beetle, what problem would you go solve?
Starting point is 00:52:08 I don't know. I think I, yeah. I just love that databases have got, you know, so many cool things in them. They cover the whole range. It took me a while to figure it out. Yeah, and you get, you know, speeds, you get storage, you get security,
Starting point is 00:52:23 which is just another way of looking at safety. All the cool computer science things, but you can think of them from a mechanical sympathy point of view. What I think is also special is, you know, I've alluded to these five big changes, but some of the others are, you know, I always wanted to do systems, you know, and just write a single binary that runs everywhere without the JVM. And now you've got these great systems languages. There's Rust, there's Zig, and they're incredible. Now you can do that. They're much safer than C. They kind of move memory safety issues.
Starting point is 00:53:02 They have different approaches, but they move that into a lesser order of magnitude of concerns, you know, so, so great, safer languages, great time for databases. You've got, I always wanted to do like IO, you know, and everybody is struggling with, you know, how to do async IO on Linux, which is such a pain, and now you've got IO-U-Ring. It's perfect for databases. You know, good data database, thanks to Ian Saxper. And
Starting point is 00:53:29 also safer languages, better IO. But finally, it's just, there's a new way, you can build these databases much quicker because now we've got deterministic simulation testing that Foundation DB and James and CJ and them at Dropbox pioneered, quicker because now we've got deterministic simulation testing that FoundationDB and
Starting point is 00:53:45 James and CJ and them at Dropbox pioneered, where you can now run these databases in a simulation and tick time deterministically. It's kind of like Jepson, but on steroids because Jepson, it's not deterministic, so you can't replay. If you find a bug, maybe it takes you two years, you have to wait till you find it, then you can't replay. if you find a bug, you know, maybe it takes you two years, you have to wait two years to find it, then you can't replay. These new deterministic simulation techniques, if your database is designed for this, then you can actually tick time in a while truly. And you can simulate years of testing in just a day, you know, and then it gives you the confidence
Starting point is 00:54:25 that, Hey, let's build new purposeful databases because we've got safer languages, we've got better IO, we've got storage hot research, we've got all of that is behind us, it's there. And we've got simulation testing, which is kind of Fred Brooks' silver bullet. You know, here you can actually really speed up your development velocity and build these things much, much quicker. It gives you confidence. Yeah.
Starting point is 00:54:51 So I don't know what else I would be doing if it wasn't Tiger Beetle. Maybe I'd be working on on Red Panda or something. Just another pretty cool database. but yeah, just pretty happy. It's awesome team that we have and we kind of just that ZFS spirit that we love, you know, let's just build something safe and make a contribution be efficient, not waste. Just a lucky time to be doing new things, I think. What a wonderful answer and great to hear from someone who loves what they do.
Starting point is 00:55:32 So, Joran, thank you so much for joining us on the show. It's been delightful. And we absolutely want to have you back on because there's so many topics that we didn't have time to get into or go deeply enough into. So thank you. Oh, thanks so much, Eric Costas.
Starting point is 00:55:49 Just being a real joy in such a pleasure. I have two major takeaways, Costas. One is what an unbelievable, concise explanation of double entry accounting for developers. I mean, even the analogies are so good. I just really appreciated that, and I think it speaks to Joran's ability
Starting point is 00:56:14 to take concepts that can be very complex and have a ton of breadth and distill them down into really easy-to- to grasp concepts. But the other that I really liked was that he just really seems to love solving these problems and figuring out how to make things fast and how to make things safe.
Starting point is 00:56:42 And that was just really interesting. I think the other thing that stuck out to me was, you know, recently we talked with the founder of TileDB who sort of went and worked on a low level on the storage layer of databases. And Joran said something that really stuck out to me, which was that he learned that he needed to work with the grain of the hardware, which I thought was really fascinating and reminded me of our conversation
Starting point is 00:57:12 with Stravos from TileDB. They're sort of equal, or not equal, but they're sort of similar learnings there. So those are my big takeaways. What a fascinating conversation. I feel like we could have gone for two or three hours. Yeah. For me, one of the things that I will keep is from what it seems like the rise of purpose-built database systems out there, which I think is very interesting. And okay, we see something like that happening, like in the financial sector, which kind of makes sense because it's a very lucrative sector, right? If you solve a problem like really well, like you're going to be rewarded for that. Like it's, and I'm not talking about, you know, like making money for
Starting point is 00:58:04 like the Wall Street, right? It was very interesting to hear from Yura, like how reducing like the cost of financial transactions is actually like helping people to, you know, like rise beyond like the poverty level. Right. So it's very, it's an area, it's like an industry where if you can deliver value like this value like very impact. Right.
Starting point is 00:58:36 But hopefully we're getting to see like something similar in other industries too. We'll see more of let's say these kind of domain specific database systems. Which leads to another observation that building database system like has become easier and that's great. Like more people can go and build very complicated systems, validate them and get them to market like much, much faster than in the past, which is amazing. Anyway, I hope we will have him again in the future. It's at the beginning for Tiger Biddle and I'm pretty sure that both from a
Starting point is 00:59:21 technical and a business perspective, they're going to do great. So we'll talk to him again in the future. I agree. Well, thanks for listening in. Subscribe if you haven't. Tell a friend and we will catch you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app
Starting point is 00:59:41 to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
