Disseminate: The Computer Science Research Podcast - Kevin Gaffney | SQLite: Past, Present, and Future | #11

Episode Date: November 14, 2022

Summary: In this episode Kevin Gaffney tells us about SQLite, the most widely deployed database engine in existence. SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format. However, fueled by the rise of edge computing and data science, there is a growing need for efficient in-process online analytical processing (OLAP). DuckDB, a database engine nicknamed “the SQLite for analytics”, has recently emerged to meet this demand. While DuckDB has shown strong performance on OLAP benchmarks, it is unclear how SQLite compares... Listen to the podcast to find out more about Kevin's work on identifying key bottlenecks in OLAP workloads and the optimizations he has helped develop.

Questions:
How did you end up researching databases?
Can you describe what SQLite is?
Can you give the listener an overview of SQLite's architecture?
How does SQLite provide ACID guarantees?
How has hardware and workload changed across SQLite's life? What challenges do these changes pose for SQLite?
In your paper you subject SQLite to an extensive performance evaluation; what were the questions you were trying to answer?
What was the experimental setup? What benchmarks did you use?
How realistic are these workloads? How closely do these map to user studies?
What were the key results in your OLTP experiments?
You mentioned that delete performance was poor in the user study; did you observe why in the OLTP experiment?
Can you talk us through your OLAP experiment?
What were the key analytical data processing bottlenecks you found in SQLite?
What were your optimizations? How did they perform?
What are the reasons for SQLite using dynamic typing?
Are your optimizations available in SQLite today?
What were the findings in your blob I/O experiment?
Progress in research is non-linear; from the conception of the idea for your paper to the publication, were there things you tried that failed?
What do you have planned for future research?
How do you think SQLite will evolve over the coming years?
Can you tell the listeners about your other research?
What do you think is the biggest challenge in your research area now?
What's the one key thing you want listeners to take away from your research?

Links:
SQLite: Past, Present, and Future
Database Isolation By Scheduling
Kevin's LinkedIn
SQLite Homepage

Hosted on Acast. See acast.com/privacy for more information.

Transcript
Starting point is 00:00:00 Hello and welcome to Disseminate, the computer science research podcast. I'm your host, Jack Wardby. I'm delighted to say today I'm joined by Kevin Gaffney, who will be talking about his VLDB22 paper, SQLite, Past, Present and Future. Kevin is doing his PhD in the data systems group at the University of Wisconsin-Madison, and his research focuses on database system performance. Kevin, welcome to the podcast. Thanks, Jack. I'm happy to be here. Fantastic. Well, let's dive straight in. So how did you end up researching in databases? Yeah, so I was interested in computer science research probably since I started high school when I wrote my first program. But I never really knew what area I wanted to go into. So I had a little bit of interest in AI and ML, a little bit of interest in bioinformatics. But I ended up doing a semester of research with Jignesh Patel at UW-Madison and ended
Starting point is 00:01:19 up kind of falling in love with the idea of database research, specifically performance optimization. One of the things that his group really focuses on is bare metal performance. So we try to get performance of database systems as close to the performance of the underlying hardware as much as possible. Fantastic. So let's talk about your paper. Can you first off, for the uninitiated, can you describe what SQLite is? And I think I'm maybe pronouncing that wrong, right? Is it SQLite, right?
Starting point is 00:01:53 Yes, it's SQLite, kind of like a mineral. So SQLite is an embeddable database engine. So this means that in contrast to a lot of other database engines, it runs in the process of the host application. So this is as opposed to a database system where it occupies a single process on a computer and then clients have to communicate with that process via some sort of inter-process communication. SQLite is basically just a library. So you process data, you manage data via library calls. And it's the most widely deployed database engine in the world. And in our paper, we kind of boil that down into four reasons.
Starting point is 00:02:36 One, it's cross-platform. So the database file in SQLite can be freely copied between 32-bit, 64-bit architectures, little-endian, big big endian architectures. And it's also so portable that it's been recommended by the U.S. Library of Congress for the preservation of data. It's also compact, so it compiles down to less than one megabyte in general, and its source code is all contained in just one single C file. And then it's reliable. So over 600 lines of test code exist for every one line of library code. So there's just a huge amount of testing. 100% of machine branches are coveted in the test, which is a huge achievement. And then finally, it's fast. So it can support about tens of thousands of transactions per second.
Starting point is 00:03:29 And in some cases, it can be faster and more space efficient than the underlying file system. Amazing. I know when I first read the fact that all of the machine branches have been tested, I was like, wow, that is pretty amazing. And some achievement, like you say. Cool. So can you maybe give the listener an overview of what SQLite's architecture is? Yeah. So like most other database engines, SQLite is organized into various modules. So there are four main groups of modules in SQLite. The compiler modules, the core modules, the backend, and then accessories.
Starting point is 00:04:07 So the compiler modules have the basic tokenizer, parser, that you would see in most database engines that process SQL. But there's also this code generator. And this is a component that takes a SQL query and compiles it into a program of virtual instructions. And these virtual instructions are defined in SQLite specific to SQLite. So it's kind of like an instruction set specific to SQLite. So and then the core modules include the command line and C interfaces, how you interact with the database engine, and then the virtual machine. The virtual machine is the component that's
Starting point is 00:04:45 responsible for executing this virtual program. And it actually just consists of a large switch statement with one case for each virtual instruction. And this is kind of the heart of SQLite where all the data processing really happens. And then the backend modules include the B-tree implementation, the pager, and then the interface to the operating system. And then there are also some accessory modules like test code and common utilities. You mentioned transactions earlier on. So how does SQLite provide ACID guarantees? What mechanisms does it use to achieve this?
Starting point is 00:05:20 Yeah, so isolation is achieved through basically just file locking. Because SQLite is an embeddable database engine, it needs to lock files to prevent other processes from operating on files that could cause conflicts. It provides consistency via data structures like its B-tree and some internal checks. But when you think about atomicity and durability, it starts to get a little bit more interesting. So it provides these guarantees via one of two modes. There's rollback mode, and then there's wall mode. So in rollback mode, before modifying a page, SQLite will write the original version of the page to a rollback journal on disk. And this ensures that SQLite can recover the original page contents whenever there is a power failure or other some exception. Then after that rollback journal is flushed to disk, SQLite can write the modified page
Starting point is 00:06:16 to the database file. So in wall mode, it kind of inverts this. So instead of writing the original version of a page to a rollback journal, SQLite will write the modified page to a write-ahead log file or a wall file. And this leaves the original database file untouched. And then every once in a while, when the wall file grows to a certain size, SQLite needs to transfer the modifications in the wall file back to the original database file. And this is called a checkpoint. So the main differences in terms of performance are that rollback mode allows unlimited readers or one writer, whereas wall mode allows unlimited readers
Starting point is 00:06:56 and one writer. So there's a little bit more concurrency. And then rollback mode typically requires two writes per modified page, one to the rollback journal and then another to the database file. And the wall mode typically requires just one write. But there are those occasional checkpoint operations. And in our experiments, we evaluate both modes in our transaction processing focused experiments. Awesome. And we will probably touch on this later on when we discuss the OLTP experiments. But what are the sort of workloads when I want to use one or the other? Which is the best one?
Starting point is 00:07:28 Is it always that there's one that's better than the other? Or is it kind of there's a trade-off there between the two in a design space, but a choice based on your workload, essentially? Yeah, it definitely depends on the workload. There's no one answer because there are so many factors that go into it like for example wall mode won't work on a network file system whereas rollback mode will so there are some drawbacks and advantages to both modes so it really just depends my next question is about this is a really nice bit in your paper where you look at how kind of hardware and workloads have changed across SQLite's lifetime. Maybe we can dig into that a little bit and you can discuss these two different dimensions and how they've developed over the past 20 years or so.
Starting point is 00:08:15 Yeah. So since SQLite runs on just about anything and it runs on almost everything, it's kind of a question of like, how is computing hardware in general changed over time? So there are tons of trends that we could talk about, like, but there are some key trends that are specific to SQLite, mainly just faster storage. So devices now have faster flash-based storage that can sometimes support up to multiple gigabytes per second of read bandwidth. A lot of devices have larger memory. So the amount of DRAM in a typical mobile phone or a computer has grown by several orders of magnitude over SQLite's lifetime. And we also see a lot more parallelism. So modern computers now have several cores. Modern mobile phones even have CPUs with six or more cores and maybe a GPU or some specialized accelerators.
Starting point is 00:09:10 So there are many more trends in hardware, but these are kind of the most relevant to SQLite. In terms of workloads, it's kind of difficult to say what SQLite's been used for historically, but we know based on its design that it's kind of specialized for fast online transaction processing. So this typically includes workloads
Starting point is 00:09:32 with a small portion of inserts, updates, and deletes, and access to data through B-tree indexes. So a typical use case could be sort of a mobile application that needs to store some local state. But there is this really interesting 2015 study that was done by the University of Buffalo researchers in which they gave out modified Android phones to some students, faculty, and staff at the university. And these phones would record all the SQLite activity on the phone over the course of the study. And participants were just
Starting point is 00:10:05 asked to use these phones as normal. And then after the study, they looked at all the data. And they found that a large portion of the workloads they observed were key value reads and writes. But they also saw this significant tail of complex OLAP queries that had nested select statements and maybe joins between several tables. And then one other interesting thing they found is that delete statements tended to be very expensive because they often included nested select statements to figure out which rows in the tables to delete. So we know along with that that SQLite is sometimes used for analytical operations and data science and also for data processing on edge devices and the internet of things. So basically, it's used for OLTP, OLAP, key value, and mostly
Starting point is 00:10:54 everything in between. And DuckDB, which is a new embeddable database engine, has recently emerged to meet this demand for in-process fast analytics or OLAP. And DuckDB is similar to SQLite in many ways. It runs in-process. There's a single database file and there's no dependencies. But there are a lot of differences between SQLite and DuckDB, such as the storage where DuckDB uses column storage, SQLite uses row storage. DuckDB uses vectorized query execution, whereas SQLite uses row by row. DuckDB uses batch-optimized MVCC and has a lot of inter-query parallelism. Whereas SQLite offers very limited parallelism. So basically, SQLite is used for many, many different things.
Starting point is 00:11:47 And one of the goals in this paper is to compare the performance of SQLite on many different workloads relative to DuckTV. Building off what you said there, what do all of these sort of changes, what are the challenges do they pose to SQLite? So the key challenge is to sort of support a wide variety of these workloads with high performance, but we want to do this without sacrificing SQLite's qualities of compactness, portability, and reliability. And I would argue these are the things that have made it most popular. It's kind of difficult to say for sure, but most people probably choose SQlite because it's so easy to use.
Starting point is 00:12:30 It's so well tested and it's non-intrusive. You can basically just drop it in to your program and start using it. So we need to make sure that we maintain all these qualities as we start to think about improving its performance. Kind of a key component of your paper is, as you mentioned, a really nice extensive performance evaluation of SQLite and obviously the DuckDB as well that you're comparing it against. What were the kind of the key questions you were trying to answer with your performance evaluation? Yeah, most people that are somewhat familiar with SQLite and DuckTB have some intuition that SQLite is likely to perform better on OLTP workloads and DuckTB
Starting point is 00:13:12 is likely to perform better on OLAP workloads. But we didn't really have a good idea of the magnitude of those performance differences. So the first question we wanted to answer is just how does SQLite perform relative to DuckDB on a huge variety of workloads? And how big are those performance gaps, basically? So our second question is, what are the key factors responsible for these performance differences? In other words, can we explain these performance gaps with a few key causes? And in this part, we specifically focus on OLAP workloads. And then finally, as we talk about in the later part of our paper,
Starting point is 00:13:51 our third question is, how can we make SQLite faster on OLAP workloads? In other words, how can we shrink those performance gaps between SQLite and DuckDB? So what was the experimental setup and what benchmarks did you use in your evaluation? We run three benchmarks. For OLTP, we use TATP and for OLAP, we use star schema benchmark or SSP. And then we also use a third benchmark that we developed specifically for this study, which we call the blob benchmark. And this is basically just a benchmark to measure the efficiency of raw reads and writes to a binary object. And then we run these three benchmarks on a single thread on two hardware configurations, a fairly powerful cloud server, and on the other end of the spectrum,
Starting point is 00:14:37 a Raspberry Pi 4B. This is kind of a, what might be a difficult question to kind of answer because when people have asked me this before, I've struggled to give them a satisfactory answer. How realistic are these workloads? These are synthetic benchmarks, right? But TATP, for example, is quite a simple benchmark. How closely does that model or map to what you've seen in user studies? Yeah, it's difficult to say, as you said.
Starting point is 00:15:12 But the other obvious choice for transaction processing would be something like TPCC, which is quite a bit more complicated than TATP. So we felt that because we're measuring the performance of SQLite, which runs on a huge range of devices, but a lot of mobile phones are smaller devices. And judging on that 2015 study where we saw a lot of the workload consisted of key value reads and writes or simple OLTP operations, TATP would probably be a good choice. And then in a similar line of thinking, SSB is sort of a simplified version of TPC-H or something like TPC-DS, which is considerably more complicated. And we felt that SSB would be a better benchmark in order to start thinking about not only what
Starting point is 00:16:02 are the performance differences, but also what are the causes of those performance differences. Using a simpler benchmark would allow us to sort of boil down those performance differences that we were seeing into a few root causes. So let's start off with the OLTP experiment. What was the key results from this? On TATP, SQLite's wall mode is considerably faster than its rollback mode, about 3x. And this was expected given that, as I mentioned earlier, wall typically requires fewer writes to disk. Now, when we start to compare SQLite and DuckDB, DuckDB's performance is actually pretty comparable to SQLite for very small databases, like on the order of a few megabytes.
Starting point is 00:16:50 But as the database grows beyond a few hundred megabytes, DuckDB's performance starts to degrade pretty significantly. So at 1 million subscriber records, which is basically the scale factor for TATP, SQLite ends up being about 60 times faster than DuckDB on the Raspberry Pi. So I just want to touch on the deletes here. So you mentioned in the user study that deletes were really sort of problematic and the performance was quite low. There is, I believe, this one transaction in TATP that is a delete transaction, right? How did that sort of fare?
Starting point is 00:17:16 Did you observe anything there as to why the deletes are so slow? Yeah, so you're right. There is one delete transaction, but it's actually a fairly simple transaction. So it just requires a lookup into a B-tree. And I think it usually deletes a single row, if not a handful of rows. So as opposed to some of the deletes that were observed in that other study, where nested select statements, complex select statements were determining which rows to
Starting point is 00:17:45 delete. Can you now talk us through your OLAP experiment? Sure. So our OLAP experiments were focused on measuring the mean latency for each SSB query run on both SQLite and DuckDB. And we found that in summary across the board, and as we expected, DuckDB performs better than SQLite for this workload on every query. So on the Raspberry Pi, the gap in latency ranged from as low as 3x to as great as 35x. So there were some pretty big performance differences here. SQLite also had a lot more performance variation among different queries. So its slowest mean latency was about 10 times higher than its fastest mean latency. So there's a lot of difference when you're executing different SSB queries. And then based on those results, we kind of asked ourselves, and this leads into the next question,
Starting point is 00:18:40 what are the key bottlenecks in SQLite slowing down its OLAP performance? And that's where we start to look at what are the causes for these results. I guess the next question is, what are these bottlenecks that you found? Yeah, so to answer this question, we took advantage of SQLite's virtual database engine that I was talking about earlier. So like I said, SQLite compiles SQL engine that I was talking about earlier. So like I said, SQLite compiles SQL queries into a program of virtual instructions, and then these go into the virtual database engine, which is that large switch statement that executes the logic of each virtual instruction. So we can actually measure the time spent in each virtual instruction,
Starting point is 00:19:21 and we just did that for SSB. And we found that only two instructions, the seek row ID and column instructions, were taking up most of the time for all the queries. The seek row ID instruction says that given a row ID, find the corresponding row in a table via a search in a B-tree index. And then the column instruction says, given some row in a table, extract the value for a specified column in that row. Given these bottlenecks, what were the optimizations you developed and how did they perform? So our optimizations focus largely on the seek row ID instruction, but we talk a little bit about potential optimizations to the column instruction in the paper. So we asked ourselves, how can we reduce the number of seek row ID calls?
Starting point is 00:20:10 So SSB, as the name implies, uses a star schema. So this means we have a very large fact table, and then we have foreign key references to several smaller dimension tables. And when you draw these tables and the reference arrows, it resembles a star. Most SSB queries involve selective semi-joins on the fact table. And this means that many of the rows in the fact table are actually excluded from the final query results due to some filter applied on a dimension table. But in order to figure out whether a given row in the fact table should be included in the result, we often need to use that seek row ID instruction to do a B-tree index lookup on a dimension table. So that's why we're seeing so much time spent in the seek row ID instruction.
Starting point is 00:20:58 In order to reduce the number of seek row ID calls that we need to perform, we use this technique called look ahead information passing or LIP. This involves looking ahead in the join pipeline to build lightweight data structures that are used to filter out the rows from the fact table that will ultimately be excluded from the final result. So in SQLite, we build bloom filters on each key set that remains after applying some filter on a dimension table. And then for each row of the fact table, we first probe these bloom filters prior to probing any B-tree indexes. And because probing a bloom filter is relatively cheap compared to probing a B-tree index, and assuming a low false positive rate in the Bloom filter, we can actually eliminate a lot of these tuples early on. And thus, we save a lot of expensive B-tree index lookups.
Starting point is 00:21:52 So these optimizations resulted in a huge decrease in seek row ID calls, as we show in the paper. And it also improves the latency of some queries by a wide margin, up 10x and we also observed an overall speed up of about 4.2x on ssp awesome as you uh obviously optimize this this instruction what else becomes the bottleneck then just kind of does it kind of put more um sort of it's more time spent on the column instruction than basically relative to everything else? Yeah, exactly. As we know, as we eliminate one bottleneck, another arises. So after eliminating or reducing the time spent on seek row ID, now most of the queries actually spend a significant amount of time
Starting point is 00:22:39 on the column instruction. And we started to look into how we could reduce the time spent in the column instruction. And we started to look into how we could reduce the time spent in the column instruction. As I said before, the column instruction says, given some row in a table, extract a specified value from that row. Now, SQLite has an interesting quirk that causes this to be a little bit more expensive. SQLite actually uses dynamic typing, which is very unusual among database systems. And this means that within the same column, the value of one row could be a completely different data type than the value of another row. So for example, we could store a float in one row, an integer in the next, and a string in the
Starting point is 00:23:26 next, all in the same column. And SQLite tables can actually be defined with no types at all. So for example, like create table ABC without any type specified is actually valid in SQLite. Now, all of this means that because each column in SQLite doesn't have a predefined data type, it's flexible, we need to store type information associated with each row. So SQLite uses a row store organization, but each row consists of a header and then the actual encoded values. The header includes type information and size information for each the actual encoded values. The header includes type information and size information for each of the encoded values. So in order to extract a specified column from a row, we first need to read the header information
Starting point is 00:24:17 and then compute the offset of the specified value and then jump to that offset and extract the value. And we need to do this for every row. So that's the main source of overhead in the column instruction. Now you can kind of contrast this with a column store organization where all of the values in a single column are laid out consecutively. And you can perform using a SIMD operation, the same operation on maybe eight or so values at once. So that's where we start to see some big differences in the column instruction. Why was that design decision made way back whenever it was made? What was the motivation for it? And what are the benefits of opting for dynamic typing
Starting point is 00:24:59 versus the conventional approach, I guess, that most database systems use? I'm actually not sure. I do know that in terms of the rationale, I'm not very clear on it, but I do know that there are some benefits to dynamic typing. For example, if you're using a dynamically typed language, such as like Python or JavaScript or something like that, it can ease sort of the interoperability between the database engine and the language. It just kind of suits it a little bit better. But beyond that, apart from just flexibility, I don't know the rationale for that design decision. don't know the rationale for that design decision there is a there is a web page on sqlite's website that's dedicated to explaining this uh decision though so i would recommend checking that out
Starting point is 00:25:52 cool so yes are these optimizations you've made are they available today yes they are yes they are so these optimizations were released in sq SQLite version 3.38.0, and that came out in February of 2022. So I would encourage you to download the code and try it out yourself. Awesome. Have you had any feedback on the optimizations? Any happy customers? Not yet. We're not sure how often these optimizations will come into play just because we don't have a great idea of the workloads that SQLite is used to execute in the wild,
Starting point is 00:26:32 which would be another interesting experiment to sort of build on this 2015 study to see what is the impact of a given optimization. And seven years later, is SQLite being used for considerably different things than the study originally found. The next experiment you did was the blob IO experiment. Can you talk us through what your findings were with these experiments? Yeah, so the motivation for this experiment was our observation that SQLite is used for a lot of key value type workloads. So it's sometimes used as just a storage engine that provides transactional guarantees. So in this experiment, or in this benchmark rather, the database has issued a constant stream of reads and writes
Starting point is 00:27:17 to a raw binary object. And we vary the size of the binary object and the proportion of the workload that is reads and writes. So for small blobs, about 100 kilobytes in size, SQLite is just a bit faster than DuckDB for all proportions of reads in the workload. So SQLite executes this workload with about 30% faster throughput than DuckDB. Now, as you start to increase the size of the blobs up to about 10 megabytes in size, DuckDB gets a little bit faster. So for these large blobs, DuckDB is actually 2x faster. And we think that this is largely due to the differences in how the two engines handle atomic commit in transaction processing. As I said before, SQLite uses rollback mode or wall mode. But for these large blobs in wall mode, it actually causes a checkpoint to occur with each write.
Starting point is 00:28:13 So there are actually two writes per modified page in SQLite. So the performance of wall mode and rollback mode in SQLite are similar for this workload. But DuckDB's batch optimized version of MVCC typically requires fewer file system writes for this workload. So that's why it performs a little bit better for large blobs. Cool. Is there any sort of space there for potential optimizations based off of that finding? Yeah, I think somehow reducing the writes in SQLite would be an important target for that. Possibly increasing the size of the wall that triggers a checkpoint operation to occur. So if a checkpoint occurs when the wall grows beyond 10 megabytes in size and you're performing a 10 megabyte write with each transaction. Of course, the checkpoint is going to occur with each transaction.
Starting point is 00:29:10 So increasing the size of the wall would be one option. But we didn't get to that experiment in the paper. So my next question is, progress in research is pretty much nonlinear, right? We have ups and downs and so from the kind of conception of the idea of the paper to the publication were there multiple things well were there things that you tried that sort of failed and also I'd be interested to know in about the origin story of this project as well how did it actually come about yeah this is a great question and definitely there were things that we tried that failed. Over the course of this project, we had several great discussions with the SQLite team, who, by the way, are just great people to work with, about the viability ambitious dreams about what we could do. But they were quick
Starting point is 00:30:08 to point out that some of these things wouldn't be viable because they would break the portability of SQLite or its cross compatibility, or it would be too complex to add. And it would counteract some of the other goals that SQLite was striving for, such as its portability, compactness, and reliability in terms of tests. So we considered many things like hash joins, native code generation, vectorization, increased parallelism, and some of these things were inviolable considering those other goals, but also others were just out of the scope of this work and would take too much effort to actually implement within the given timeframe. So we ultimately decided on the Bloom filter-based lip optimizations because of the ratio of added
Starting point is 00:30:57 complexity to the performance gains. In order to implement those optimizations, we only added two new virtual instructions, made some small changes to the query planner, and also implemented sort of a simple data structure, the Bloom filter. And that's all it really took, but we ended up getting a pretty substantial performance gain, as I said. In terms of the origin of this project, my group had done some related work in the past focused on developing an analytics optimized layer outside of SQLite's core engine that would use columnar storage and vectorized query processing in order to answer OLAP queries more efficiently, and also have some machinery to keep the data in SQLite's core engine and the analytics optimized engine in sync with each other. Now, this turned out to be a little bit too clunky and didn't work out for various reasons. So we kind of turned to a different direction, which is how can we improve the performance of SQLite's OLAP performance, or how can we improve the performance of SQLite by just focusing on things that we can change within the core engine? And this is how we arrived at this project.
Starting point is 00:32:13 I'm interested to know about the development time and the ease of going in there and making an invasive change to, I'm guessing, quite a, I don't know, is it quite a complex code base to work with? Or is it quite easy to find your way around? Yeah, SQLite, and they say this on their website, is very difficult to hack, because it's been optimized pretty heavily over its two decade lifetime. Now, it's still organized in a relatively modular way. And even though the code is provided in a single amalgamated C file, the developers use separate C files that are put together in a script whenever they want to package it. So it's somewhat modular, but due to its being heavily optimized over the many years, it is difficult to make a change that doesn't break everything. So
Starting point is 00:33:05 we actually implemented these optimizations using extensions to SQLite. So we defined a bloom filter as a blob in a table. And then we used SQLite's user-defined functions to construct and probe this Bloom filter. And we developed sort of a proof of concept using this methodology. And then we presented those results to the SQLite developers. And from then on, we started talking about how we could actually integrate this change into the core sqlite code base and then the developers sort of took took the lead and actually wrote the code that would implement these changes but in terms of actually trying to make the changes ourselves i did find it somewhat difficult just because of the the complexity of sq code. So I'm guessing as well,
Starting point is 00:34:06 there is certain levels in terms of like test coverage and stuff like that and sort of things they need to maintain, right? So I'm guessing that's sort of like, I guess it doesn't make the best left to the SQLite team, I guess. Yeah, but one of the cool things about that huge amount of test coverage
Starting point is 00:34:22 is that you can make changes as long as you know what you're doing to the SQLite code base. And as long as all the tests pass, you can be reasonably sure that nothing is broken. Because there are so many tests that cover so many possible cases, you can quickly add new features to SQLite
Starting point is 00:34:42 and be pretty confident that nothing's broken. So the actual time it took for the developers to implement these changes was relatively small. I think they got it done in less than a month and all the test cases were as well. So, and Richard, the developer says that he spends, you know, a small fraction of his time writing the actual code of sqlite and the vast majority of his time writing test cases so although that can kind of bog you down in terms of a developer it can also empower you to sort of make changes more quickly because you have more confidence in your changes yeah awesome cool so what do you have planned next for future research? Are you going to be continuing in working with SQLite or what's next, basically?
Starting point is 00:35:36 Yeah, I would love to continue working with SQLite and I'm still sort of carving out my direction. But I think I'll continue to look into methods for improving the efficiency of join algorithms, possibly focusing on more general techniques that could be applied to any database system in addition to SQLite. And I think I'd like to focus on lightweight data structures such as Bloom filters to improve memory efficiency. So I'm not sure exactly what that looks like, but I'm enjoying reading a ton of papers at the moment.
Starting point is 00:36:02 Yeah, awesome. So is that sort of things like worst-case optimal joins, that sort of space? In that vein, yes. Fantastic. So I guess now we can touch on the future of SQLite as well a little bit. So how do you think the system will evolve over the coming years? Yeah, I think SQLite will continue to add tons of new features as it has over its, you know, over 20 year lifetime. And I wouldn't be surprised if its analytical performance continues to improve.
Starting point is 00:36:34 But one certain thing, and the developers note this, is that it will continue to receive the same level of support that it has for several more decades. So they say their mission is to support SQLite through the year 2050. And a lot of their design decisions are made accordingly. So they really prioritize the portability, the compatibility, and the compactness and the reliability of the SQLite library. I guess the sort of the elephant in the room sort of question or the duck the duck in the room right i mean how do you think the sqlite and duck db what will the market look like how do these two sort of things end up in over the next 30 years right is it going to be a case of eventually duck db will basically corner the market on any sort of analytics use case whereas that not a little sort of like almost relegate sql like to just being like sort of transaction style workloads or do you think there'll be some sort of meet in the middle where
Starting point is 00:37:30 the performance will gradually sort of sort of trend towards each other kind of yeah what do you think the state of play over like if we fast forwarded 20 years yeah that's a great question i think there will still be a place for both as there is now. I think DuckDB fills a really important gap in database engines today. You know, you see data scientists execute a lot of, you know, relational type operations as part of data science workflows. And sometimes that can be in a data frame type environment like Pandas. But often it's a lot more efficient, especially when the data starts to exceed the capacity of memory to execute it in an embeddable database engine like DuckDB. So DuckDB has a very strong use case for that.
Starting point is 00:38:16 And other use cases where embeddable OLAP database engines would be important. So I think DuckDB really, really nicely fills that gap. SQLite is still very optimized for transaction processing. And as we show, for larger databases, it can be considerably faster. So if you're an application and you need a database engine to store state and maybe even do some lightweight analytical processing, SQLite would be probably the ideal choice, in my opinion, just due to its valuable
Starting point is 00:38:53 properties of compactness and reliability. So I think, in general, both will occupy an important space, and I think there will be a need for both. The key question that I'm sort of interested in is, is there a need for a system that can kind of do both at the same time? And this kind of relates to a lot of the HTAP research that you see in database systems. You know, is there a need for an engine that can perform both transaction processing and real-time analytics in the same space? And if so, how do we accomplish this? So I think you can apply that same logic to small database engines, embeddable database engines. Is there a need for an embeddable engine that can have the transaction processing performance of SQLite and the analytical processing performance of DuckDB.
Starting point is 00:39:47 I guess as well, the question that follows from that is, what would the architecture of that system actually look like? How do you go about marrying these two things together? It's a non-trivial question, right? And how would you actually do that is very interesting. I agree. That's a huge question. I think that could occupy, you know, a grad student's entire career. So somebody could write their dissertation about it. But I think, you know,
Starting point is 00:40:14 there are certain properties of DuckDB that you just can't get with like a more OLTP-focused engine. And there are certain properties of SQLite that you can't get with OLAP-focused engine. We know that row store is typically better for OLTP and column store is typically better for OLAP, certain things like that. So, you know, the question is, do we try to have some sort of hybrid storage? Do we duplicate data and different types of storage? Do we duplicate data in different types of storage? But there are things that you can do to improve the performance of OLTP-focused systems. And another interesting question, for me at least, is starting with an OLTP system, how close can we get to the performance of a system that's been heavily optimized for all that are there any techniques like that you can borrow from their duck db or the i mean imagine if you
Starting point is 00:41:11 was like constructed it's maybe harder to integrate these sorts of techniques into a system as mature as sql light but if you were building a system from scratch and you kind of had that um design goal in mind are there certain techniques you could borrow from day one that would kind of maybe let you kind of close that gap a little bit more? Yes, definitely. I think a key one would be vectorized query processing, in other words, batch operations. So this definitely reduces the overhead of expression evaluation and just general query processing. The other thing is, even if you can't implement sort of a column store organization, in a lot of row store tables or OLTP tables, you have a certain number of regularly sized values. So maybe storing those regularly sized values in a location or in
Starting point is 00:42:09 a way that you can quickly process them. So for example, having them regularly spaced out in regular strides would help a lot in reducing the overhead of extracting those values from the rows. So obviously this project is kind of one of the many things you've done over the course of your PhD. Can you maybe give the listeners a sales pitch of some of your other work as well that you've done? Yeah, so I started out doing a lot of work in transaction processing. My first paper was about a modular isolation service. So the goal here was to say, you know, given a typical transactional storage manager in a database system, it's responsible for providing all these transactional guarantees, but it ends up being a very monolithic,
Starting point is 00:43:02 complex piece of code. So is it possible to take this transactional storage manager and extract services from it to provide transactional guarantees as modular code components? So we focus specifically on the isolation component. And in order to accomplish this, we used sort of an optimized version of predicate locking, which is an old idea that's been sort of discarded or used in a limited capacity. But we showed that you can optimize predicate locking for a very common class of OLTP workloads
Starting point is 00:43:34 and provide transaction isolation as a modular service. Are there any other things you've worked on across your PhD that might interest the listener? Yeah, I was actually a co-author on a recent paper at SIGMOD focused on HTAP benchmarking. So we observed that there are a few other HTAP benchmarks out there that already exist. But we really wanted to develop a benchmark that was focused on evaluating HTAP across the entire spectrum of HTAP. So HTAP is a very loosely defined term. Sometimes you can have largely OLTP workloads with a little
Starting point is 00:44:13 bit of analytics thrown in there, or maybe you have a lot of analytics with a little bit of OLTP. So we wanted to test the performance of systems over an entire range of workloads and also come up with a more structured methodology to do so and this paper's um what's its name this paper is um how good is my htap system cool great stuff again we'll put that in the air in the old show notes and the listener can go and check that out it's really cool stuff two more questions now so the penultimate one is what do you think is the biggest challenge in in your research area now yeah so the the biggest challenge that i foresee in database research that some people are already starting to work on is taking advantage of the huge amount of available hardware that we have available to us right now.
Starting point is 00:45:08 So hardware has changed over the past many years, but it's also grown in heterogeneity. So not only do we have CPUs available, we have GPUs, FPGAs, and we also are starting to see the emergence of some specialized accelerators, maybe for data processing. So I think database systems are really well poised to be responsible for taking advantage of all this hardware because of their declarative query interface. That's what a database engine does, is it takes a query that says what you want, and it determines how best to get it. So I think that's the key property of database engines that make them suited to taking advantage of all this available hardware. And my research group has already started some projects that are focused on this this sort of goal brilliant and
Starting point is 00:46:06 we'll um keep an eye on keep an eye out for that research coming out and so i guess last question now it's time for the last word what's the one key thing you want the listener to take away from your research and your work on sqlite this work really taught me the importance of benchmark-driven performance optimization in research in general. So as researchers, I think we can sometimes have the tendency to put the solution before the problem. of first taking just an experimental approach in which we execute a benchmark or a number of benchmarks on a system and figure out what are the real problems that that system is experiencing, and then sort of use that information to guide the next steps. And related to that point, I think there's a lot of importance in designing systems that are easy to profile and easy to observe. Aesculite's design as a virtual database engine where we can measure the time spent in each of
Starting point is 00:47:11 its virtual instructions was really useful in determining where the time was going in our OLAP experiments. And that sort of led to those performance optimizations that we discussed in the paper. So I think there's a huge amount of importance for designing systems that are easy to profile. Brilliant. Yeah, definitely echo that sentiment as well. That's awesome. So we will, let's end it there.
Starting point is 00:47:36 Thanks so much, Kevin, for coming on the show. If the listeners are more interested to know more about Kevin's work, we'll put all the links to everything in the show notes so you can find it there. Or you can, I'm guessing reach out to kevin via email or i believe you're on linkedin as well right so yes the message on there right if you're interested in chatting more so yeah and we'll we'll see you all next time for some more awesome computer science research you
