The Data Stack Show - Shop Talk: Why Are There So Many Flavors of Databases?

Episode Date: October 24, 2022

In this bonus episode, Eric and Kostas talk shop about the wide world of databases. ...

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show Shop Talk, where Kostas and I talk shop. It's one of our favorite things to do. And also, we should tell everyone, Kostas: if there's a topic you want us to discuss, send us an email and we will discuss it on an upcoming Shop Talk episode, and we will send you a Data Stack Show coffee mug and t-shirt. So please email us
Starting point is 00:00:31 eric at datastackshow.com, kostas at datastackshow.com, or brooks at datastackshow.com. You'll probably get a faster response if you email Brooks, but please send us topics you want us to discuss,
Starting point is 00:00:42 and we'll tackle them on Shop Talk. Okay. Yeah, or they can reach out on Twitter as well. Oh, yeah, that's true. Yep. For sure. Okay, Kostas, here's my question for this week.
Starting point is 00:00:55 And this is me, as the less technical co-host of the show, asking you, as the more technical co-host. One thing that is really interesting to me is that it seems like there are a lot of new databases being created, like different types of databases, right? I mean, there are lots of database types out there, right? Maybe flavors is a better word to describe it, right?
Starting point is 00:01:32 I mean, fundamentally databases like, you know, have a lot of similarities, but one of my questions is why is that, right? Like building a database system that can be widely successful seems like a ridiculously hard undertaking, especially with so many incumbents. And yeah, it's just interesting. I don't know, if I was going to pick a problem to solve, I don't know if building a new database system
Starting point is 00:02:00 would be it, just because it seems like there are so many established, really good options out there. Let me make sure that I understand the question. Is the question about why we have so many different flavors, or why it is so hard to build a database, or both? I mean, I can answer both. I don't know, whichever you like. Part of my question is why do people keep trying
Starting point is 00:02:29 to invent new kinds of databases? That's really more of the question. I'm not so sure that they try to do that. What is the latest flavor of database that you have seen out there that you didn't know about? I was just thinking back on... what was it, Quine, right? The graph database.
Starting point is 00:02:53 Yeah, but that was more of a processing system. It wasn't exactly a database. It was adding, let's say, a graph layer on top of a key-value store, which already existed. So the database, technically, is the key-value store, and they put a real-time graph processing system on top of that, right? So that's a little bit different, but... How about Firebolt?
Starting point is 00:03:30 Well, Firebolt... I mean, it's like a hard fork of ClickHouse. But that's relevant in the sense that they have added a lot of stuff on top to make it Firebolt. So Firebolt is not exactly like ClickHouse. But okay, let me... Does that make sense? I know my question probably reveals a lot of my technical ignorance. No, no, no. I think it reveals, how to say that, the obscurity around database systems, and why database systems are... how to say it, there's like a veil of, I mean...
Starting point is 00:04:14 Mystery? Yeah. Which I think also has to do with how hard it's supposed to be to build one, right? But okay, let's take this from the beginning. Database systems are primarily, let's say, categorized based on the
Starting point is 00:04:36 workloads that they serve best. Okay. And there are many definitions of a workload, but it's mainly what kind of data we are working with and what kind of processing we want to do on that data, right? So serving a dashboard is fundamentally different from doing real-time queries on streaming data, right? So, okay, fundamentally, all these systems are database systems
Starting point is 00:05:14 in the sense that they operate over a set of data. They expose, let's say, an interface where the user can ask a question, process the data, and get an answer, right? Technically, let's say, you can take Postgres, okay, and you can use it as a transactional database, you can use it to run analytical queries, you can use it for time-series data. Maybe you can also use it with streaming data. Okay.
Starting point is 00:05:52 But there are tons of trade-offs. I mean, mainly in how much you can scale and how well you can cover the use cases for each one of these, right? So we reach a point where we need to start specializing. So we suddenly have time-series databases, right? We have OLAP systems, which is like data warehouses. And then we have data lakes, and then we have graph databases and
Starting point is 00:06:22 key-value stores and in-memory systems. So yeah, we have different flavors because we need to specialize in order to maximize, let's say, how well we can solve each one of these problems. And the more we need to do on each one of these workloads, the more innovation we will see there. Having said that, yeah, databases are, I don't know, maybe together with operating systems and compilers, the three most complex systems to build. I mean, not as a toy, but as a product, right?
Starting point is 00:07:00 Probably closer to an operating system, to be honest. It has many commonalities in terms of the different components and stuff. Compilers are very difficult to build because there are a lot of algorithms and stuff that you have to do there. But in terms of their architecture, I think they're a bit simpler compared to something like a database system or an operating system. But databases share many things with operating systems, like how they handle memory, how they handle storage... let's say, workloads and the need to specialize in these workloads, together with the fact that it's really hard to build a database, is what I think creates, let's say, this difficulty for people to understand why we need all these different databases and why we keep trying to build new ones.
Starting point is 00:08:08 Yeah. That makes sense. It's helpful for me. Yeah, that makes total sense. Do you see, I mean, there also seems to be a lot of operational overhead, right? Which is probably why smaller companies just use a very simple
Starting point is 00:08:31 sort of standard set of databases, right? It seems like an individual company would use a wider variety as they have the scale and resources to manage that, right? Because having multiple different databases, you know, a wide variety, would introduce a lot of operational overhead, right? Yeah, yeah, absolutely. I think introducing data infrastructure in general, I don't think it's just databases that this applies to, but people should always do that when they have scaled enough that they have the need to do it.
Starting point is 00:09:13 Otherwise you just have too much complexity and you're going to get hurt instead of solving a problem. You have to be a little bit careful with that. In my opinion, always try to be lean and, at the beginning, even scrappy, rather than trying to go and, you know, buy the latest, shiniest solution out there to solve a problem that you can probably solve with Excel. If you were going to build a database, what problem area
Starting point is 00:09:47 would you focus on? Well, there is a very interesting topic in database systems that we started seeing more in transactional databases, but I think we will see more and more of it also in analytical databases, which is going completely serverless. So this is something super, super interesting from the point of view of architecture and also the kind of experience that you can deliver with these systems.
Starting point is 00:10:26 There's CockroachDB with their serverless database. Right, I was going to mention Cockroach. PlanetScale. There are probably a couple more, like Neon. It's a new one, but it's also open source. There are some very interesting developments there. They're more around transactional databases at this point.
Starting point is 00:10:45 But very interesting, both the products and the companies. So, I mean, I'd love to work on something like that. It's very fascinating and very challenging from a technical perspective. Yeah, super interesting. Okay, last question.
Starting point is 00:11:02 I actually have no idea how long we've been talking for, but this is a super interesting topic. Okay, really hard to build. Like, okay, let's say you're going to go build a serverless database.
Starting point is 00:11:21 Really difficult, right? Many difficult things that you mentioned about that. So, I mean, this isn't unique to databases, right? When you think about new technology, there's risk in adopting that technology because, well, if it doesn't actually play out, you have to basically redo a ton of work, right? And so for databases in particular, you know,
Starting point is 00:11:55 if you think about, let's just take a standard ETL pipeline versus a database, right? Two pieces of data infrastructure. It's like, okay, is it painful to, you know, replace an ETL pipeline? Sure, that can be painful, right? Especially if you have to build it or whatever, right?
Starting point is 00:12:14 But a database is a much bigger deal, right? You know, because of all the things you would think of: there's a ton of data in it, there are, you know, formatting implications there, right? I mean, generally, critical business functions run on it, et cetera, et cetera. For a new database technology, what are the signals to you that it is going to be around? Right.
Starting point is 00:12:43 Like, when would you invest in it? What would make you comfortable in terms of investing? Does it need to be open source? Is it a certain level of adoption? I think that's one of the reasons that pretty much every database system out there, one way or another, has an open source component to it. And I think we will keep seeing that. Neon, for example,
Starting point is 00:13:08 yeah, I think they released the open source version before they started offering some kind of hosted version. Yeah. It seems like a common pattern. Yeah. And I think it will continue to be like that, exactly for the reasons that you're talking about. It's just such a big investment and an important component
Starting point is 00:13:28 of every technology out there. So, okay, yeah, you cannot gamble on that and use something that, I don't know, will stop existing a week from now. So yeah, I think open source is important. Outside of this, I mean, I don't know, I think the community is an important thing, and obviously the
Starting point is 00:13:54 company itself that's behind it. Right. Yeah. And it takes time. I don't know, for example, how long CockroachDB has been around, but building a business around a database takes time. Yeah.
Starting point is 00:14:14 Yeah. I guess Snowflake's an outlier in that they're not open source. Yes, that's true. It's pretty interesting. But again, they are an outlier, and it's also a little bit different when we are talking about a transactional database, which you will use to build your product on top of, and an analytical database. Yeah.
Starting point is 00:14:40 Where, okay, I mean, you can always move the data to another place and keep doing analytics. Okay, you can survive without your dashboard for a day, right? It's like, yeah, done. CockroachDB was early 2015. So yeah. Early, yeah. I mean, it's been a while. And they all started with open source. So yeah, it's a common pattern. Super interesting.
Starting point is 00:15:16 All right. Well, I got such a good education on database fundamentals. Yeah, I'm happy to discuss more about that. It's a very interesting topic. It is. It's super interesting. All right. Well, thank you for joining us for Shop Talk. I hope you learned as much as I did, even if we covered things that a lot of our listeners already know.
Starting point is 00:00:00 And we will catch you on the next one.
