The Data Stack Show - Shop Talk: Why Are There So Many Flavors of Databases?
Episode Date: October 24, 2022
In this bonus episode, Eric and Kostas talk shop about the wide world of databases. ...
Transcript
Welcome to the Data Stack Show Shop Talk, where Kostas and I talk shop.
And it's one of our favorite things to do.
And also, we should tell everyone, Kostas, if there's a topic you want us to discuss,
send us an email and we will discuss it on an upcoming Shop Talk episode,
and we will send you a Data Stack Show coffee mug and t-shirt.
So please email us: eric at datastackshow.com, costas at datastackshow.com, or brooks at datastackshow.com.
You'll probably get a faster response if you email Brooks,
but please send us topics you want us to cover and we'll tackle them on Shop Talk.
Okay.
Yeah, or they can reach out to us on Twitter as well.
Oh, yeah, that's true.
Yep.
For sure.
Okay, Kostas, here's my question for this week.
And this is me, as the less technical co-host of the show, asking you, as the more technical co-host.
One thing that is really interesting to me is that it seems like there are a lot of new databases being created, like different types of databases, right?
I mean, there are lots of database types out there, and maybe flavors is a better word to describe it.
Fundamentally, databases have a lot of similarities, so one of my questions is, why is that?
Building a database system that can be widely successful seems like a ridiculously hard undertaking, especially with so many incumbents.
And yeah, it's just interesting. If I was going to pick a problem to solve, I don't know if building a new database system would be it, just because it seems like there are so many established, really good options out there.
Let me make sure that I understand the question.
Is the question about why we have so many different flavors, or why it is so hard to build a database, or both?
I mean, I can answer both, whichever you like.
Part of my question is
why do people keep trying
to invent new kinds of databases?
That's really more of the question.
I'm not so sure that they try to do that.
Like, what is the latest flavor of database that you have seen out there that you didn't know about?
I was just thinking back on, what was it, Quine, right? The graph database.
Yeah, but that was more of a processing system. It wasn't exactly a database.
Let's say it was adding a graph layer on top of a key-value store, which already existed.
So the database, technically, is the key-value store, and they put a real-time graph processing system on top of that, right?
So that's a little bit different, but...
How about Firebolt?
Well, Firebolt... I mean, it's like a hard fork of ClickHouse.
But that's relevant in the sense that they have added a lot of stuff on top to make it Firebolt.
So Firebolt is not exactly like ClickHouse, but let me, okay, let me...
Does that make sense?
I know my question probably reveals a lot of my technical ignorance.
No, no, no, no.
I think it reveals, how to say it, the obscurity around database systems. There's like a veil of, I mean...
Mystery?
Yeah.
Yeah.
Which I think also has to do with how hard it's supposed to be to build one, right?
But okay.
Let's take this from the beginning.
Database systems are primarily categorized, let's say, based on the workloads that they serve best.
Okay.
And there are many definitions of a workload, but it's mainly what kind of data we are working with and what kind of processing we want to do on that data, right?
So serving a dashboard is something fundamentally different from doing real-time queries on streaming data, right?
So, okay, fundamentally, all these systems are database systems in the sense that they operate over a set of data.
They expose, let's say, an interface where the user can ask a question, process the data, and get an answer, right?
Technically, let's say, you can take Postgres, okay, and you can use it as a transactional database, you can use it to run analytical queries, you can use it for time-series data.
Maybe you can also use it with streaming data.
Okay.
But there are tons of trade-offs, mainly in how much you can scale and how well you can cover the use cases for each one of these.
Right.
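To make the idea of a workload concrete, here is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a general-purpose engine like Postgres. The orders table, its columns, and the two helper functions are made up for illustration and do not come from the episode. It shows one engine answering both a transactional-style point lookup and an analytical-style aggregate.

```python
import sqlite3

# SQLite (in-memory) is only a stand-in for a general-purpose engine such as
# Postgres; the table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, day TEXT)"
)

# Transactional-style workload: small writes grouped into one transaction.
rows = [
    (1, "alice", 20.0, "2022-10-01"),
    (2, "bob", 35.5, "2022-10-01"),
    (3, "alice", 12.5, "2022-10-02"),
    (4, "carol", 50.0, "2022-10-02"),
]
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (40.0, 2))


def order_amount(order_id):
    """Point lookup: reads a single row through the primary key."""
    cur = conn.execute("SELECT amount FROM orders WHERE id = ?", (order_id,))
    return cur.fetchone()[0]


def revenue_per_day():
    """Analytical-style query: reads every row and aggregates by day."""
    cur = conn.execute(
        "SELECT day, SUM(amount) FROM orders GROUP BY day ORDER BY day"
    )
    return cur.fetchall()


print(order_amount(2))
print(revenue_per_day())
```

With four rows, both queries are trivial. The point lookup touches one row, while the aggregate has to read all of them, and that difference in access pattern is the kind of thing specialized systems optimize for as data grows.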
So yeah, we reach a point where we need to start specializing.
So we suddenly have time-series databases, right?
We have OLAP systems, which are like data warehouses.
And then we have data lakes, and then we have graph databases and key-value stores and in-memory systems.
So yeah, we have different flavors because we need to specialize in order to maximize, let's say, how well we can solve each one of these problems.
And the more we need to do on each one of these workloads, the more innovation we will see there.
Having said that, yeah, databases are, I don't know, maybe together with operating systems and compilers, the three most complex systems to build.
I mean, not as a toy, but as a product, right?
A database is probably closer to an operating system, to be honest.
It has many commonalities in terms of the different components and stuff.
Compilers are very difficult to build because there are a lot of algorithms and stuff that you have to do there.
But in terms of their architecture, I think they're a bit simpler compared to something like a database system or an operating system.
But databases share many things with operating systems, like how they handle memory, how they handle storage...
Let's say, workloads and the need to specialize in these workloads, together with the fact that it's really hard to build a database, is what I think creates, let's say, this difficulty for people to understand why we need all these different databases and why we keep trying to build new ones.
Yeah.
That makes sense.
It's helpful for me.
Yeah, it makes total sense.
Yeah.
I mean, there also seems like a lot of operational overhead, right? Which is probably why smaller companies just use a very simple, sort of standard set of databases.
It seems like an individual company would use a wider variety as they have the scale and resources to manage that, because having multiple different databases, you know, a wide variety, would introduce a lot of operational overhead, right?
Yeah, yeah, absolutely.
I think that applies to introducing data infrastructure in general, not just the database. People should do that only when they have scaled enough that they have the need for it.
Otherwise you are just adding too much complexity and you're going to get hurt instead of solving a problem.
You have to be a little bit careful with that.
In my opinion, always try to be lean, and at the beginning even scrappy, instead of going to buy the latest, most shiny solution out there to solve a problem that you can probably solve with Excel.
If you were going to build a database, what problem area
would you focus on?
Well, there is a very interesting topic in database systems that we have started seeing more in transactional databases, but I think we will see more and more of it in analytical databases too, which is going completely serverless.
So this is something super, super interesting from an architecture point of view, and also for the kind of experience that you can deliver with these systems.
There are some, like CockroachDB with their serverless database.
Right, I was going to mention Cockroach.
PlanetScale. There are probably a couple more, like Neon. It's a new one, but it's also open source.
There are some very interesting developments there.
They're more around transactional databases at this point.
But very interesting, both as products and as companies.
So I mean, I'd love to work on something like that.
It's very fascinating and very challenging from a technical perspective.
Yeah, super interesting.
Okay, last question.
I actually have no idea how long we've been talking for,
but this is a super interesting topic.
Okay, so these are really hard to build.
Let's say you're going to go build a serverless database.
Really difficult, right? You mentioned many difficult things about that.
So, I mean, this isn't unique to databases, right? When you think about new technology, there's risk in adopting that technology, because if it doesn't actually play out, you have to basically redo a ton of work, right?
And so for databases in particular, let's just take a standard ETL pipeline versus a database, right?
Two pieces of data infrastructure.
It's like, okay, is it painful to replace an ETL pipeline?
Sure, that can be painful, right? Especially if you have to build it or whatever.
But a database is a much bigger deal, because of all the things you would think of.
There's a ton of data in it. There are formatting implications there.
Generally, critical business functions run on it, et cetera, et cetera.
So for a new database technology, what are the signals to you that it is going to be around?
When would you invest in it? What would make you comfortable in terms of investing?
Does it need to be open source?
Is it a certain level of adoption?
I think that's one of the reasons that pretty much every database system out there, one way or another, has an open source component to it.
Take Neon, for example. I think they released the open source before they started offering some kind of hosted version.
Yeah.
It seems like a common pattern.
Yeah.
Yeah.
And I think it will continue to be like that, exactly for the reasons that you're talking about. It's just such a big investment and such an important component of every technology out there.
You cannot gamble on that and use something that, I don't know, will stop existing a week from now.
So yeah, I think open source is important. Outside of this, I mean, I don't know.
I think the community is an important thing, and obviously the company itself that's behind it.
Right.
Yeah.
Yeah.
And like I said, it takes time.
I don't know, for example, how long CockroachDB has been around, but building a business around a database takes time.
Yeah.
Yeah.
I guess Snowflake's like an outlier in that they're not open source.
Yes, that's true.
Just pretty interesting.
But again, they are an outlier, and it's also a little bit different when we are talking about a transactional database, which you will use to build your product on top of, versus an analytical database.
Yeah.
Where, okay, I mean, you can always move the data to another place and keep doing analytics.
Okay, you can survive without your dashboard for a day, right?
It's like, yeah, done.
CockroachDB was early 2015.
So yeah. Early, yeah. I mean, it's been a while.
And they all started with open source.
So yeah, it's a common pattern.
Super interesting.
All right.
Well, I got such a good education on database fundamentals.
Yeah, I'm happy to discuss more about that.
It's a very interesting topic.
It is. It's super interesting.
All right. Well, thank you for joining us for Shop Talk.
I hope you learned as much as I did,
even if we covered things that a lot of our listeners already know.
And we will catch you on the next one.