The Data Stack Show - The PRQL: Who Really Needs To Know How a DBMS Works?
Episode Date: September 2, 2022
In this bonus episode, Eric and Kostas preview their upcoming conversation with Kyle Weller of Onehouse.ai. ...
Transcript
Okay, Kostas, I have a confession to make. We talked about an interesting topic on the show we just
recorded, optimistic concurrency control, and I have misused that technology before,
mainly when I'm having disagreements with my wife, you know, sort of assuming that
the conversation won't have any sort of collisions
when we're both trying to get the point across.
But to segue, it was interesting to hear about that as one of the
limitations of some data lake technologies,
which was actually what the whole episode was about:
data lakes and data lakehouses.
What interests you most about the lakehouse format
that we just talked about with Kyle from Onehouse?
I mean, for me, what is extremely interesting with the lakehouse is that any engineer who is going to work on building one, or trying to understand one, is used to interacting with databases and thinking of
them as kind of a black box, right?
There is a system that we know is pretty complicated.
It's one of the hardest things to build, maybe outside of
compilers and operating systems.
But the way that engineers usually interact with databases, especially
when they build products, is through SQL, and probably also around
the operations, like how to configure it and stuff like that, right?
I don't think that many engineers out there have
any need to go deep into how the data is stored
on the hard drive, how it is passed, how concurrency works, why we would do it, or why we
wouldn't.
All these are things that we were taking for granted because they have existed for 30, 40 years
now.
But to understand what the lakehouse is and how it is built,
you actually have to go through this process where you understand and
learn all the different components that are needed to build a database
management system at the end.
Now, okay, people might ask, why is this good?
But for me it is, because I'm naturally a very curious person and I love learning about that stuff.
But at the same time, I think that when, let's say, the market or the industry ends up in
a situation where things need to break down into pieces, there is a good reason for that.
What the reasons are, we will figure out as we build the categories and as we
see what the market is going to adopt there. But I think it's very interesting,
from an engineering perspective and also from a business perspective, to see
exactly how this process of breaking down a database system into smaller
pieces, building companies around these different components, and then
probably merging them all together again at some point, is going to work out.
So, I don't know, these two reasons are enough for me, at least,
to be interested in this.
Of course I'm biased.
So, yeah.
That's the reason for me, at least. I'm sure.
Well, that was a great episode. And if any of those topics interest you, definitely check
out the episode with Kyle from Onehouse. He works with Vinoth, one of the inventors of Apache
Hudi. So we get into some Hudi history. Some really interesting Microsoft history. We talk
about telemetry with the Office suite, which was fascinating. And then, of course, some
really practical advice on the lakehouse, especially for companies who are looking
to migrate towards that architecture. So, definitely check out the episode, and we will
catch you on the next one.