The Data Stack Show - The PRQL: Does Lakehouse Architecture Really Mean the End of the Data Warehouse and Data Lake As We Know It?
Episode Date: August 5, 2022
In this bonus episode, Eric and Kostas preview their upcoming conversation with Vinoth Chandar of Apache Hudi. ...
Transcript
Welcome to the Data Stack Show prequel.
We just recorded a show with Vinoth, one of the creators of Apache Hudi, and we
got to ask him about his new company.
Well, maybe we should just leave that as a teaser for the listeners.
What do you think, Kostas?
Does that mean?
Oh, no, no, no, no.
I think it's great.
We don't have to name the company.
I guess probably people will already know
about the company anyway.
I mean, they're doing a great job
with marketing so far
and spreading the word out there
about the company
and the relationship with the Apache Hudi project.
So it'll be okay, I think.
Yeah, I agree.
I just had to start with a little teaser there.
Put a cliffhanger.
But I will say, I think one of the most interesting
and helpful things to me about the conversation
that we just had was better understanding the different components of data lakes and
data warehouses, how they create value for different users, and how they optimize
toward different goals, right? So usability versus cost, et cetera.
And I feel like we just got such a good picture of how those things are converging, right?
Because traditionally, a lot of those concerns
have been very separated.
And I think way more rapidly than I realized,
they are converging and creating the opportunity to do some really cool stuff,
right?
So, I mean, one thing we talked about, to give a little teaser, was sort of "bring your
own query interface," right?
Which is a really interesting concept because, to Vinoth's point, a lot of these things
are sort of big, vertically integrated stacks.
So that was fascinating.
But what stuck out to you?
I mean, I think we focus a lot on, let's say, the rivalry between the lakehouse architecture
and the data warehouses, which, okay, has been created in a way because of, let's say,
the relationship between Databricks and Snowflake.
But we forget that initially, at least,
and I think it's still the case,
that data lakes have been created for different use cases.
The data warehouse still is, and will always be, the right environment
if you want to do, let's say, BI,
and you want to do analytics.
Data lakes were not initially built for that.
Now the lakehouse says that you can also do that,
but the most important part is that you also have the rest of the use cases
that usually are like more into some very heavy type of processing,
like doing ML stuff, like working with like very big and complex
like workloads and stuff like that.
Right.
So that's what I keep, because it's very easy, let's say, to forget about that.
And Vinoth mentioned that at some point at the end.
I mean, at the end, lakehouses and data lakes are also enabling a set of use cases that you cannot do
on the data warehouse. And that's like where the value is, right? Like, that's why
we need that. It's not just marketing, let's say, to make people buy the same thing while they think they are buying something
different, right?
Like, it's something else.
And that's why you see, in many companies, the two solutions coexisting, right?
Like, we have data warehouses together with data lakes and lakehouses.
So that's what I keep from the conversation.
And yeah, I mean, outside of this, as always, it was like an amazing technical
conversation with someone who knows deeply what he's talking about.
And so that's something that I always enjoy when I talk with him.
So yeah, I'm looking forward to chatting with him again in the future.
You had a great exchange about compaction, which was fascinating.
Yeah.
We talked about that, about compaction.
We talked about the different services in general
that we build on top of the data lake
to bring it closer to the data warehouse.
So if anyone wants to learn more about that stuff,
I'm not going to disclose more.
Ah, we've already gone too long.
Yep. We've already gone too long with the prequel.
Thanks for joining us.
This is a great show.
You won't want to miss it.
Subscribe if you haven't yet to get notified of updates.
Tell a friend about the show and we will catch you on the next episode.