The Data Stack Show - The PRQL: Does Lakehouse Architecture Really Mean the End of the Data Warehouse and Data Lake As We Know It?

Episode Date: August 5, 2022

In this bonus episode, Eric and Kostas preview their upcoming conversation with Vinoth Chandar of Apache Hudi. ...

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show prequel. We just recorded a show with Vinoth, one of the creators of Apache Hudi, and we got to ask him about his new company. Well, maybe we should just leave that as a teaser for the listeners. What do you think, Kostas? Does that mean? Oh, no, no, no, no. I think it's great.
Starting point is 00:00:26 We don't have to name the company. I guess people will probably already know about the company anyway. I mean, they're doing a great job with marketing so far, spreading the word out there about the company and its relationship with the Apache Hudi project.
Starting point is 00:00:43 So it'll be okay, I think. Yeah, I agree. I just had to start with a little teaser there. Put a cliffhanger. But I will say, I think one of the most interesting and helpful things to me about the conversation that we just had was better understanding the different components of data lakes and data warehouses as they create value for different users and then optimize
Starting point is 00:01:19 towards different goals, right? So sort of usability versus cost, et cetera. And I feel like we just got such a good picture of how those things are converging, right? Because traditionally, a lot of those concerns have been very separated. And I think way more rapidly than I realized, they are converging and creating the opportunity to do some really cool stuff, right? So, I mean, one thing we talked about to give a little teaser was like, sort of bring your
Starting point is 00:01:51 own query interface, right? Which is a really interesting concept because, to Vinoth's point, a lot of these things are sort of big, vertically integrated stacks. So that was fascinating. But what stuck out to you? I mean, I think we focus a lot on, let's
Starting point is 00:02:08 say, the rivalry between the lakehouse architecture and the data warehouses, which, okay,
Starting point is 00:02:16 it has been created in a way because of, let's say, the relationship between Databricks
Starting point is 00:02:23 and Snowflake. But we forget that initially, at least, and I think it's still the case, data lakes were created for different use cases. The data warehouse still is, and will always be, the right environment if you want to do, let's say, BI,
Starting point is 00:02:41 and you want to do analytics. Data lakes were not initially built for that. Now the lakehouse says that you can also do that, but the most important part is that you also have the rest of the use cases, which usually involve some very heavy type of processing, like doing ML stuff, or working with very big and complex workloads and stuff like that. Right.
Starting point is 00:03:08 So that's what I keep, because it's very easy, let's say, to forget about that. And Vinoth mentioned that at some point at the end: lakehouses and data lakes are also enabling a set of use cases that you cannot do on the data warehouse. And that's where the value is, right? That's why we need that. It's not just marketing, let's say, to make people buy the same thing while thinking that they are buying something different, right? It's something else.
Starting point is 00:03:51 And that's why you see, in many companies, the two solutions coexist, right? We have data warehouses together with data lakes and lakehouses. So that's what I keep from the conversation. And yeah, I mean, outside of this, as always, it was an amazing technical conversation with someone who knows deeply what he's talking about. And that's something that I always enjoy when I talk with him. So yeah, I'm looking forward to chatting with him again in the future. You had a great exchange about compaction, which was fascinating.
Starting point is 00:04:23 Yeah. We talked about that, about compaction. We talked about the different services in general that we build on top of the data lake to bring it closer to the data warehouse. So if anyone wants to learn more about that stuff, I'm not going to disclose more.
Starting point is 00:04:42 Ah, we've already gone too long with the prequel. Thanks for joining us. This is a great show. You won't want to miss it. Subscribe, if you haven't, to get notified of updates. Tell a friend about the show, and we will catch you on the next episode.
