The Data Stack Show - The PRQL: The Two Parallel Tracks of Development In Data Processing with Ryan Blue of Tabular
Episode Date: April 8, 2024
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show prequel.
This is a short bonus episode where we preview the upcoming show.
You'll get to meet our guest and hear about the topics we're going to cover.
If they're interesting to you, you can catch the full-length show when it drops on Wednesday.
Welcome back to the Data Stack Show. Costas, we've talked a lot about databases, database technology.
You know, it's been a common theme on the show.
But today, we're going to dig really deep into that world at high scale.
So Ryan Blue is our guest. He helped create Iceberg,
which is now part of the Apache Software Foundation. And it's going to be a great story. I mean,
I am really interested in hearing the background of the challenges that they faced at Netflix,
where this was originally developed.
And this is above my pay grade,
but I'd really like you to ask him about file formats, if you're willing.
Because that is actually another interesting thing
that we haven't covered in great detail.
I mean, we've touched on it here and there,
but that's a huge topic when it comes to Iceberg
and we think about data lakes.
So that's another topic that I've been thinking about
just as it relates to all of Ryan's experience.
So hopefully I didn't steal your thunder
on the file format question,
but what do you want to ask about?
Yeah, I mean, first of all,
I know that when most people think about Ryan,
they think of Iceberg. But what I think is extremely interesting
is that Ryan has been around for a very long time.
He has been part of building some very foundational
pieces of technology that we are using today,
things like Avro, Parquet,
and obviously table formats like Iceberg.
So outside of anything technical
that we will be talking about with him,
one of the things I will spend quite some time on with him is
a little bit of history: why things actually happened the way they did.
We touched on this with him, and in my opinion it's super interesting.
It's about how, when it comes to data processing, there were actually two parallel tracks of development over the past
10 to 15 years. One came primarily from the database folks who were building
database systems. The other came from people who were primarily distributed
systems people. That's where things like MapReduce came from, along with Hadoop and all the other big data
technologies that we talk about.
There are some very interesting comments and points made about
how we reinvented some things or did
some things differently, and why this happened.
Ryan gives a very interesting perspective on the evolution of these systems, how they happened and why.
And outside of that, we'll talk a lot about file formats,
which are also quite a hot topic.
Parquet, for example, has been out for a while,
and there are a lot of conversations about needing to update it.
There are actually some new things coming out these days.
So I think it's a very good time for a refresher on what file formats for storing data are,
how they differ from each other,
and how they differ from table formats like Iceberg.
On top of that, we'll also talk a little bit about Tabular, his company,
and about some other really interesting things that are happening right now in the
space.
So make sure you listen to the episode.
It's very interesting.
Ryan has a lot to share, and we have a lot to learn from him.
Agreed.
Well, let's dig in and talk about Iceberg
and all the other things.
Let's do it.
All right, that's a wrap for the prequel.
The full-length episode will drop Wednesday morning.
Subscribe now so you don't miss it.