The Data Stack Show - The PRQL: Exploring the Evolution, Challenges, and Benefits of Composable Data Stacks Featuring Wes McKinney, Pedro Pedreira, Chris Riccomini, and Ryan Blue
Episode Date: January 29, 2024
In this bonus episode, Eric and Kostas preview their upcoming discussion with a panel of experts as Wes McKinney (Co-Founder, Voltron), Pedro Pedreira (Software Engineer, Meta), Chris Riccomini (Seed ...Investor, various startups), and Ryan Blue (Co-Founder and CEO, Tabular) join the show.
Transcript
Welcome to the Data Stack Show prequel.
This is a short bonus episode where we preview the upcoming show.
You'll get to meet our guest and hear about the topics we're going to cover.
If they're interesting to you, you can catch the full-length show when it drops on Wednesday.
Welcome to the Data Stack Show.
We have a truly incredible panel here to discuss the topic of
composable data stacks. So many topics to cover today. So let's get right into introductions,
and I'm just going to do it in the order that it shows up on my screen.
Chris, do you want to start out by giving us a quick background and intro?
Sure. Yeah. My name is Chris Riccomini.
I have spent the last 20 years of my career at two companies, mostly LinkedIn, where I spent a
lot of time on streaming and stream processing and was the author of Apache Samza, which was an early
stream processing system, kind of similar to Flink. And most recently at a company called
WePay, which was acquired by JPMorgan Chase, where I ran our payments infrastructure, data infrastructure, and data engineering teams for a stretch of time.
I've also written a book for new software engineers, kind of a handbook, because I was
tired of saying the same thing in one-on-ones over and over again. I've been involved in open
source. I was a mentor for the Airflow project and helped guide it through Incubator on Apache.
I also do a little bit of investing.
And so that's where I spend
a chunk of my time now.
And I, yeah, write a little newsletter
on all things systems infrastructure.
That's me in a nutshell.
Very cool.
Wes, you're up.
Yeah, I'm Wes McKinney.
I'm a serial open source software developer.
I've created or co-created a number of popular open source libraries:
pandas and Ibis for Python,
and Apache Arrow, kind of an in-memory data infrastructure layer.
It's very relevant to the topic of today's show.
I've been involved in a bunch of companies,
most recently a co-founder of Voltron Data,
building accelerated computing software for the Composable Data Stack and Posit,
the data science platform company for R and Python. I am an author of the book,
Python for Data Analysis. So popular reference book for Python data science stack.
And I also do a fair bit of angel investing in and around next generation data infrastructure startups.
Very cool.
Ryan, you're next on my screen.
Oh, thanks.
I'm Ryan Blue.
I'm the co-creator of Apache Iceberg, which is one of the open table formats that I think is slowly but steadily making a big change to the way we architect big data systems, especially in object stores. I'm also a co-founder of Tabular, where we sell an Iceberg-based
architecture that has security and data management services baked in. I left Netflix to found Tabular.
At Netflix, we were on the open source big data team.
So I got to work on Parquet and Iceberg
and replace the read and write paths
in Spark and various other things.
Very cool.
And Pedro.
All right.
Hello, everyone.
I'm happy to be here once again.
I'm Pedro Pedreira, software engineer.
I've been at Meta for a little bit over 10 years,
always involved in projects around data infrastructure,
a little bit closer to analytic engines, log processing engines.
So it's been most of my career just kind of developing databases
and data processing engines.
And I think about the last five years,
I started getting a little closer to this idea of composability
and how can we make the development of those engines more efficient.
So we started working on a variety of projects related to the space.
One of the projects that we eventually open sourced, and that got a little more visibility in the industry, was Velox.
It was this idea of making execution more composable for data management systems.
But inside Meta, I work with a variety of teams,
with most of the warehouse compute,
large warehouse compute engines like Presto, like Spark.
So kind of this data processing area for analytics,
developing efficient query engines, that's sort of the thing I do.
All right, that's a wrap for the prequel.
The full-length episode will drop Wednesday morning.
Subscribe now so you don't miss it.