The Data Stack Show - The PRQL: Exploring the Intersection of Software Engineering and Data Management with Kevin Liu of Stripe
Episode Date: March 18, 2024
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, Data Stack Show listeners. I'm Pete Soderling, and I'd like to personally invite you to Data
Council Austin this March 26 to 28, where I'll play host to hundreds of attendees,
100 plus top speakers, and dozens of hot startups on the cutting edge of data science,
engineering, and AI. If you're sick and tired of salesy data conferences like I was,
you'll understand exactly why I started Data Council and how it's become known for being
the best vendor-neutral, no BS, technical data conference around. The community that attends Data Council are some of the smartest founders, data engineers, and scientists, CTOs, heads of data, lead engineers,
investors, and community organizers. We're all working together to build the future of data and
AI. And as a listener to the Data Stack Show, you can join us at the event at a special price.
Get 20% discount off tickets by using promo code DATASTACK20. That's DATASTACK20. But don't just
take my word that it's the best data event out there. Our attendees refer to Data Council as
Spring Break for Data Geeks. So come on down to Austin and join us for an amazing time with the
data community. I can't wait to see you there.
Welcome to the Data Stack Show prequel. This is a short bonus episode where we preview the upcoming show. You'll get to meet our guest and hear about the topics we're going to cover.
If they're interesting to you, you can catch the full-length show when it drops on Wednesday.
We are here on the Data Stack Show with Kevin Liu.
Kevin, thank you so much for giving us a little bit of your time today.
Yeah, thanks for having me.
All right, well, you've done a couple of really interesting things in data,
but just give us your brief background.
How did you start and what are you doing today?
Sure. I'm currently a software engineer at Stripe.
I've been working there for around three years.
I've been working with data infrastructure there.
So a lot of open source technologies such as Trino and Iceberg.
My team powers our internal BI analytics.
And recently I've taken on another challenge with,
on the data product side,
the product is called Stripe Data Pipeline. We essentially enable merchants to have their Stripe data
back into their warehouse,
into their data ecosystem in an efficient way.
This is great.
Actually, I know you, Kevin, for a while now.
We've been talking since the time when I was at Starburst and about Trino specifically.
And I'm very excited today because I had the opportunity and the pleasure to work with Stripe quite a few times.
And it's one of these companies that they've been around for long enough to go through many changes,
but always trying to stay at the forefront of what is happening out there.
For example, very early adopter of Spark.
I'm pretty sure you probably still have pipelines in Scala in there because of that.
And you keep innovating.
You are open, using new technologies.
And many things have happened in these past 10 years, let's say.
So having you from there and you being long enough there to see these past three, four
years, the evolution, I think will give us a great opportunity to talk about where data
infrastructure stands today, what some interesting problems are.
And also based on your latest move into turning data into products, talk about that.
Because I think it's a very important next evolution step when it comes to infrastructure around data.
So that's what I'm really excited about today.
What about you?
What are a few things that you'd love to talk about?
Yeah, I think in general, I've been really happy working at Stripe just because,
you know, the company for its size, for the kind of engineering culture there,
it really helped me learn and get to understand a lot of what is going on, especially in the data world, kind of at the, like, you know,
what is the newest and shiniest thing
that we can work with, right?
So, you know, I took a database class in college,
didn't think much of it, came to Stripe,
started working with, you know,
OLAP systems, Trino, Iceberg,
and it was very new to me.
But then eventually I started to realize that it was new kind of to the industry as well.
And that's been really exciting to me in order to say, okay, well, you know, how do I take this
new concept? How do I run it efficiently at Stripe? And then how do I help the community? Because it is an open source project. How do I kind of take ideas that we have
that we come up with and share it with the community as well? And then on the data product
side, I think Stripe is positioned very well to do data sharing.
Not a lot of companies can do that because not a lot of companies have
the value from the data that they have
and have that kind of be shared to their customers
in a way where the customers are asking for it
on a daily basis.
So I'm still learning.
I think I just want to share some ideas with you guys.
And yeah, happy to talk more about things.
Yeah, let's do it.
What do you think, Eric?
Are we ready?
I was born ready, Kostas.
I was born ready.
I know that.
Let's do it.
Let's do it.
All right.
That's a wrap for the prequel.
The full-length episode will drop Wednesday morning.
Subscribe now so you don't miss it.