The Data Stack Show - The PRQL: How Did Pandas Become a Data Science Powerhouse? Featuring Chang She of Eto Labs

Episode Date: October 23, 2023

In this bonus episode, Eric and Kostas preview their upcoming conversation with Chang She of Eto Labs. ...

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show prequel, where we replay a snippet from the show we just recorded. Kostas, are you ready to give people a sneak peek? I am, of course. Let's do it. Let's do it. Kostas, one thing that is amazing about Chang, I mean, of course, in addition to the fact that he, you know, was a co-author of the pandas library, which is legendary, and the fact that he has built multiple high-impact technologies, is a multi-time, multi-exit founder, building data tooling, you know, in sort of the data and MLOps space. I mean, all of those things are, it's really incredible. But when you talk with him, he, you know, if you didn't know who he was, you would just think this is just one of those like really curious,
Starting point is 00:01:03 really passionate, really smart founders, you know, and you said at the very beginning that he's humble, but I mean, that's almost an understatement. You know, he's just, he would treat anyone on the same level as him, no matter their level of, you know, accomplishment or technical expertise. Yeah. That really stuck out to me. And I also think the other thing that was really great about this episode was it wasn't like he came out and said, I have an opinion about the way the world should be,
Starting point is 00:01:38 and this is why we're doing things like the LanceDB way. He just kind of had a very calm explanation of the problem and a really good set of reasoning for why he needed to create a new file format, right? Which is like shocking to hear, you know, because it's like, whoa, you know, you have like Parquet exists. Why do this? Right. So it sounds really shocking at face value, but I mean, his description was really compelling. And the story of how they actually sort of almost backed into creating a vector database, you know, because they invented this file format. Just an incredible episode.
Starting point is 00:02:20 Yeah. Yeah. I mean, Chang is like one of these rare cases where you have both an innovator and a builder, which is... I mean, it's hard to find an innovator. It's hard to find a builder. It's even harder to find someone who combines these two. And at the same time, being down-to-earth like him. I think this episode has pretty much everything. I mean, it has lessons from the past
Starting point is 00:02:45 that can be super helpful to understand how we should approach and solve problems today. And there's a lot of things to learn from the story of pandas that are applicable today for everyone who's trying to build tooling around AI and ML. It has... What I really enjoyed was actually,
Starting point is 00:03:07 probably like the first time that we talked about something, I think very important, which is how the infrastructure needs to evolve in order to accommodate these new use cases and actually accelerate innovation with like AI and ML, which is still like work in progress. And I think Chang provided like some amazing insight of like what are the right directions to do that.
Starting point is 00:03:31 And he said some very interesting things about not creating silos, how like, you know, he gave like a very interesting example, like from like mathematics, where he said that, you know, in mathematics, like when you have a new problem, you try to reduce it to a known problem, right?
Starting point is 00:03:46 And that's how we should also build technology, which is an amazing insight, to be honest. And I think it's something that especially builders keep forgetting and tend to either replicate or create bloated solutions and all that stuff. So there's a lot of wisdom in this episode. I think anyone who's like a data engineer
Starting point is 00:04:06 and wants to get like a glimpse of the future of what it means like to like work with the next generation of like data platforms, they should definitely tune in and like listen to Chang. I agree. A really incredible episode. Subscribe if you haven't. You'll get notified when this episode goes live
Starting point is 00:04:24 on your podcast platform of choice. And of course, tell a friend. Many exciting guests coming down the line for you. And we will catch you on the next one.
