The Data Stack Show - The PRQL: A Methodology for Better DAGs with Stefan Krawczyk of DAGWorks

Episode Date: July 24, 2023

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show prequel, where we replay a snippet from the show we just recorded. Kostas, are you ready to give people a sneak peek? I am, of course. Let's do it. Let's do it. Kostas, I love this show because we covered a variety of topics with Stefan from DAGWorks and Hamilton. You know, I think one of the most fascinating things about the show to me was we kind of
Starting point is 00:00:34 started out thinking we were going to talk a lot about DAGs, right? Because DAGWorks, you know, the name of the company, is focused on DAGs. But what's really interesting is that it's not a DAG tool in the sense you would think of Airflow. It's actually a tool for writing clean, testable ML code that produces a DAG. And so the DAG is almost a consequence
Starting point is 00:01:01 of an entire methodology, which is Hamilton, and that is absolutely fascinating. I really appreciated the way that Stefan got at the heart of the problem. It's not that we need another DAG tool, right? We actually need a tool that solves the core problems of complex, growing codebases. A DAG is a natural consequence of that and one way to view the solution, but not the only one. So I think that was my big takeaway. I think it's a very interesting, elegant solution
Starting point is 00:01:32 or way to approach the problem. Yeah. DAGs appear everywhere with these kinds of problems, right? Anything that's close to a workflow, or that has some kind of dependency in it, always has a DAG somewhere. Hamilton is similar in that way to dbt: dbt is also a DAG, right? Every dbt project is a graph that connects models with each other. The difference, of course, is that we have dbt,
Starting point is 00:02:08 which lives in the SQL world, and then we have Hamilton, which lives in the Python world and is also targeting a different audience, right? So in the end, what Hamilton is trying to do is bring the value of, let's say, the guardrails that a framework like dbt offers BI and analytics professionals to the ML community, right? Because they also have that problem, and probably with even deeper complexity compared
Starting point is 00:02:40 to, let's say, the BI world, just because by nature ML models and features have deeper dependencies on each other. So it's very interesting to see the same patterns emerge in different parts of the industry; at their core, they remain the same. So, yeah, I think everyone should go and take a look at Hamilton. They also have a
Starting point is 00:03:15 sandbox playground where you can try it online if you want, and they've started building a company on top of it. Any feedback is going to be super useful for the Hamilton folks, so I would encourage everyone to go and do it. Definitely.
Starting point is 00:03:36 And while you're checking out Hamilton, I think it's tryhamilton.dev. Head over to the Data Stack Show, click on your favorite podcast app, and subscribe to the Data Stack Show. Tell a friend if you haven't, and we will catch you on the next one.
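The idea discussed in this episode, that Hamilton is a methodology for writing clean, testable Python in which the DAG falls out of the code rather than being declared separately, can be illustrated with a small sketch. This is not Hamilton's implementation or API; it assumes a simple convention in the same spirit, where each function's name is the output it produces and each parameter name is a dependency on another function or on an external input. The helpers `build_dag` and `execute` and the feature functions `signups`, `spend`, and `spend_per_signup` are hypothetical names chosen for illustration.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable


# Hypothetical feature functions. Convention assumed for this sketch:
# the function name is the node it produces, parameter names are its inputs.
def signups(raw_signups: list[int]) -> list[int]:
    """Clean raw signup counts by clamping negative values to zero."""
    return [max(x, 0) for x in raw_signups]


def spend(raw_spend: list[float]) -> list[float]:
    """Pass raw spend through unchanged."""
    return list(raw_spend)


def spend_per_signup(spend: list[float], signups: list[int]) -> list[float]:
    """Derived feature: spend divided by signups (0.0 when signups is 0)."""
    return [s / n if n else 0.0 for s, n in zip(spend, signups)]


def build_dag(funcs: list[Callable[..., Any]]) -> dict[str, set[str]]:
    """Derive the graph from signatures: node name -> names it depends on."""
    return {f.__name__: set(inspect.signature(f).parameters) for f in funcs}


def execute(
    funcs: list[Callable[..., Any]],
    outputs: list[str],
    inputs: dict[str, Any],
) -> dict[str, Any]:
    """Run every function in dependency order and return the requested outputs."""
    by_name = {f.__name__: f for f in funcs}
    dag = build_dag(funcs)
    results = dict(inputs)
    # static_order() yields each node after its dependencies. External inputs
    # such as raw_spend appear as nodes too, but have no function to call.
    for node in TopologicalSorter(dag).static_order():
        if node in by_name:
            results[node] = by_name[node](**{p: results[p] for p in dag[node]})
    return {name: results[name] for name in outputs}


if __name__ == "__main__":
    funcs = [signups, spend, spend_per_signup]
    print(build_dag(funcs))
    print(execute(
        funcs,
        outputs=["spend_per_signup"],
        inputs={"raw_signups": [10, -1, 4], "raw_spend": [50.0, 20.0, 8.0]},
    ))
```

Because every node is an ordinary function, each one can be unit-tested on its own (for example, `signups([3, -2])` returns `[3, 0]`), and the graph is only assembled when `execute` is called. That is the sense in which the DAG is a consequence of the code rather than the thing you write by hand.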
