The Data Stack Show - The PRQL: A Methodology for Better DAGs with Stefan Krawczyk of DAGWorks
Episode Date: July 24, 2023
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show prequel, where we replay a snippet from the show we just
recorded.
Kostas, are you ready to give people a sneak peek?
I am, of course.
Let's do it.
Let's do it.
Kostas, I love this show because we covered a variety of topics with Stefan from DAGWorks and Hamilton.
You know, I think one of the most fascinating things about the show to me was we kind of
started out thinking we were going to talk a lot about DAGs, right?
Because DAGWorks, you know, sort of the name of the company is focused on DAGs.
But really what's interesting is that it's not necessarily
a tool for DAGs like you would think about Airflow,
necessarily.
It's actually a tool for writing clean, testable ML code that
produces a DAG.
And so the DAG is almost sort of a consequence
of an entire methodology, which is Hamilton,
which is absolutely fascinating. And so I really appreciated the way that Stefan sort of
got at the heart of the problem. It's not like we need another DAG tool, right? We actually need a
tool that solves sort of problems with complex growing code bases at the core. And a DAG is
sort of a natural consequence of that
and a way to view the solution, but not the only one.
So I think that was my big takeaway.
I think it's a very interesting, elegant solution
or way to approach the problem.
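To make the idea of a DAG as a consequence of the code concrete, here is a minimal sketch in plain Python. It is not Hamilton's actual API: the feature functions, `build_dag`, and `execute` are hypothetical names invented for illustration, and the convention assumed here is that a function's name is the output it produces while its parameter names are the things it depends on.

```python
import inspect

# Hypothetical feature functions (not from the episode). Each function name
# is an output; each parameter name is a dependency on an input or on
# another function's output.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups

def high_spend_flag(spend_per_signup: float) -> bool:
    return spend_per_signup > 10.0

FUNCTIONS = {f.__name__: f for f in (spend_per_signup, high_spend_flag)}

def build_dag(functions):
    """Derive the dependency graph purely from the function signatures."""
    return {
        name: list(inspect.signature(fn).parameters)
        for name, fn in functions.items()
    }

def execute(target, functions, inputs, cache=None):
    """Compute `target` by resolving its dependencies recursively."""
    cache = dict(inputs) if cache is None else cache
    if target not in cache:
        fn = functions[target]
        kwargs = {
            dep: execute(dep, functions, inputs, cache)
            for dep in inspect.signature(fn).parameters
        }
        cache[target] = fn(**kwargs)
    return cache[target]

dag = build_dag(FUNCTIONS)
result = execute("high_spend_flag", FUNCTIONS, {"spend": 100.0, "signups": 5})
```

Because each piece is an ordinary function, it can be unit tested on its own (for example, calling `spend_per_signup(100.0, 5)` directly), and the graph in `dag` is never written by hand; it simply falls out of how the functions are declared.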
Yeah. DAGs appear everywhere with these kinds of problems, right?
Like anything that's like close to a workflow
or where there is some kind of like dependency, there's always a
DAG somewhere, right? And similarly, again, Hamilton is the same way. If you think
about like dbt, right, dbt also is a DAG, right? Every dbt project is a graph that connects
models with each other.
The difference, of course, is that we have like dbt,
which lives like in the SQL world,
and then we have Hamilton, which lives like in the Python world,
and it's also like targeting a different audience, right?
So that's like at the end, what Hamilton is trying to do
is like to bring the value of, let's say, the guardrails that
a framework like dbt is offering to the BI and the analytics professionals out there
to the ML community, right?
Because they also have that, and probably they have it in even deeper complexity compared
to, let's say, the BI world, just because by nature, ML models
and features have deeper dependencies on each other.
So it's very interesting to see how the patterns emerge in different sides of the field, like the industry, but
at their core they remain the same.
So, yeah, I think everyone should go and take a look at Hamilton.
They also have a sandbox playground where you can try it online if you want,
and they have started building a company on top of that.
And any feedback is going to be like super useful for the Hamilton folks.
So I would encourage everyone to go and do it.
Definitely.
And while you're checking out Hamilton, I think it's tryhamilton.dev.
Head over to Data Stack Show, click on your favorite podcast app and subscribe to the Data Stack Show.
Tell a friend if you haven't and we will catch you on the next one.