The Data Stack Show - The PRQL: Why is the Data Engineer's Role Expanding?
Episode Date: December 3, 2021
In this show PRQL, Eric and Kostas talk about the evolution of the role of a data engineer and preview the conversation with Aayush Jain. ...
Transcript
Welcome to a Data Stack Show prequel.
So we're going to talk about the upcoming episode that we're about to publish.
Kostas and I will exchange some interesting ideas and hopefully give you a little bit of a preview
of what we're going to talk about.
Okay, Kostas, we talked about business observability with someone building a company and trying
to establish that category.
This is kind of specific, but one thing that really stuck out to me in this conversation
was we talked about, you know, sort of all of the technologies that are popping up around
the modern data stack, right?
So collection and ingestion is getting easier.
Transformation is getting easier.
Data quality monitoring, you know, there are really good tools for that. Now we're talking about business observability
sort of around the metrics piece. And we talked a little bit about the role of the data engineer
as these tools automate a lot of things that previously were components of their job.
How does that change the role of the data engineer in the team structure?
And the answer was really interesting to me
that our guest gave.
He actually said it's causing the role
of the data engineer to expand in scope and responsibility.
It's becoming more important
as this constellation of tools
makes certain parts of the data stack easy.
So your thoughts.
Yeah, okay.
Two things. One thing, one main thing
there, one main thought. First of all, I think by definition, the data engineering role and why we
need it to have a different flavor of engineering there is exactly because it's like a kind of
hybrid. And I think we have touched that in the past, like when we were talking also with other
guests. It's like a hybrid between operations and actual software engineering, right?
So traditionally, the data engineer had to write code.
So when we didn't have infrastructure for pipelining, someone had to write the code
for these pipelines, right?
But then when you write the pipelines, then you have to operate the pipeline.
So you have also operations and you have observability, blah, blah, blah, and all that stuff that
comes more from the SRE space.
So I think what is happening, as more and more tools and platforms are becoming
available, is that part of the engineering work that the engineer had to
do, like writing the pipelines, now becomes more of an operational problem.
So the balance between operations and actual software engineering
is changing a little bit as part of the role of the data engineer.
This whole new data stack or whatever we want to call it
is still in its infancy.
So we don't know exactly in a couple of years
what the data cloud is going to look like
and what the job description of the data engineer is going to look like.
But today, it's like this kind of mix between software engineering and operations.
In my opinion, what I find much more interesting is not this kind of mix, or how the mix of operations and software engineering changes for the data engineer.
It's more about the emerging roles of rev ops
and even data ops.
But okay, this is, I think, still like very, very early.
Rev ops, marketing ops, sales ops,
like all these roles that they, let's say,
are more of a marketeer in the case of marketing ops,
but at the same time, they have to work,
also need to have a tech background.
There's the need to operate things
that have to do with data infrastructure
or to work with data and all these things.
So I kind of find the emergence of these roles,
to be honest, a little bit more interesting
than how the data engineering
role is going to change. And I think that this whole part of the stack that has to do with
operationalizing the data and the data warehouse is where all these roles are going to
emerge from. And it's a pretty new space, right? It's a category that is still being defined.
In every episode we have, we hear about a new definition,
a new extension of what it means to operationalize your data, right?
Yeah.
So, yeah, I think we will see more new professions coming out of this whole wave of innovation.
I agree.
Okay, second question for you. We talked about business observability, and we don't have enough time in the prequel to talk about the
emergence of all these new categories, you know, data mesh, data fabric, the data
cloud, all these different terms, and now the constellation of terms
around observability is popping up. Super interesting in and of itself. But we talked to the guest about anomaly detection
relative to business metrics and KPIs, which is a really interesting concept, right? So
the basic idea is that the dashboard goes away for particular things and you basically have
anomaly detection. You get notified of changes that are statistically significant.
You are an entrepreneur who also sort of understands
the mechanics of data pipelines.
As an entrepreneur, how would you,
like what would be the first couple of use cases
you would use in terms of anomaly detection
for business metrics or KPIs?
I think there's also, and that doesn't have to do with our guest today, but I think there's
also, this is part of trying to create a new category.
There's a lot of marketing language in there.
And we are still trying to figure out exactly what's the difference.
When we are migrating a term from one space to another, we also try to figure out the
new semantics of this term, right?
So I don't think that business observability
is exactly the same thing as systems observability.
Let's say, think about being DoorDash, right?
And you have to observe your infrastructure
because if you have a network outage,
yeah, you're going to lose probably thousands or millions of dollars because nobody's going
to be able to make orders.
Now, I don't think that on a business level, things operate in the same way.
Unless we want to connect the outage of the network with a business result, which is,
of course, going to be so severe.
But actually, what we have to understand
and where we have to react
is not on the business observability level, right?
It's on the systems observability level that we need to react.
Sure.
So there's like, I think, a distinction there
which is very important between metrics and KPIs.
So yeah, metrics.
And I think that Aayush mentioned that they have a company that
has like 200,000 metrics that they are using.
Yeah, it was wild.
Yeah. Yeah. You can have a lot of metrics. And metrics usually are what you want to
measure much more often; they react much more immediately. For example, let's talk about marketing, okay? Can you
share with us one or two metrics and one KPI?
Yeah, sure. I mean, we look at something like
product activation as a KPI, number of users who activate in the product. That's to some extent,
a proxy for intent, you know,
of actually solving a problem with the product.
And there are lots of metrics that ladder up to that, right?
You have certain in-app behaviors, you have volume metrics,
you have teammate metrics, you know,
there's just a lot of things that ladder up to that.
Yeah.
My intuition about that is that metrics are much more tactical tools.
It's something that you probably like go and check every day because you have your campaigns
and you might have tens of campaigns and each one of these campaigns and its performance
is one metric, right?
But it's not your KPI.
Like that's not something that you are going to report when like the leadership team is
going to discuss.
And then you have the KPIs, which are more of a strategic tool, okay? And I think that there's a big difference there in the time
dimension. A KPI is something that, I mean, okay, what is anomaly detection? Think about
how you measure the KPI yourself, right? It's on a weekly basis or even on a monthly basis,
because the variance week over week is such that
it doesn't make sense to measure it more often.
There are KPIs that are quarterly based, right?
So what is anomaly detection on a quarterly basis?
I don't know.
I mean, why do you need like an algorithm to do that for you, right?
So I think this kind of distinction between metrics, KPIs,
leading indicators, proxy indicators, and all these things
is very, very important. Of course, these tools are important to help us measure all
that stuff effectively, but there's also the human factor that has to put the semantics behind
all that stuff that we are doing. And this is extremely important. Otherwise, the
tool is going to be useless.
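For readers who want something concrete, here is a minimal sketch of what anomaly detection on a daily business metric could look like. It is not the guest's product or method, just a simple trailing-window z-score check written for illustration. The function name `flag_anomalies`, the seven-day window, the threshold of three standard deviations, and the sample activation counts are all assumptions made up for this example.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=7, threshold=3.0):
    """Return the indices of points that deviate from the mean of the
    preceding `window` points by more than `threshold` standard deviations.

    Hypothetical helper for illustration only; real tools typically also
    account for seasonality and trend.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against.
            continue
        z_score = (values[i] - mu) / sigma
        if abs(z_score) > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily product activation counts with one sudden drop.
daily_activations = [100, 104, 98, 101, 97, 103, 99, 102, 100, 41, 101]
print(flag_anomalies(daily_activations))
```

A check like this needs enough recent data points to estimate a baseline, which is why it fits a metric measured daily far better than a KPI reviewed once a quarter. Deciding which flagged points actually matter is still the human part.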
Yeah.
Well, you didn't answer the question, but I knew you wouldn't, which is why I asked.
Why I asked it so specifically.
Okay. We are way over time, I think, for this prequel, but thanks for joining us.
We have a really great show coming up with a guest where we will dig into different roles
around data, sort of in the data
value chain. And we will talk specifically about what Kostas just said, which is metrics, KPIs,
and anomaly detection. So join us for the upcoming show. You're going to love it.