The Data Stack Show - The PRQL: Why is the Data Engineer's Role Expanding?

Episode Date: December 3, 2021

In this show PRQL, Eric and Kostas talk about the evolution of the role of a data engineer and preview the conversation with Aayush Jain. ...

Transcript
Starting point is 00:00:00 Welcome to a Data Stack Show prequel. So we're going to talk about the upcoming episode that we're about to publish. Kostas and I exchange some interesting ideas and hopefully give you a little bit of a preview of what we're going to talk about. Okay, Kostas, we talked about business observability with someone building a company and trying to establish that category. This is kind of specific, but one thing that really stuck out to me in this conversation was we talked about, you know, sort of all of the technologies that are popping up around
Starting point is 00:00:38 the modern data stack, right? So collection and ingestion is getting easier. Transformation is getting easier. Data quality monitoring, you know, there's really good tools for that. Now we're talking about business observability sort of around the metrics piece. And we talked a little bit about the role of the data engineer as these tools automate a lot of things that sort of previously were components of their job. How does that change the role of the data engineer in team structure? And the answer was really interesting to me
Starting point is 00:01:08 that our guest gave. He actually said it's causing the role of the data engineer to expand in scope and responsibility. It's becoming more important as this constellation of tools makes certain parts of the data stack easy. So your thoughts. Yeah, okay.
Starting point is 00:01:24 Two things. One thing, one main thing there, one main thought. First of all, I think by definition, the data engineering role and why we need to have a different flavor of engineering there is exactly because it's like a kind of hybrid. And I think we have touched on that in the past, like when we were talking also with other guests. It's like a hybrid between operations and actual software engineering, right? So traditionally, the data engineer had to write code. So when we didn't have infrastructure for pipelining, someone had to write the code for these pipelines, right?
Starting point is 00:01:57 But then when you write the pipelines, then you have to operate the pipeline. So you have also operations and you have observability, blah, blah, blah, and all that stuff that comes more from the SRE space. So I think what is happening, as more and more tools and platforms are becoming available, is that part of the engineering work that the engineer had to do, like writing the pipelines, okay, now it becomes more of an operational problem. So the balance between operations and actual software engineering is changing a little bit as part of the role of the data engineer.
Starting point is 00:02:34 This whole new data stack or whatever we want to call it is still in its infancy. So we don't know exactly in a couple of years what the data cloud is going to look like and what the job description of the data engineer is going to look like. But today, it's like this kind of mix between software engineering and operations. In my opinion, what I find much more interesting is not like this kind of mix or how the mix of operations and software engineering changes for the data engineer. It's more about the emerging roles of rev ops
Starting point is 00:03:08 and even data ops. But okay, this is, I think, still like very, very early. Rev ops, marketing ops, sales ops, like all these roles that they, let's say, are more of a marketeer in the case of the marketing firms, but at the same time, they have to work, also need to have a tech background. There's the need to operate things
Starting point is 00:03:32 that have to do with data infrastructure or to work with data and all these things. So I kind of find the emergence of these roles, to be honest, a little bit more interesting than how the data engineering role is going to change. And I think that this whole part of the stack that has to do with operationalizing the data and the data warehouse is where all these roles, like, they're going to emerge from. So, and it's a pretty new space, right? Like, it's a category that is still under definition.
Starting point is 00:04:08 In every episode we have, we hear about a new definition, the new extension of what it means to operationalize your data, right? Yeah. So, yeah, I think we will see more new professions coming out of this whole wave of innovation. I agree. Okay, second question for you. We talked about business observability, which is, I won't, we don't have enough time in the prequel to just talk about the emergence of all these new categories, you know, data mesh, data fabric, you know, all of the data cloud, these different terms, and now observability, you know, sort of the constellation of terms
Starting point is 00:04:42 around observability is popping up. Super interesting in and of itself. But we talked to the guest about sort of anomaly detection relative to business metrics and KPIs, right? Which is a really interesting concept, right? So the basic idea is the dashboard goes away for particular things and you basically have anomaly detection. You can get notified of things that are statistically significant. You are an entrepreneur who also sort of understands the mechanics of data pipelines. As an entrepreneur, how would you, like what would be the first couple of use cases
Starting point is 00:05:17 you would use in terms of anomaly detection for business metrics or KPIs? I think there's also, and that doesn't have to do with our guest today, but I think there's also, this is part of trying to create a new category. There's a lot of marketing language in there. And we are still trying to figure out exactly what's the difference. When we are migrating a term from one space to another, we also try to figure out the new semantics of this term, right?
Starting point is 00:05:44 So I don't think that business observability is exactly the same thing as systems observability. Let's say, think about being DoorDash, right? And you have to observe your infrastructure because if you have a network outage, yeah, you're going to lose probably thousands or millions of dollars because nobody's going to be able to make orders. Now, I don't think that on a business level, things operate in the same way.
Starting point is 00:06:18 Unless we want to connect the outage of the network with a business result, which is, of course, going to be so severe. But actually, what we have to understand and where we have to react is not like on the business observability, right? It's the system observability that we need to react there. Sure. So there's like, I think, a distinction there
Starting point is 00:06:37 which is very important between metrics and KPIs. So yeah, metrics. And I think that Aayush just mentioned that they have a company that has like 200,000 metrics that they are using. Yeah, it was wild. Yeah. Yeah. You can have a lot of metrics. And metrics usually is what you want to measure much more, they react much more immediately. For example, let's say, let's talk about marketing, okay? Like, can you share with us like one or two metrics and one KPI? Yeah, sure. I mean, we look at something like product activation as a KPI, number of users who activate in the product. That's to some extent,
Starting point is 00:07:24 a proxy for intent, you know, of actually solving a problem with the product. And there are lots of metrics that ladder up to that, right? You have certain in-app behaviors, you have volume metrics, you have teammate metrics, you know, there's just a lot of things that ladder up to that. Yeah. My intuition about that is that metrics are much more tactical tools.
Starting point is 00:07:46 It's something that you probably like go and check every day because you have your campaigns and you might have tens of campaigns and each one of these campaigns and its performance is one metric, right? But it's not your KPI. Like that's not something that you are going to report when like the leadership team is going to discuss. And then you have the KPIs, which is more of like a strategic tool. Okay. And I think that there's a big difference there in the time dimension. Like, a KPI is something that, I mean, okay, what is anomaly detection? Think about, like,
Starting point is 00:08:15 how you measure the KPI yourself, right? Like, it's on a weekly basis or like even on a monthly basis, because the variation week over week is like, the variance is such that it doesn't make sense like to measure it. There are KPIs that are quarterly based, right? So what is anomaly detection on a quarterly basis? I don't know. I mean, why do you need like an algorithm to do that for you, right? So I think this kind of like distinction between like metrics, KPIs,
Starting point is 00:08:46 leading indicators and like all that stuff and proxy indicators and like all these things like are very, very important. Of course, these tools are like important to help us like measure all that stuff effectively, but there's also like the human factor that has to put the semantics behind all that stuff that we are doing. And this is like extremely important. Otherwise, like the tool is going to be useless. Yeah. Well, you didn't answer the question, but I knew you wouldn't, which is why I asked. Why I asked it so specifically.
Starting point is 00:09:14 Okay. We are way over time, I think, for this prequel, but thanks for joining us. We have a really great show coming up with a guest where we will dig into different roles around data, sort of in the data value chain. And we will talk specifically about what Kostas just said, which is metrics, KPIs, and anomaly detection. So join us for the upcoming show. You're going to love it.
