The Data Stack Show - The PRQL: Can We Define the Role of the Data Engineer (Yet)?

Starting point is 00:00:00 Welcome to the Data Stack Show PRQL. We just recorded an episode with Par, who has a long history working at Talend. We talk about early days of Java, so ETL pipelines in Java, then Hadoop, Spark, and then we talk about all of the modern tools today that he uses in consulting. He also started a school, which is super interesting. And I think one of the things that I thought was really interesting about our chat with Par, Kostas, was I really got the sense that we're still really in early innings with data engineering in many ways. It's kind of like the term, the modern data stack. It's, you know, parts of it make sense, but it can be kind of hard to define.

Starting point is 00:00:47 And data engineering seems to be the same way, even down to the tools. He said, I'm opinionated. Here are the five tools that you have to know as a data engineer. And I think he's right, but I also think that someone else may have a different set of tools and they could be right as well, just because we're in early innings. What do you think? Yeah, I totally agree. We are definitely at the beginning of the definition of this role. And I think as we move forward, probably we will see the role break down into, like, many different roles, just the same way that, you know, like 30, 40 years ago, you were a software engineer.

Starting point is 00:01:25 And at the same time, you were doing operations, you were doing, like, database management, you might even, like, have had to manage the hardware itself, right? We don't do that anymore. We've reached a point where we have, like, front-end engineers and back-end engineers and systems engineers. And, like, this is probably something that might, but will also happen with data engineering. It's still early, but what I keep like

Starting point is 00:01:51 from the conversation that we had with Par and I think that's what everyone needs to keep in their mind. And this is something that we see in general with software, to be honest. It's not the tools themselves that are like that important. It's more about the foundations. What he called also,

Starting point is 00:02:05 he gave this great example between when you build a house and the difference between the nail and the frame. I totally agree with that. Yeah, you can be very opinionated and be like, when it comes to streaming data, I prefer to work with Kinesis or I prefer to work with Kafka or Pub/Sub. It doesn't matter at the end which one of them you're going to use. This is going to be actually, the decision is going to be affected by many different parameters. But what is important to know is that, yeah, I will have, as a data engineer,

Starting point is 00:02:38 I need to understand the difference between streaming and batch processing of data and why I need both, right? And this is more fundamental than the tool itself. Doesn't matter if it is Kafka or Kinesis at the end, right? The same thing also with orchestration, like orchestration is like a very important part of data engineering. Now, is it going to be Airflow or something else?

Starting point is 00:02:57 Doesn't matter. At the end, you need somehow to schedule and orchestrate the execution of jobs, right? That's what is like really important to keep in mind. Yep. I thought that was a really good discussion. I agree. And the house framing analogy was really helpful.

Starting point is 00:03:14 I think the other thing, I know we're close to time here. I think the other thing that was really interesting was just talking about some of the similarities and differences between a data engineer and a software engineer. And he gave some good examples of what does it look like inside of an actual company, some of the stakeholders, and some of the important components there. So really good conversation. You'll definitely want to check that one out, and we will catch you on the full episode.

In this PRQL, Eric and Kostas preview their upcoming conversation with Parham Parvizi of tura.io. ...
