The Data Stack Show - The PRQL: Can We Define the Role of the Data Engineer (Yet)?
Episode Date: February 11, 2022
In this PRQL, Eric and Kostas preview their upcoming conversation with Parham Parvizi of tura.io. ...
Transcript
Welcome to the Data Stack Show PRQL.
We just recorded an episode with Par, who has a long history working at Talend.
We talk about the early days of Java, so ETL pipelines in Java, then Hadoop and Spark, and then we talk
about all of the modern tools today that he uses in consulting.
He also started a school, which is super interesting.
And I think one of the things that was really interesting about our chat with Par, Kostas,
was that I really got the sense that we're still in the early innings with data engineering in many
ways. It's kind of like the term "the modern data stack." You know, parts of it make sense, but it can be kind of hard to define.
And data engineering seems to be the same way, even down to the tools. He said, I'm opinionated.
Here are the five tools that you have to know as a data engineer. And I think he's right,
but I also think that someone else may have a different set of tools and they could be right
as well, just because
we're in the early innings. What do you think?
Yeah, I totally agree. We are definitely at the beginning
of the definition of this role, and I think as we move forward we will probably see the role break
down into many different roles, just the same way that, you know, 30 or 40 years ago, you were a software engineer
and at the same time you were doing operations, you were doing database management, and you
might even have had to manage the hardware itself, right?
We don't do that anymore.
We've reached a point where we have front-end engineers and back-end engineers
and systems engineers.
And this is probably something that
will also happen with data engineering.
It's still early, but what I take
from the conversation that we had with Par,
and I think it's what everyone needs
to keep in mind,
and this is something that we see in general
with software, to be honest,
is that it's not the tools themselves that are that important.
It's more about the foundations.
He also gave this great example of building a house and the difference between the nail and
the frame. I totally agree with that. Yeah, you can be very opinionated and be like, when it comes
to streaming data, I prefer to work with Kinesis or I prefer to work with Kafka or Pub/Sub. It
doesn't matter at the end which one of them you're going to use.
The decision is actually going to be affected by many different parameters.
But what is important to know is that,
as a data engineer,
I need to understand the difference between streaming
and batch processing of data and why I need both, right?
And this is more fundamental than the tool itself.
Doesn't matter if it is Kafka or Kinesis at the end, right?
The same thing also with orchestration.
Orchestration is a very important part
of data engineering.
Now, is it going to be Airflow or something else?
Doesn't matter.
In the end, you need somehow to schedule
and orchestrate the execution of jobs, right?
That's what is really important to keep in mind.
Yep.
I thought that was a really good discussion.
I agree.
And the house framing analogy was really helpful.
I know we're close to time here,
but I think the other thing that was really interesting was just talking about some of the similarities
and differences between a data engineer and a software engineer.
And he gave some good examples of what it looks like inside an actual company,
some of the stakeholders, and some of the important components there. So really good
conversation. You'll definitely want to check that one out, and we will catch you on the full episode.