The Data Stack Show - 216: From Raw Data to Business Results: Building High-Impact Data Teams with Ethan Aaron and John Steinmetz
Episode Date: November 20, 2024
Highlights from this week’s conversation include:
Ethan's Background (0:47)
John's Background (1:16)
Data Teams vs. Engineering Teams (2:04)
Career Paths in Data (3:40)
Pressure of Large Companies (6:10)
Contrasting Industries: Expedia vs. Gallo (9:02)
Establishing Trust in Data (11:30)
From Sales to Data (16:30)
Understanding Success Metrics (18:58)
Creating Daily Business Value (23:03)
Aligning Data Work with Leadership Goals (29:30)
Differences Between Data and Software Engineering (31:25)
The Role of Data in Business (35:56)
Understanding Data Contracts (39:35)
Accuracy vs. Usability in Data (41:51)
Observational Skills in Data Roles (44:03)
Defining Product in Data vs. Software (47:07)
Final Thoughts and Takeaways (52:42)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human
challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new
data technologies and how data teams are run at top companies.
Welcome back to the Data Stack Show.
We have two guests today, Ethan Aaron of Portable and John Steinmetz of Gallo Mechanical.
Gentlemen, welcome to the show.
Thank you so much for the conversation.
All right.
Well, give us just a quick background.
Ethan, why don't we start with you?
Totally.
So I'm Ethan Aaron.
I'm the founder and CEO of Portable.
I've been working in data for almost a decade at this point.
I've been the head of data at a small startup, at a thousand person company.
And now for the last five years, I've been building data integrations so that data people don't have to worry about extracting data from systems and centralizing it into their warehouse. So we have 1500 different integrations. I've built
hundreds of them, almost 1000 at this point. So I can speak to all the different
nuances of the ecosystem.
John.
Yeah, John Steinmetz. Right now I'm a head of data
over at Gallo. I've implemented three data teams from scratch for startups, led some of the bigger teams
over at Expedia, HomeAway, and Bazaarvoice.
Started out as an engineer, worked my way up, decided to move to products.
Now moved to a CTO role where I would be administrating over product data and engineering.
And now primarily over the last five years, I've been working on startup data and focusing
on that.
Recently worked for ShiftKey, a startup that is now probably about $2.5 billion, or close to that.
And now I'm taking my talents, if you want to call them that, to Gallo Mechanical to try to change the construction industry.
Because that is a very underserved data industry.
So yeah, that's me.
So guys, in our wrap here a few minutes ago,
we talked about data and engineering teams
and some differences specifically around product.
So I'm excited to dive into that,
talking about data product people
versus product people on the engineering side.
What are some topics you guys are excited about?
Talking about that problem
in terms of the similarities and differences,
I think there are a lot of differences between data
teams and product and engineering
teams. And then also thinking about
the nuance of that
when you're at a one-person data team,
a company that can afford a one-person data
team versus a company that can afford a
100-person data team, because it changes.
Just like engineers. A one-person
startup with one engineer is very different from a company with 300 engineers and how you have to trade.
So I'm excited to dig into that as well.
What about you, John?
Yeah, very similar.
I think that data is all about doing what's right for the business from a value perspective.
And with any engineering task, if you don't have a business goal or a business
lead into that, you will eventually waste a lot of money. So tying all that in, I always run all
my teams like a product. It's got an engineering side, a product side, and a design side as well.
So you have all of that in there and leading to that business value is really critical and not
all companies are the same. So you got to kind of figure out like, what does it mean for one person? Like,
like Ethan said, versus, you know, what does that growth look like? What do you need right now?
Versus, you know, what do you need later and making sure you don't spend a lot of money up
front and the product side of that really drives that home.
Love it. Well, tons to talk about. Let's dig in.
Right. Let's do it.
All right. Both of you have really interesting careers.
You know, Ethan, you started in banking.
And John, you started in, you know, sort of, let's say, traditional software engineering as a software engineer.
And both have ended up in the world of data in different places.
So could each of you just give us the
couple minute version of your story? Where did you start in the world of data and what got you
into it? And then, you know, how did you end up where you are today? We'll switch it from the
intro. So John, why don't you lead us off?
Yeah, so this is great. I actually started
out as an engineer for more of a marketing side where I'm building, you know,
probably 20, 30 applications a year for various brands. Loved it. Love the engineering side.
Love the design side. My degree is actually a graphic design degree.
Oh, wow.
Yeah. And then I realized very quickly that the other side of my minor in college was
sociology. So I realized quickly that I loved
how groups think. And that was a really important aspect of what I was doing. And then I realized
as I was building these applications that there were nuanced ways of building them for each
different brand according to what goals they had. And then I saw all the marketing data behind it. And then what really inspired me to make that
switch into more of a product slash data role was when I saw the effects. It was at the very
beginning of social media, right? When I saw that just some small differences that you can make in
these applications, they would lead into big things on the other side, value for the businesses.
And then I realized that's really where I wanted to put most of my time and effort.
And that led me to leading data organizations, organizing them in such a way so that
everything rolled up to the company's goals, not just the group's goals.
I have a personal mission to move data away from what you see in IT, which is kind of a "we have to
have this" mentality, a cost center, to "this is a profit center for us."
I want to move data into that space as opposed to what most people do is they just hire a
data team because they need it.
And I say you do.
And the reality is we need to be the bookends of everything at the start and at the end.
And that's really what my mission is
from where I was to where I am today.
And you did some time at some really large companies,
both established companies.
You were at Expedia.
You managed the homepage team there.
Can you describe just a little bit of the pressure of that? Because,
you know, I don't have direct experience with a homepage that large, but the little experience
I do have is, you know, when you push to production, it's a big deal because it can mean,
you know, if you screw something up, it can cost the company a huge amount of money.
Oh yeah. And I'll tell you, like being at Expedia and it was Expedia HomeAway.
So it wasn't a big one, but like we integrated with Expedia because all of the data is shared between those organizations.
Oh, interesting.
So my first day at Expedia, I was told you are going to be presenting to all 10,000 people in the company.
My first day.
I had never talked to anybody.
I was like, all right, what do I need
to do? I roll up my sleeves and do it. So pressure, yeah. And there's a guy over there. Well,
he's not over there anymore. He's now with PayPal, John Kim. And I think I learned more in the time
I was at Expedia on how to do data for leaders from listening to him give me feedback in those
big product meetings, because these were all
televised over the internet for the entire company. So I was literally speaking to not only
them, but the questions he asked and the way he presented those questions really got me thinking.
It was the first time I had ever heard of OKRs and KPIs, right? And I thought I knew everything
when I got in there. So the pressure was there. But at the end of the day, Expedia and most big companies like that, their process is essentially a workflow. The homepage leads to the next page, leads to the next page. And your job is to pass the right amount of people in the right ways to the next team.
That was it.
And if you think about it from a manufacturing perspective, that's a phenomenal way of thinking
because you're not worried about the whole thing.
It takes the pressure off.
The pages that came after that page told me what they needed.
And then I designed my systems to say, let's figure out the best way to
send the right people in the shortest amount of time to that next page. And then they did their
thing for the same thing, right? I mean, we're talking millions of dollars of impact, especially
at the homepage, because I was the tip of the spear. If I didn't drive the right people to the
second page, everything's off. So yes. And it was a very data science heavy
role. Very data science.
That makes total sense.
Yeah. So it was fun. It was a learning experience
and it was really good for me to learn how big companies do it because then you can parse that
out into how smaller companies can also drive that impact as well.
Okay. And then contrast that with Gallo because, you know, that sort of experience is, you
know, let's just say bleeding edge, digital, you know, almost purely digital journey, tons
of traffic, tons of SEO implications, you know, just the knife's edge, if you will, or the sharp end of,
you know, a digital funnel. And now you are doing data in the construction industry. And those
things are, those are different. So I would just love to hear a little bit about the contrast there.
Different, but the challenges are still there, right? The challenge in an industry like this is,
and Ethan could probably talk about this too, the challenges in the finance industry 10 years ago is what construction is doing right
now. They don't necessarily have the same systems. They don't necessarily have people like me with
the experience that come into the construction because there's not as much money to be made
in that space. So it's typically, it's a different kind of situation, but the foundational elements are
all the same, right?
You got to have a data warehouse.
You got to centralize all your analytics.
You can't tie in directly to these systems because you can't affect production.
Like all of that stuff is very similar. I would say the one biggest difference is
most of the stuff I'm doing today is one way.
It's all read only.
Whereas in these other systems, I'm pushing data back into these platforms.
Right.
Right.
So that's the biggest technical difference.
Yeah.
But from a business perspective, it still goes, what are the company's goals?
What are you looking to do?
And how can I provide those?
Now, I don't know how much we're going to talk about this today, but there is a method I go through,
which is determining a company's analytics intelligence.
And I have a very specific formula that I use for that.
And that helps me determine what path the data culture needs to follow.
Companies like this here at Gallo, very smart people, but we're still stuck in Excel.
Whereas some of these other companies use Excel as a sandbox tool,
but never as a production tool.
Yep.
So you kind of have to like figure out what's important to the business
and how do we get them away from Excel and trust the data.
So that trust, I would say it's more difficult to establish trust
in an organization like this than it is at Expedia
because the trust in our big data world comes with the territory.
Right.
It's accountability instead of trust.
Like you have to be accountable to the things you make.
Whereas here, and I'm not saying this as a detriment to this company, it's not.
But in companies that aren't served by data very frequently. Yep. Everything is an amazing thing.
Sure, sure.
Right?
Yeah.
So that's a little harder to kind of come back to.
Yeah, yeah, yeah.
No, I'd love to talk about the analytics framework.
But Ethan, okay, give us your backstory
and then we can dig into the juicy stuff.
I mean, this is already juicy, but I mean the top.
My background's kind of all over the place.
So I started my career at Goldman Sachs doing real estate investing. I was buying properties, office buildings,
residential units, logistical warehouses, that type of stuff. But I found myself getting more
excited about the spreadsheets and how do we write VBA code or how do we restructure these to be more
efficient or how do we streamline a process or how do we replace light bulbs in a building,
all the operational stuff. I also hate authority and I hate being told what to do.
So I was going to say, that sounds like a great cultural fit.
Heavily regulated industry, maybe not the best.
Yeah. So a couple of years in, I went to a 12-person data startup and I was supposed to do
sales. And again, I knew nothing about any, I didn't know sales. I didn't know startups. I
didn't know data, but it sounded like an amazing place to learn.
And I got there day one.
The CEO was like, actually, we don't need sales right now.
We need someone to build dashboards and implement customers.
Do you know SQL?
Do you know Shell Script?
I was like, no, but I'll figure it out. And it gave me a really interesting perspective because it wasn't, can you go learn SQL and
can you go learn data tooling?
It was, I need you to know this because you need to build the stuff.
I need to run this company.
It was very much in service of running the company.
It was not in service of learning SQL for the sake of learning SQL.
Did that, did product, sold some data, and then we got acquired by LiveRamp.
I did product management at LiveRamp for about a year.
So I've been on the data side, but I've also worked with
engineers at small companies, large companies, and now back at Portable. And then I set up the
data team at LiveRamp. So we were a thousand person company, did not have a centralized data
team. I made a case for, hey, we should have a centralized data team. Here's what it should look
like. Started interviewing all of our execs, what matters to the CRO, what matters to the CEO, what matters to the CMO, and coming up with our global list of KPIs and OKRs that
we can actually measure with data.
Did that for about a year.
We sold off our parent company for $2 billion.
And I went and I worked for our chief strategy officer at LiveRamp, trying to figure out who
we should partner with or who we should acquire.
So I spent a lot of time digging into data integration companies, small companies, big companies, and it gave me a very good understanding
of the ecosystem of ELT tools, ETL tools, CDPs. Reverse ETL wasn't really a category yet, but
iPaaS tools. And about a year into that, I was like, I can do that. Started Portable. And what
we've been doing for the last five years, our goal has always been build 10,000 integrations so that data teams don't have to. We want to build a platform
on which we can build and maintain 10,000 integrations that pull data from systems and
put it into data warehouses. At this point, we have 1,500 integrations. I personally probably read thousands of sets of API documentation.
I've probably built 750 integrations myself at this point. And I'm also the data person at
Portable. So in addition to building integrations, being the marketing face of Portable, doing sales,
customer support, I'm also back to the lens of I'm the CEO. I need to run this company.
What data do I want at my fingertips?
And I'm doing that, but I'm not doing it for the sake of I like data.
I'm doing it because I need to run the company.
It's going to be an interesting perspective on all of this.
I also talk to data people all the time. I probably have 15 meetings a week with heads of data at small companies, big companies.
I host events in New York and really big events at Snowflake Summit every year.
But excited for the chat today.
Yeah, yeah.
So one quick observation
before we move on.
I don't think I've ever heard
anybody get hired
into a sales role
and a CEO go and say,
hey, we don't need more salespeople.
I want you to do data stuff.
I don't think I've ever heard that.
That is a great point.
It wasn't we don't need more salespeople. I was going to be the first salesperson.
The funny part about that, though, is when I got hired, it wasn't me being like,
I have a sales background. I can do that. The CEO was really looking for passion and just like,
are you willing to work hard? So the entire conversation, my interview with the CEO of
this company for about an hour, was just us talking
about building efficiency, light bulbs, energy efficiency, all the stuff I was working on at
Goldman. The things I got really excited about, I went probably too deep into, but I got hired
because of that. Like, he said you need a salesperson, I said I could do that. And then when
he said I need a data person, I was like, I do that. Like when you join a 12 person company, like I would assume you're signing up for whatever
needs to be done, whatever.
Yeah.
Okay.
Yeah.
You got hired based on like, Hey, this is the right guy.
Like we're going to, we're going to slot them somewhere.
You got initially slotted and just like move positions.
Basically.
That makes sense.
Yeah.
I honestly, like as a founder now, I view it through the lens of, like, that's probably a really good way to screen. Like, it's too late, like you should
do it beforehand, but it's like probably a really good way to screen out people at small startups
is like, you thought you were going to do this? Day one, I'm actually gonna, and, yeah, okay, let's do it.
Yeah, and if they don't, then, yeah, it saves you some time.
I'll say this. Most of the unicorns that you see all start out like that, where you have a very
small group of people that can do multiple things. I'll say this about ShiftKey. When I was hired,
I was in a CTO role and I left to go take a product role, very low level product role with
ShiftKey because I saw the vision and they knew all the
things I could do. I was the first product hire and within three weeks they were
like, we just want you to do data.
Yeah, I mean, it's all the successful startups. I've heard tons of
stories like that.
Yeah, that kind of reminds me of your background. You, like, at RudderStack, did a
bunch of different things.
And it's like, so it was a hard
pivot day one for me. Like, look, I didn't know anything about sales and I didn't know anything
about data. So it was really just like, which one am I going to get up to speed on?
Yeah, yeah,
totally. We actually had someone from Braze on the show, a gentleman named Spencer Burke. Yeah. And he has been at Braze, it's 11 years?
12, yeah.
12 years?
Yeah.
Which is a really long run
at a startup.
He was, I mean,
I think they were just
a couple of people.
He was employee number
like one or two
outside of the founders.
Yes, yeah.
And I think, I mean,
he was like running around
New York City
just trying to see
if people would install
this SDK in their mobile app,
you know, like all these mobile startups. Anyways, great episode and like really good story of like
that playing out over a long period of time within the same company because his role has changed
really significantly. But yeah, so great episode if anyone's interested. Okay, we've already had
such a fun conversation, but let's dig into our first topic.
And I'll lead it in by mentioning an article that bubbled up on Hacker News this week.
It's a great article called How I Ship Products at Large Companies, or something of that nature.
And really interesting article, a number of just generally good pieces of advice. But one of the points that the author makes that really stuck out to me was how you measure success for a product. And one of the points that he makes
is that one of the success criteria needs to be buy-in and excitement from your boss and from management on what you ship.
And he even goes so far as to say, which I loved the provocative nature of this,
he said, you're probably thinking like, oh, you need to measure like whether people use it as
well. And he was like, actually not true. Like it's like, management buying into and getting excited about what you're shipping, right?
Because if they're not doing that and it is, like, numerically successful, it still doesn't matter, right?
You know, which people have different opinions on that.
But I loved the point that he made in drawing a really hard line on you can measure things in different ways, but there's really only one or two
things that actually matter. And when we were chatting before we hit record, both of you made
that point around the business value of data. And I mean, Ethan, you even alluded to this,
you know, you're doing data stuff at Portable, but only to serve extremely specific needs,
you know, the specific needs of the CEO, which, you know, also happens.
But yeah, John, why don't you, I'd love to just hear you talk about your experience with that.
And yeah, just give us some insight into that dynamic around business value.
Yeah, I think that article probably has probably some merit to it. When I get in,
my number one thing is to create excitement, right? Because when you're creating a data culture,
especially from scratch, that business value leads to excitement a lot of times. Most of the time,
those two things intertwine because business leaders get really excited when they see the
numbers going up in a very strategic way. But there are also intangibles
that when you have discussions with these business leaders, especially at the C-suite levels, like,
what do you care about? What is important to you? And I've actually heard C-suite members say,
I don't care what the financials are right now. I want to get people talking about our product.
I want to get people understanding that we have made a mark in this industry or whatever.
So that gets my wheels turning, right?
Like what can I do from a business perspective that will show those things, right?
I preach to my teams, well, not here, but like my previous teams, right?
Daily business value.
And they will tell you, I say it every single day.
What have you done today to provide value to us?
Right?
Some days you have better than others.
And that's okay.
But like, you should always strive for that.
And by business value, I mean, what did you deliver that someone is using?
Right?
It doesn't matter how many dashboards you deliver.
It doesn't matter how many APIs you create.
It doesn't matter any of that.
If nobody's using it, it holds zero value. And that's why I also teach my teams, the faster you can get
something to production, even if it's not perfect, the better it is for you. Because if you spend six
months planning something and you push it out there and people go like, what the hell am I
looking at? Like you've just wasted six months of company time. But if you put something in and you build it
and you get it out there in two or three weeks,
even in a partial spend,
and you start getting feedback,
feedback is business value
because you are now learning
what the business needs to be successful.
So it really is that simple to me.
Generating that excitement comes from
really finding those champions within the business,
getting those people excited about
what you're doing. And that usually leads to more money for your group too. So I always try to put
that in place and try to make people understand that, you know, value for the most part equates
to dollars, but it doesn't always have to. It could be just that excitement.
Yeah. Can you speak to both the listener and me?
Because I'm guessing that some of our listeners are having the same reaction where they hear
that and the idea of daily business value sounds great.
But in the back of their mind, they're thinking, things are pretty complex here where I work. There's a lot of technical debt. Shipping stuff is hard, probably for some reasons that aren't that great, but also for some reasons that are legitimate, maybe from both a technical and cultural perspective. And so shipping daily business value, I think for some
people may feel like a really challenging thing to do. So can you speak to those people and maybe
just give an example of if I'm feeling that, like, oh, that sounds so great. I just don't know how
that would be possible. Help us think through how we can do that and like, you know, go into work
tomorrow and try to create some daily business value.
It's simple.
Make a list of everything you see on a daily basis that your business or your teams are
struggling with.
Could it be lack of data definition documents, lack of understanding of certain things?
Well, you know what?
Carve out some time today instead of doing development work or building out an insight.
Start trying to model out things efficiently.
Go into Miro, create a workflow document that generally determines
what your business is doing in this particular workflow
instead of just assuming and building, lay it out, right?
Like do those types of things.
It doesn't always have to translate to shipping code.
Sometimes it could be, I mean,
it was an easy one at ShiftKey when I got in there, you know, they had already had some processes in
place. And I said, you know what? Yeah, I'm going to spend some of my time building this out. But
if I just went and talked to a few people, I could really, from a business perspective, model out the path from
sales all the way through BI, like all the way through, and then write it out as a business
document. That's the thing that most people miss when you're building out teams to do things like
stop thinking like a technical line. There's so much more to what we do than just thinking, oh, we're building a workflow, right? Ask why, what is happening here and uncover problems. Just because you can't solve them doesn those people is don't get overwhelmed by your actual
job. Get overwhelmed by everything. Put your business stuff in place.
Yep. I love it. I love
it. All right, Ethan, do I need to re-ask the question?
What was the question? Oh, focusing.
Yeah. Yes. The data. Okay. So tell us about the interactions
between your data person and your CEO
and the data person having to prioritize,
you know, the core things that the CEO needs.
Yeah.
So I think there's a,
I got a few different things to say on this.
Number one, for the last like four years,
I keep going to people being like,
focus on business value, not infrastructure.
Focus on business value, not infrastructure. Which I think for the last four years was correct. It
was like, you went from a one-person data team to a 10-person data team, and now you're using
10 times as many tools. I think we enjoy it. That was not the right thing to do. You should focus on
helping business stakeholders. I think now that most teams have realized that the 10
person teams have shrunk, the people that are really great at what they do have expanded.
But it's like, I think the term business value is no longer, this is going to sound crazy because
I use it all the time. We've been talking about it nonstop. It's like, I think you have to go
one step further than that to actually think about what that means. And I think what John said is very much on point. Business value is not always revenue goes up
and costs go down.
Like if you do those two things,
generally you are, should be okay.
But like, that's not always aligned
with what matters to a company.
And as the data person at Portable,
my CEO is the perfect example of how that's the case.
So, yeah, like, as the CEO and the data person, if someone else was the
data person at Portable, they would look at me and be like, all the stuff we did two months ago,
you threw it out and now we're starting again on something new. And then two months from now, it's
like, we just threw all that out and we're starting again on something new. Like,
why did our priorities change? And our priorities are changing every two months because we got 80%
of the answer to a question and that was good enough. And now we need to move on to the next
question. They compound on each other. So I think to John's point, it's like business value
is better than infrastructure.
Focus on business value.
But when you think about business value,
think about it through the lens of
get as close to your leadership team,
your board, your CEO,
your C-suite as possible.
Their priorities.
They could be KPIs, OKRs,
just like personal goals
and find a way to help them with that.
So like adding business value, adding, creating business value daily.
You don't have to, like, to a certain extent, you can ignore data.
I view data as a skill more so than a role.
And if you're really great with data,
you can go to a marketing team, be like, I'm going to add value that you didn't even know
you could accomplish.
And you can do that when for them it would take six months, or they might not
even realize it's possible. So, like, thinking about your job through the lens of what are the strategic
priorities at any point in time. They are not always revenue and cost. Like, sometimes it's brand
awareness, sometimes it's new logo acquisition, even if it's not money. Figuring out what those are for execs and then going and being like, I'm going to help you accomplish that
all faster is where I think we now as an ecosystem need to move the conversation.
We need to, I think we've moved it now from stop focusing on infrastructure, focus on business
value. But I think to your point, Eric, like there's a lot of listeners, a lot of data teams
out there. They're like, cool.
I hear that business value, but like, what do I do?
Yeah.
My personal recommendation is always find the leader, find the top 10 leaders in your business.
Ask them their top three biggest pain points as leaders, the goals that they are being
compensated on, the goals they have to report to the board, to the CEO,
and find one of those 30 goals,
three goals per 10 people that you can make an outsized impact on and just do it.
That's my macro take on business value.
I actually think I use it a lot,
but I think it's the wrong term now.
I think people get that.
And now we got to move on to the next one.
Next one is how do you actually align yourself with strategic goals of your company and impact
them?
Love it.
Love it.
All right, John, you had, I've monopolized the conversation, but you had some really
good questions around product as it relates to data and some of the differences there.
So what?
Yeah, I'm excited about this.
Yeah.
Yeah, I think, yeah, I mean, great discussion on the business value.
I do have to say that I agree and I actually really identify with John.
I went from this transition of I'm a deeply technical person,
data person, into an executive role,
had the ability to do my own data stuff for myself, but also had a
team. So as time went on, my team did more of it than I did. And I, sounds like similar to John,
got really obsessed with OKRs, all the way down to the team at my director level,
just to bring clarity of what are we doing and how are we going to measure it
and really trying to align around that. So very much identify with that.
So, okay. So switching gears into this data product, engineering product thing.
I'm so excited about this conversation.
Yeah. With Eric, head of product here. I think to start off with, for both of you,
all of you... Ethan, maybe I'll have your take first.
Just your thoughts on,
let's just start with the differences.
What do you feel like fundamentally if you're a data engineer
versus a software engineer,
what are the crucial differences?
They're both technical roles.
Some of the skills can kind of overlap,
but what are those crucial differences
just between the roles
without talking about product yet?
Let's take a typical software company,
SaaS business.
I know this is going to change depending on the organization.
Well, let's say take a typical SaaS business.
Let's say you're a 100-person company.
In that SaaS business,
50 of the people could be software engineers.
In a 100-person SaaS business,
maybe you have three data people.
Maybe you have two, maybe you have one.
I would say that's probably the reason.
Maybe none.
I was in a company that was a thousand people and we had no centralized data people.
Yeah.
But what you have to realize there is, in a similar-size company relative to the software company
(this is going to be different in something like construction, which might not have the software
engineers but might have the data people), you need to realize that when you compare your team structure, your responsibilities,
your roles, your tech stack, and your approach to development, you're not comparing your one-person
data team to your 50-person software engineering team. Yeah. You need to compare your one-person data team
to a company that has one engineer.
So even the idea of saying, like, if you have one data person and they call themselves a data engineer, I think even that is wrong.
I would just call yourself a data person.
Like, you can't differentiate between a data engineer, a data scientist, a biologist. Like, you're just...
Analytics engineer, yeah.
Like, hey, you do not have the other people,
so you are just a data person.
Just like at Portable, for two and a half years,
there were two of us.
We had me as the CEO and my CTO.
He was our one engineer, and I was everything else.
And when you think about the tooling, the processes,
the way you have to think about
the world, when you have one engineer, it's so wildly different than when you have 50
engineers.
So a lot of data people today are drawing these analogies to product and
engineering orgs that are 30, 50, 100 people.
There are less than 100 companies
in the world that should have a 50-person data team.
It's just not like that.
That is such a massive investment in just data, and that's 50 people, let alone the
engineering orgs of 1,000 or 10,000 people that have to be shipping really complex things.
So when you think about drawing analogies between the two, I think the best analogy and the best framework
for data teams to model their tooling, processes, and software development life cycle on is startup
engineering orgs, from day one with one engineer up through 30 engineers. Look at how those teams are constructed,
how those teams ship,
how they focus on what's production
versus what's just get it out the door fast.
And unless you're at Google, Meta, Amazon, OpenAI,
your data team should never be looking at an engineering org
that's bigger than 50 people
and saying, I need best practices from that. So that's my perspective on a lot of the analogies is
there is an analogy there. It's just not the analogy most people think it is.
Yeah. Yeah, I tend to agree on that. I think the one thing that will be interesting, and this,
I mean, that makes a ton of sense for SaaS, but SaaS is in the business of building software. So
of course, the people that build the software are a big chunk of the headcount.
And data is more supporting that initiative or effort.
But do we see more and more companies get into the business of, like, data is what we do?
We do, I don't know, benchmarking or something.
All we do is ingest data from a bunch of different places.
And we're basically just in the business of data. You would think in that scenario, like,
maybe you have more of that framework where you've got product, data product
people, and data people split out into multiple roles. However, at that point you're
still kind of just mirroring the product and engineering org. Like, data is just like...
it's like they're engineers. Like, it kind of doesn't matter, right?
So anyways, John, I want to get your take on this as well.
Oh yeah, this is prime for what I love to talk about, right? And I'm going to take a little bit more
of a technical approach to this. First and foremost, I would say that data is what
most companies are doing today. The software is just something that moves data or allows entry
of data. Like Netflix: you would think, oh, it's a streaming service company, but you look at their
stock price, it mimics the amount of data flowing through their system. Like, I mean, those kind of
parallels. I look at the difference between data engineering and software engineering, and I've done both.
So I can speak to this a little bit more closely.
Think about building a house.
You have very defined specs.
You have very defined rules for what you're building on a house, right?
That's software engineering, right?
You have things that have to be done a certain way for it to work.
Data engineering is more like, how does the water move through the house?
Right? What temperature does it need to be over here versus over here? Hot, cold. Like, there
are nuances to the workflows, but you're essentially moving data throughout this software that's there. So fundamentally it's very different, because
you do only have a couple of people typically focused on that. And there's more...
the precision that software needs, because you can't have anything break. You can't have
anything that doesn't work. Whereas in data, it's a little different, right? Like, you're looking at data, you're looking at the outputs as opposed to the entirety.
So you have a little bit more flexibility.
I find data engineering is more configuration than it is anything else at this point.
We have, as a data engineer, we'll have 10 tools to build our stuff.
Whereas a software engineer might have one, right?
They build structured software.
They put it in Jira.
They push it up.
They get a peer review.
Like there's a specific thing that is there.
Whereas us, we can choose Portable.
We can choose a different tool.
We can choose.
We have a myriad of tools that have no bearing on the rest of the company.
They don't care.
It becomes a business decision.
Do we want to build this little thing?
Do we want to buy it? I
mean, it's truly product, to your point. Like, it is truly more product driven than, say, software
development. So that's the way I kind of look at the difference with a software engineer. I mean,
you look at LookML or you look at these middleware languages, right? I mean, it's just basically all the
same stuff, just a different flavor with different features.
And you have to learn... in software engineering, if you're, say, a Java developer,
you know Java and you're good at it.
And that's what you build in.
Whereas in our world, I mean, I may choose R for a project.
I may choose Python.
I'm choosing the right tool for what I need to build for what it can do.
So there's some, I think we have as data engineers,
more flexibility in our day-to-day than say a software engineer would.
That's a really, that's a great point, John, around,
especially with some of the new architectures and some of the new,
I mean, we talked about Iceberg a couple of times in the show,
other technologies like that actually are really accelerating this ability to say,
you know, you can actually pull this apart and kind of use whatever tool system or, you know,
whatever tools you want, and you don't even have to move the data, you know? It's really interesting
from that standpoint, whereas your software is built in Java, you know, or Go, or it's like, okay, I mean, yeah,
guess what, software engineer, we hired you, you're going to write Rust, you know, or whatever.
So yeah, that's a super interesting point.
Yeah. But data contracts, right? So just to that point,
data contracts are essentially what allows software to interact at those touch points.
So you think about something like microservice strategy in software development, where that came out of looking at how data did
their stuff and being as flexible and simplifying it so that there's a data contract between two
pieces so you could interchange. And that's why most companies are moving to a microservice
strategy because it says, hey, I'm going to use Mongo over here. Oh, I'm going to use Postgres
over here because some databases have better functionality for operational purposes.
Instead of using a monolith, now they're moving to these smaller. And that, in my opinion,
I mean, some people might just argue about this, but I think they got those benefits from the
data work. They saw what we were doing in those workflows that we could interchange and inject and move things through.
And now all of a sudden they're doing that because they realize technical debt was getting to be such a burden and having to restructure every four to five years of rebuilding their entire platform.
Now they don't have to do that because everything's an API.
I'm going to collect data from here.
I'm going to send it over here.
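One way to picture the data contract idea described here is the minimal Python sketch below. Everything in it is hypothetical: the `Contract` class, the `order_v1` fields, and the two adapter functions are illustrative stand-ins, not anything from a specific library or from the speakers' own systems. It only shows that two services backed by different stores can be swapped freely as long as both emit records that satisfy the same agreed shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """A hypothetical contract: a name plus the fields and types both sides agree on."""
    name: str
    fields: dict  # field name -> expected Python type

    def violations(self, record: dict) -> list:
        """Return human-readable problems; an empty list means the record conforms."""
        problems = []
        for field, expected in self.fields.items():
            if field not in record:
                problems.append(f"missing field: {field}")
            elif not isinstance(record[field], expected):
                actual = type(record[field]).__name__
                problems.append(f"{field}: expected {expected.__name__}, got {actual}")
        return problems


# Illustrative contract; the field names are assumptions made for this example.
ORDER_V1 = Contract("order_v1", {"order_id": str, "amount_cents": int, "currency": str})


def from_postgres_row(row: tuple) -> dict:
    """Adapter for a service whose store returns positional rows."""
    order_id, amount_cents, currency = row
    return {"order_id": order_id, "amount_cents": amount_cents, "currency": currency}


def from_mongo_doc(doc: dict) -> dict:
    """Adapter for a service whose store returns documents with different key names."""
    record = {"order_id": str(doc["_id"]), "amount_cents": round(doc["total"] * 100)}
    if "currency" in doc:
        record["currency"] = doc["currency"]
    return record


good = from_postgres_row(("A1", 1250, "USD"))
bad = from_mongo_doc({"_id": "B2", "total": 12.5})  # no currency supplied
print(ORDER_V1.violations(good))
print(ORDER_V1.violations(bad))
```

The consumer only ever checks against `ORDER_V1`, so which database sits behind each producer stops mattering to it.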
And those data contracts are so critical.
Yeah. Yeah. I really like your analogy of the water and the plumbing in the
house. My previous gig, we were in the specialty plumbing and water industry. And I've thought
about that analogy a lot, right?
That's right. Yeah.
I thought about it a lot because it's
actually really complicated. There's the basics of, like, you don't want it to leak, right? That's the basics. But when you get into what region you live in
and what your water source is and what you want to filter out of the water, and then you're getting
down to parts per million, you're not ever getting perfectly pure. Very rarely do you have
perfectly clean, pure water. Like distilled water: it tastes gross, nobody wants
that, typically. But when you're going through it, you're like, all right, I'm going to get the chlorine down to
this level and iron down to this level. And that's so much more similar to data, as far as
we're not ever going for perfect. We're going for this manageable level of accuracy
that is the right level of accuracy for the business decisions that have to be made on the data.
And that's so different than, like, hey, we framed this in, we used the exact-spec screws
and wood and drywall, et cetera.
The big takeaway there is if the CEO says, I want this report to be 100% accurate, you
now have license to say, nobody wants that.
It tastes gross. Yeah, exactly.
So the slightly different take on the water analogy is also, like, data in most scenarios
is an internal-facing role.
Like, your job is to give the CEO or the CRO or the CMO
information to do their job.
So let's say you have an office building.
If you know that 90%
of the people in the office building all work on the second floor, not the first floor, you need
more faucets and more toilets there, and you need to have them dispersed on the floor
in a way that makes it so those people don't have to walk down two flights of stairs
and all the way back, because now
you're just wasting everyone's time and you're not doing your job. So your job as a data person
is to not put a thousand faucets in the corner, all next to each other.
It's like, what's the right number of faucets, given the...
Yeah. Is a faucet a dashboard in this analogy? Okay.
But that's a really interesting point, because it's one of
those things where if you have too many, you've spent way too much money
and way too much time on that stuff, and it's not being used. And if you have too few,
now everything gets delayed. So I think a
lot of data teams think about, like,
oh, is it a shiny faucet, is it not a shiny faucet, what's the parts per million, when in reality they
need to take a step back and be like, what are the people doing on this floor?
Yeah, yeah.
Where are the
desks? Where are the meeting rooms? When do people have lunch? Like, oh, we need
a faucet in the kitchen. And I think it's difficult for data people in a lot of companies
to watch everyone for a little bit of time to figure out which way.
Awesome.
Observation is probably the number one thing I could tell a new data
person coming into a company: observe everything.
Hmm. Yup. Yup. Then to John's point too, one more. I
love the plumbing analogy, so, I love this one. This is like
a juicy Medium post, you know, 1,500 words on data. I've thought of this post. Like, somebody
has it somewhere. But there's the other thing with the tooling that you mentioned.
Again, in like plumbing world, you've got point of use,
which is like I'm going to do an under sink filter or like a refrigerator filter.
You've got whole house, like you can do your whole house.
You've got the municipal level where you're doing like a whole city.
And data can really be similar, where you have this big centralized team.
They're doing everything together.
And maybe that's right. But sometimes it's like, hey, we just need a little point of use, like a refrigerator filter, right here. And it services, you know, five people. And that's
great. And we would have way overbuilt if we made the change up at the municipal level.
In the next episode, we're gonna, we're going to, we're going to complete this analogy
by figuring out the data corollary to the guy who wheels in those big tanks on a hand truck and puts the water thing on and it's not connected to anything.
My favorite part of this whole analogy, and I guess we can get off the water analogy, or we can stay on it.
I hate self-service analytics.
I think most companies go way overboard and build things that everyone can do.
That is a very controversial take.
But from a water perspective, so taking the analogy one step further, I think what you should do if you're building plumbing for an office, for a floor in an office building, give people a faucet.
Let them fill up their glass and bring it back to their table.
You built the faucet.
They can use it. They can go drink the
water at their table. Or buy a Bevi
machine. Let them pick their six
different types of juice, and
they can walk it themselves to their desk.
Self-service analytics is
like putting a miniature faucet on
everyone's desk. I think that's stupid.
You don't know how many people are going to be there.
You don't want to have too many faucets.
You don't want to have too few faucets.
Pay a little bit more
for 12-packs of Diet Coke
that sit in the fridge
that people deal with themselves.
Don't try to put a soda dispenser
on everyone's desk.
We need cooks in the kitchen.
What's worse is that
the faucet at everyone's desk has
19 knobs that change
things, and, you know, they don't really know how it works, and everybody's making decisions about
the temperature of the water, and they're, right, doing different temperatures.
I love it. I love it.
Okay, we have time for one more question here. I feel like we could keep going for hours.
Okay, I'm going to break the analogy. You know, so I'm so sorry. But one of the things, like,
if you think about like the, you know, the pipes and the actual hardware that are like interfaces
for water, in some ways, those are much more similar to software in that they need to reliably handle scale, they need to be robust, and they operate largely the same way every time in an ideal world.
And data is pretty different than that in that the products, and Ethan, what made me think about this was you were saying, okay, we were starting
over two months ago because we got 80% to a question, right? And so the product actually
can look very different at different phases, right? Now, okay, maybe you have like a KPI dashboard
that is durable and there are some really good things there. But I'd love really quickly to get
actually all three of your takes on this. So going for the triple threat here on the differences between the definition of product
in software and in data.
So John, why don't we start with you?
Back to our discussion, I think, of like, you've got the team size thing.
And I think it's what I would say is similar to what Ethan said.
If it's a very small team, where you've got one engineer,
if you have one engineer and you have one data person,
probably at different companies,
because they don't typically scale one-to-one.
Say different companies, one engineer versus one data person.
Similar.
You have to have some sort of at least slight product ability
as a software engineer or a data person.
Some sort of slight design ability
at least enough to communicate
out to people. And those are
more of your startup
small company unicorn
type people.
I think it's similar. It just depends on the size.
Ethan?
There are a few more
but I think there are three main ways people are building products with data.
Number one is dashboards.
It's a way, and dashboards are really just helping execs make decisions.
There's two different ways to think about a dashboard.
One of them is a pipe that's continuously flowing water.
It's not going anywhere.
It's going to be there for the next years.
And then the other one is just a one-off answer to a question.
But I put those into the dashboard
and insight bucket. That's the read-only use cases.
That's the just like, hey, I'm going to get you the insights
so we can make the best decisions possible.
Yep, yep.
The second one is workflow automation.
This is, you have a manual task
where we have to move data from point A to point B.
And right now it takes 10 hours a week.
As a data team, you can use an iPaaS tool.
You could use RudderStack.
You could use any of them to take data from point A, transform it, put it into point B.
And the goal of that is remove manual tasks and productionize things.
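As a rough illustration of the "point A, transform, point B" workflow just described, here is a minimal Python sketch using only the standard library. The CSV export, the column names, and the `crm` dictionary are all assumptions made up for this example; a real team would more likely configure a tool than hand-write this.

```python
import csv
import io


def extract(csv_text):
    """Point A: parse rows out of a CSV export (here, an in-memory string)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Normalize emails to lowercase and drop rows that have no email."""
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if email:
            cleaned.append({"email": email, "plan": (row.get("plan") or "").strip()})
    return cleaned


def load(rows, destination):
    """Point B: upsert by email into a dict standing in for the target system."""
    for row in rows:
        destination[row["email"]] = row
    return destination


# Hypothetical export that someone currently copies across by hand each week.
EXPORT = "email,plan\nAda@Example.com ,pro\n,free\nbob@example.com,free\n"
crm = load(transform(extract(EXPORT)), {})
print(sorted(crm))
```

Keeping the three steps as separate functions is the part that matters: each one can be replaced by a managed tool without touching the other two.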
The third one, which I've actually been seeing more of recently,
I find it fascinating, is marketing.
I'm seeing more in-house data people either change their roles
or being hired into companies that have unique data assets
and use them to create public-facing insights
about their own data internally to create benchmarks.
Like Carta is doing a great job of this.
This guy, Peter Walker over at Carta, able to look across every startup's fundraising,
et cetera, and he's using it to create insights that then drive people to Carta.
And Matt Schulman over at Pave, the CEO, they do compensation and benefits for companies
around the world.
And they have a unique data asset of benchmark salaries, benchmark benefits.
And similarly, their data team is not building internal insights for strategic decision making.
They're not automating workflows.
They are creating insights that show the world that they have either data products that sell.
Internal versus external, yeah.
It's a real external product.
But those three use cases,
I would boil most data teams down to that.
And if you start bleeding into most other stuff,
you're either, not in a bad way,
but like you're either a data team
and a software development team
or you're doing something else.
Or you're a marketing team
that has really smart data people in it.
And none of these are a problem. Just think about how you want your
company structured. But I think
people with data skills, it's those
three use cases.
One quick comment on that that I think is
a really
important common thread is that
each of those use cases has very
well-defined consumers.
The middle one, you probably have multiple consumers
because you're getting data into a ton of different systems, right?
Product, customer success, whatever, right?
Finance.
You're getting data into a system.
But each of those that you called out,
it's crystal clear that there's a consumer on the other end,
even for the external.
So that's great. Really like that. All right, John, you get the final word.
and what the companies are building, there's a couple of paths that I would have taken or that I have taken.
Definitely running things as a product, understanding that business component of what is expected
of the data team.
That is primary.
You got to know coming in what the expectations are.
I've seen it work where somebody comes in and they really don't even know what they're
supposed to be doing because the business just says, we need somebody to do data. Um, it's very rare. That person is
successful in that role, unless they take that, that approach of, Hey, yes, I can do this,
but I'm also going to be aware of what the company needs. Um, now there's also something
we haven't talked about, which is capitalization, right? Like if you truly want to be somebody as a data leader coming into a new
company, you are immediately a cost center immediately, right?
So you have a target on your back of when cuts need to be made,
you're the first one to go. Yep.
Or your team is going to be primarily on that cut list and they're going to
shrink it down.
Now the way you target and you combat that is build one thing for a customer within the product.
Start one thing and then all of a sudden your work becomes capitalized.
Yep.
And what I mean by that for people watching that don't understand the business aspect of this
is the government will give money to your company as a kickback, part of your salary and the work and the tools you build, as kind of like an incentive to build more stuff.
Right. They incentivize that if you are building only for internal purposes, you are 100 percent not capitalizable.
So start looking for ways to get in front of customers, get your dashboards in front of customers, get your reports, right?
Build what I call exception reporting so that you can create these things within the product.
Instead of somebody having to look through a thousand things, they know the five things they're looking for, serve those up.
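The "exception reporting" idea mentioned here can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the metric names and the acceptable ranges below are invented, and the function is a hypothetical helper rather than any particular product feature. It filters a large set of metrics down to the few that fall outside their expected range.

```python
def exception_report(metrics, limits):
    """Return only the metrics whose value falls outside its (low, high) limits.

    Metrics with no configured limits are treated as always in range.
    """
    unbounded = (float("-inf"), float("inf"))
    exceptions = []
    for name, value in metrics.items():
        low, high = limits.get(name, unbounded)
        if not (low <= value <= high):
            exceptions.append((name, value, low, high))
    return exceptions


# Hypothetical daily numbers and acceptable ranges.
metrics = {"refund_rate": 0.09, "late_shipments": 3, "open_tickets": 40, "signups": 120}
limits = {"refund_rate": (0.0, 0.05), "late_shipments": (0, 10), "open_tickets": (0, 25)}

for name, value, low, high in exception_report(metrics, limits):
    print(f"{name}: {value} is outside {low} to {high}")
```

The reader sees two flagged rows instead of scanning every metric, which is the point being made about serving up only what people are looking for.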
All of a sudden now you could conceivably become a capitalizable asset, right?
Changing your mindset from purely technical
to leaning into product. You really have to do the, why am I building this? Not just,
I need to build this, right? And being able to say no is a very important part of that
prioritization component.
Yep. I love it. Alrighty. Well, we are at the buzzer, as we like to say. That was such a great conversation.
And there were many things that we did not talk about.
So we'd love to have you all back again soon.
We need to talk about integrations.
We need to talk about your analytics framework, John.
So we'll have Brooks find another time for us to have you back on in the next couple weeks.
Love it. Amazing. Really enjoyed the chat. Thanks, guys.
The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform.
RudderStack is purpose-built to help data teams turn customer data into competitive advantage.
Learn more at Rudderstack.com. Thank you.