The Data Stack Show - 97: How To Build an Organization-Empowering Data Team with Emilie Schario of Amplify Partners
Episode Date: July 27, 2022
Highlights from this week’s conversation include:
Emilie’s background and career journey (3:00)
Hypergrowth at GitLab (5:23)
Being close to the money in data (9:50)
Big things taken from GitLab to Netlify (13:00)
Defining “data organization” (17:53)
The first roles you should hire for (22:06)
Defining “analytics engineer” (23:44)
One role to bridge different needs (27:26)
Why data analysts are needed (30:51)
How to avoid a kitchen sink of data (40:20)
Data engineer archetype (45:48)
Data roles crossing over (48:09)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Hey Data Stack Show listeners, Brooks here.
Usually, I'm behind the scenes keeping things rolling for the show, but today I'm coming out
of hiding to share some exciting news. We have another live show coming up, and we want you to
join us for the recording. This time, we're bringing back Tristan from Continual and Willem
from Tecton to talk about the future
of machine learning. We'll record the show on August 10th at 2 o'clock Eastern, 11 o'clock
Pacific. So mark your calendars and visit datastackshow.com slash live to register today.
Welcome back to the Data Stack Show. Costas, we had Paige from Netlify on the show a while back.
And both you and I walked away from the show sort of enamored with the way that the data team seemed to operate, how efficient it was, their structure.
There were just a number of things where we sort of said, man, that feels like best in class. And today we get to talk with Emilie, who really helped architect that team.
And my burning question is, she was at GitLab before Netlify, hyper growth phase.
You know, they added a thousand employees in a little over two years.
And she was on the data team and then sort of went on to do a number of things
beyond that.
But my sense, my hypothesis
is that she actually took a lot of those lessons
from the experience at GitLab
and that sort of exponential growth
and exponential need for data.
And that really informed a lot of how she built
what is now the preeminent Netlify data team. So I would love the backstory on that. And that's what I'm going to ask. How
about you? Yeah, I'd love to chat with her about the role of the data inside the company and try
to understand a little bit more about the people who work under this organization, like data engineers, data analysts, and actually figure
out like, who are all these people that should be members of this organization?
And I think we have the right person to have this exact conversation today.
So let's go and do it.
Let's do it.
Emilie, welcome to the Data Stack Show.
We have been so excited to have you on the show, and we have so many things to talk about.
But let's start where we always start.
Give us a history of your work in data,
and more if you would like, and what you're doing today.
Thanks so much for having me, gentlemen.
My name is Emilie Schario.
I'm currently a data strategist in residence
at Amplify Partners, which is an early-stage VC fund focused on dev tools, data tools, and cloud infrastructure.
I was previously director of data at Netlify.
Before that, I was the first data analyst at GitLab, joined the company at 280 people, watched the company grow to over 1,300 in less than two and a half years, and spent my last
year there as interim chief of staff to the CEO while we hired our chief of staff. And have had
a couple of other data jobs along the way, but have spent most of my career in the modern data
stack. I also helped admin a data community or a community of
data practitioners called Locally Optimistic, which is a wonderful place for people to just
talk about data problems and how they think about them and what tooling looks like and all of that
goodness that goes on when you have a mindshare. So that's me. I live in Columbus, Georgia,
which is a couple hours outside of Atlanta. Very cool. Okay. First of all, Locally Optimistic,
amazing. Could not recommend it enough. I'm sure a lot of our listeners are already involved,
but if you're not, absolutely check it out. Amazing community. We learned so much from it.
Let's talk about GitLab. So this has been so interesting to me. You know, being part of a hyper growth phase in a company like you were a part of, you know, where let's say you add
a thousand people in a little over two years, you know, almost every part of the organization is
having to reinvent itself, right? And what's interesting to me is that when we think about
using data to grow, it really touches every part of the organization, right? And so you were
involved in the data practice as every part of the organization was reinventing itself. And I'm sure
that the impact of that on the data team was significant. Can you just tell us what that was like? Where did it start? You know,
sort of, you know, in your first couple months on the job, what did the data team look
like? And then I'd love to know, what were sort of the big milestones along the way in that sort
of hyper growth period? So when I joined the GitLab data team, we were three.
So Taylor Murphy was our manager, I think was his title,
but he led the team and he reported directly to the CFO.
And then Thomas La Piana was our data engineer
and I was our data analyst.
And today we would call what I was doing
analytics engineering,
but 2018 was both not that long ago and also a very, very long time ago in data timelines.
And so I mentioned that we reported into the CFO because that meant that our priorities were
financial. When the asks came to the data team, we were focused on finances, on FP&A and serving
that part of the organization. I was going to ask about that because that caught me off guard
right out of the gate, right? Because you don't typically see that. I don't know that I would say
you don't typically see it. I think we see a couple of places where data teams start.
Excuse me.
So we see data teams generally start under finance if they're enterprise-led, enterprise sales-led businesses.
Yep.
Under growth if they're a product-led growth business.
Under product if they're still trying to figure that out, right?
But those are kind of the three buckets I would put data orgs in, in terms of origin, right?
And then the question becomes, all right, this is where the data team started.
How do we serve the whole company?
There are different pressures at play in each of those
contexts. And so with a data team reporting into the finance org: for those who don't know
a ton about how you manage a profit and loss statement, finance generally falls under G&A,
right? General and administrative. And the two other categories you care about there are sales and
marketing and R&D, research and development. And as a crude rule of thumb, product and engineering
is going to go under R&D, and sales and marketing is sales and marketing. And things like HR and
the other things fall under G&A. And there is a big pressure for G&A to always be a tiny portion of your expenses.
So if your data team is under finance, your data team is probably going to find themselves
underfunded. There are ways to mitigate this. For example, one model that I have found to be
very successful is what I call centrally reporting but locally prioritized. So your data team should
report into your data org. But if Sally is going to spend all her time working on marketing,
then her headcount should be funded by the marketing department. And if Adam is going to
spend all of his time working on product or growth or whatever the department is,
their headcount should be funded by that respective department. Oh, interesting. Okay.
And, you know, the team that the data org reports up into
should not drive its cost model, because we don't want to handicap our team before it has started. We should really think about
how do we serve the business and then let's make sure we have the people and let the headcount
budgeting allocation for that support our business goals.
Yep. Super interesting. It's the classic, if your team is a cost line item, watch out because when things get tight, they're
going to be coming for you.
Well, okay.
That is so helpful.
Let me ask a question, zoom out a little bit.
So there's an old adage, follow the money, right?
If you're close to the money, you're close to the problems and sort of the most important
things, do you think that that is true and sort of beneficial, you know, sort of
regardless of the headcount, you know, and sort of cost center considerations? How did being really
close to the money influence the way that you thought about data, you know, for functions
outside of finance, right? Sort of sales, marketing, engineering, even. Did that influence
the way that you thought about that? Sort of having a financial lens on the work that you did?
I think having the CFO's priorities drive our work is what really stuck out to me. So finance
was definitely a piece of it. And in fact, I think, you know, it's like I said, 2018 is both not that long ago and a very long time ago.
A short eternity.
Definitely.
But it was ARR and MRR reporting and things like revenue retention and customer retention were a huge part of our early projects back then. I remember racking my brain and like, what are the edge cases of
retention? And then at GitLab, not only do you have to think about it on a customer level,
but customers roll up, right? And so if you sell to a company and then they have a parent company,
how do you think about how many customers you have? So there was a lot of complexity there.
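Editor's sketch: the roll-up problem described here can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions, not how GitLab actually modeled it. The account names, the parent links, the ARR figures, and the function names `ultimate_parent` and `arr_by_customer` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical account records: (account_id, parent_account_id or None, arr_usd).
# Every name and number below is made up for illustration.
ACCOUNTS = [
    ("acme-us", "acme-global", 50_000),
    ("acme-eu", "acme-global", 30_000),
    ("acme-global", None, 0),
    ("initech", None, 20_000),
]


def ultimate_parent(account_id, parent_of):
    """Follow parent links until reaching an account that has no parent."""
    seen = set()
    while parent_of.get(account_id) is not None:
        if account_id in seen:  # guard against cyclic parent links
            raise ValueError(f"cycle in account hierarchy at {account_id}")
        seen.add(account_id)
        account_id = parent_of[account_id]
    return account_id


def arr_by_customer(accounts):
    """Sum ARR per top-level parent, so subsidiaries count as one customer."""
    parent_of = {acct: parent for acct, parent, _ in accounts}
    totals = defaultdict(int)
    for acct, _, arr in accounts:
        totals[ultimate_parent(acct, parent_of)] += arr
    return dict(totals)


rolled_up = arr_by_customer(ACCOUNTS)
print(len(ACCOUNTS), "accounts,", len(rolled_up), "customers")
print(rolled_up)
```

The same four account records count as four customers or as two, depending on whether subsidiaries roll up to their parent, which is why a retention number needs an agreed definition before anyone reports it.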
And I think having those projects as the foundation on which everything else we worked on came from
really drove how I understood, I personally understood the company. More than that, though,
I think the CFO's priorities driving the data team's work then meant that other parts of the organization felt underserved by data.
And the consequence of that was that we saw miscellaneous data hires pop up in other parts of the org.
Interesting.
And so that's where this emphasis on separating headcount from reporting structure comes from. If you're hiring data people, they should get a manager who understands their job and can help them with their career development and can help shape their work independent of where their priorities are coming from.
Yep. Yeah, that's super interesting.
I mean, one thing that you said that sticks out to me
is that, you know, just getting aligned on definitions
and like having really sharp definitions is hard,
but it sounds like that was a really big emphasis.
But part of sort of the hidden cost of that
in the way that things were structured
was that even though you had a really tight definition, you weren't necessarily, like the team wasn't
structured in a way that that tight definition actually served other teams and sort of provided
value to them by making things simpler or providing data. Fascinating. Okay. So let's,
because I know Costas has a ton of questions and I could keep going all day,
would love to hear about maybe one or two more big things that you learned at GitLab.
And then the main lessons that you took from GitLab to Netlify, you know, because we've had Netlify on the show.
Like, I mean, amazing team, you know, sort of preeminent in the data space as operating, you know, so well and being a model.
And you were, you know, in many ways, an architect behind
that. So I'd love to know, what were the big things that you took from GitLab to Netlify?
Shout out to Paige, who spoke on the show. Paige is just absolutely incredible.
I think one that I realized was I saw a lot of the consequences, pros and cons of having data people spread out throughout the organization.
So one thing that I really brought with me when I joined Netlify was this centrally reporting, locally prioritized model. And so as the team grew, we were having other executives allocate their head
count to the data team, and we would have them fund them. But very much from a, okay, but you're
going to establish a business partnership with that executive, and 80% of your time is allocated
towards them, 10% to professional development,
and 10% to technical debt
and helping maintain our data infrastructure.
And that was a rule of thumb
that overall averaged out to be true.
We found that to work really well.
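Editor's sketch: the "centrally reporting, locally prioritized" model and the 80/10/10 rule of thumb can be written down as a small Python structure. This is a sketch under stated assumptions rather than Netlify's actual setup. The 40-hour week, the field names `reports_to` and `funded_by`, and the helper names are invented for illustration; Sally and Adam are the example names used earlier in the conversation.

```python
from collections import Counter

# Rule-of-thumb split of an embedded data person's time, as described above.
ALLOCATION = {
    "business_partner": 0.80,
    "professional_development": 0.10,
    "tech_debt": 0.10,
}


def weekly_hours(total_hours=40.0, allocation=ALLOCATION):
    """Translate the percentage split into hours for a week (40h is an assumption)."""
    if abs(sum(allocation.values()) - 1.0) > 1e-9:
        raise ValueError("allocation shares must sum to 1")
    return {name: round(total_hours * share, 2) for name, share in allocation.items()}


# Everyone reports into the data org; headcount is funded by the team they serve.
TEAM = [
    {"name": "Sally", "reports_to": "data", "funded_by": "marketing"},
    {"name": "Adam", "reports_to": "data", "funded_by": "product"},
]


def headcount_by_funder(team):
    """Count how many data people each department is paying for."""
    return dict(Counter(person["funded_by"] for person in team))


print(weekly_hours())
print(headcount_by_funder(TEAM))
```

Keeping `reports_to` and `funded_by` as separate fields is the point of the model: the reporting line stays with the data org while the budget line follows the work.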
Can I ask an org chart question here?
I'm just thinking about our listeners who are managers
and who are dealing with some of these things.
Because there are sort of two ways to do that, right? You have,
you know, your actual boss is someone in the data org, right? The leader of the data org.
And then you have a dotted line to the functional area. Yeah. Or the inverse, right? Where you actually operate under the functional area.
And, you know, that gets tricky with evaluations and a number of things.
Paige, you know, helped us break some of that down. What do you think the best way to do that is?
I do think you should report into the data org. So that should be
your direct line, with a dotted line to the rest of the business or to whomever your business
partner is. The thing that startup people hate is like
matrix org might as well be a curse word, right? Like if you say that too loud, the startup people
come after you with their pitchforks and they're like, no, we're functional. And I'm like, data is
cross-functional. Try again. And so the only way a data org can truly be effective is if it's a matrix
org. Because data,
where it moves the needle for the company, isn't when you've just got product data talking to
product data or when you've just got sales data talking to sales data, right?
It's when these things come together,
when you're looking at data that originates from Salesforce
and you're enriching it with product analytics
to understand what drives conversion
or what are predictors of upsell.
Like those are the things where the data team
really moves the needle for the
business. And so if we think we're going to have functional data organizations, we're going to have
12 data teams within the company. And so I frame it in this centrally reporting, locally
prioritized way because startup people are allergic to matrix organizations.
But in High Output Management, the great Andy Grove talks about how matrix orgs
are the way to go. And so that is what it is. Yeah, I love it. Okay, one more question then,
Costas, I'm handing the mic to you for a long time. And this is just sort of a selfish question.
At GitLab, who was the first team to make a rogue data hire?
Oh, you're asking me to throw someone under the bus.
Oh, I am. That's so... Okay. Sorry. You don't have to answer that. You're totally right.
I just had a suspicion that it was marketing because I work in marketing.
And that seems like the kind of thing you would do.
Is that what you're saying?
Yes.
I'm not throwing someone under the bus.
I'm trying to self-incriminate you.
I mean, he did that.
He did that in other sites as well.
I've done this before.
You don't have to answer the question.
Here is what I will say.
Very rarely do rogue data people show up with data titles, right? So they might have like operations analyst or some variation of buzzwords that don't mean anything.
Love it.
A visualization engineer to really move the needle.
My question is answered
and my guilt is laid out for all to see.
So thank you so much.
Okay, Costas.
Oh, thank you, Eric.
Thank you.
So, all right.
The two of you have been talking
for quite a while now about data orgs, right?
And I think it's a great opportunity
to start providing some definitions.
So let's start with what a data organization is.
What's the mission of a data organization in a company?
Good question.
I'd say it's on every data leader who comes into an org to make sure your team has a mission.
Our mission at Netlify was pretty straightforward.
And I reference it a lot when I think about my work today,
because, well, one, I wrote it. But two, I think it's a really good example of how having this
thing to look towards and drive kind of as a, it's not a North Star metric, but it is like the
light at the end of the tunnel to point people to.
So our data team mission at Netlify was that the data team exists to empower the entire organization
to make the best decisions possible by providing accurate, timely, and useful insights.
So it's really about making the best decisions possible.
In terms of what is a data org, that's one of those
touchy-feely, I know what it is when I see it sort of thing. But I tend to think of it as
everyone in your company should be a data person. I asked someone once, what is your first data hire
when you're starting a company? And the pushback I got was, when you're starting
a company, all of your first hires have to be data people because it doesn't matter if they're
in marketing or sales or product. They just have to have this sort of data drivenness about them
if that's the way you're building a company. And so I think about that a lot because all of
your people have to be data people throughout the company. It doesn't matter what their job title is. Some fuzzy combination of operations analyst is fine.
But beyond that, the data team are the people who are managing kind of your data stack and
infrastructure and whose goals are to use those tools specifically to help drive the best
decision making in the company. And so when we think about data teams, some people like to think
of them or frame them as supportive functions. The data team doesn't always roll out the next
marketing campaign, but they make sure marketing has the information they need to roll out the best campaign they can.
And so it's a bit of a squishy answer.
I don't know that there's one that I could give you that would be better
other than I know it when I see it.
No, makes total sense.
Okay.
And then what would be your definition of, let's say, a
minimum viable data organization?
I think the easiest way to get started is probably a data warehouse,
some off-the-shelf ETL tools,
some reverse ETL or data activation,
and some easy way to access that data using SQL,
whether it's a notebook or a BI tool or whatever it might be, or just a way to download CSVs
so you can put them in a spreadsheet where other teams, where the rest of the company, can access them.
And that's why when I run through that list, I mentioned specifically data activation or reverse ETL or operational analytics, whatever people are calling it these days, but the general idea of let's take data that only exists in some
systems and put them in other systems. Let's democratize and make this data accessible to
other folks in our company. I think that is the most low-hanging or high ROI work that a data
team can tackle early on is really give people access to the data
that they need to do their work well.
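Editor's sketch: the "easy way to access that data using SQL" plus a CSV download can be shown end to end in a few lines. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for a real cloud warehouse. The `signups` table, its rows, and the function names are invented for illustration.

```python
import csv
import io
import sqlite3


def build_demo_warehouse():
    """Create an in-memory SQLite database standing in for a real warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signups (signup_date TEXT, plan TEXT)")
    conn.executemany(
        "INSERT INTO signups VALUES (?, ?)",
        [("2022-07-01", "free"), ("2022-07-01", "pro"), ("2022-07-02", "free")],
    )
    return conn


def query_to_csv(conn, sql):
    """Run a SQL query and return the result as CSV text with a header row."""
    cursor = conn.execute(sql)
    header = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


conn = build_demo_warehouse()
report = query_to_csv(
    conn,
    "SELECT plan, COUNT(*) AS signups FROM signups GROUP BY plan ORDER BY plan",
)
print(report)
```

Because the report is defined as a query rather than assembled by hand, rerunning it costs nothing, and the CSV text can be pasted into whatever spreadsheet another team already uses.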
Yeah, makes total sense.
And in terms of roles,
let's say a company is considering
starting a data org, right?
What should be the first roles
hired to build this organization?
Where should we start from?
In 2022, if you're getting started
with, like, a Snowflake data warehouse,
then you can get started with an analytics engineer
who's going to manage your full infrastructure
pretty easily.
You don't need a DBA.
You don't need, you know, a lot of custom data engineering.
In a world where engineering is
such a precious commodity, right? You talk to any engineering leader and they're like,
hiring right now is so hard because it is. It's because there's much more demand for
engineering time than there is engineering time. And if you're making that calculus,
it almost always makes sense to buy versus build.
And so when we look at one of the big advantages
of the modern data stack,
it's that you can go buy so many of the pieces
and have everything up and running in an afternoon.
Mm-hmm. Yep.
And, okay.
My next question, because you used the term analytics engineer, right?
And, again, we will stick with definitions because I think it's important.
It really helps people to understand, because, you know, we could be using terms,
but I think we don't spend enough time making the semantics around these terms well communicated, right?
And that's important, especially like for people who are out there considering what
the next step of their career should be, right?
So, okay.
Analytics engineer.
What does this mean?
What is an analytics engineer?
Good question.
So I think of data team roles as falling into four
buckets, and I call these the core four roles, because if you name it, it makes it marketing
or something. You could ask Eric later.
Right. So, core four roles.
Happy to.
Data engineer. Data engineer moves data from outside of your ecosystem in.
Analytics engineer works with data inside of your ecosystem.
Data analyst focuses on surfacing insights to the business.
Machine learning engineer builds and productionizes machine learning models.
There are some wishy-washy, squishy, soft gray boundaries here. Everyone needs to be able to push insights to the business.
Everyone needs to be willing to do whatever it takes and solve the problem that's in front of them. And that's okay. That's called working.
The general idea is if the bulk of your time is being spent in one of these categories,
your job title should be reflective of that.
And you'll notice not included is data scientist, right?
Because if you ask 10 people, what is a data scientist, even if they all have that title,
they will give you 10 different answers. I know because I've tried. And so I think that when job
titles don't mean anything, we should get rid of them, right? Like the language we use is so
important. And so data engineers move data
from outside of your ecosystem in.
Analytics engineers work with data
within your ecosystem.
Data analysts focus on surfacing insights
to the business.
Machine learning engineers focus on building
and productionizing machine learning models.
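Editor's sketch: the rule that a job title should reflect where the bulk of someone's time goes can be stated as a tiny lookup. This is only an illustrative sketch. The category keys, the hours, and the function name `suggested_title` are invented, and "bulk of your time" is interpreted here as simply the largest category.

```python
# The core four roles, keyed by the kind of work each one mostly does.
ROLE_BY_CATEGORY = {
    "moving_data_in": "Data Engineer",
    "working_with_data_in_ecosystem": "Analytics Engineer",
    "surfacing_insights": "Data Analyst",
    "building_ml_models": "Machine Learning Engineer",
}


def suggested_title(hours_by_category):
    """Return the core-four title matching the category with the most hours."""
    unknown = set(hours_by_category) - set(ROLE_BY_CATEGORY)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    if not hours_by_category or max(hours_by_category.values()) <= 0:
        raise ValueError("need at least one category with positive hours")
    # Ties go to whichever category appears first in the input.
    top_category = max(hours_by_category, key=hours_by_category.get)
    return ROLE_BY_CATEGORY[top_category]


week = {
    "moving_data_in": 7,
    "working_with_data_in_ecosystem": 25,
    "surfacing_insights": 8,
}
print(suggested_title(week))
```

Someone in this example still spends time moving data in and surfacing insights, which matches the point about soft boundaries: the title follows the majority of the work, not all of it.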
Mm-hmm.
I love that.
That's very, very clear and precise.
Thank you so much for that core four roles.
I think, Eric, you should do something with that.
A lot of marketing can happen on top of this.
So, all right, that's great.
So we started with analytics engineer, right?
And then we have...
Actually, that's interesting because I would expect to hear from you
that you'd start with a data analyst, to be honest,
because I think that's also probably the most common thing
that probably companies do, especially if they are not, let's say,
I'll say that, very engineering-driven companies, right?
Yeah.
We also, I mean, one of the mistakes that many times we do is that we consider like every company out there is like a tech startup in Silicon Valley, right?
Like we have way too many engineers to influence how we do things and how we think.
And that's not actually the reality out there, right? So let's say you have a typical company that at some point
wants to start leveraging data that they have.
And I would think that they will start with data analysts, right?
But you said, no, you shouldn't do that.
It's better to start with an analytics engineer.
And my question is, is this because, let's say,
when you start with an analytics engineer, you can have like a little bit of a data engineering together with some, let's say, capacity to do the actual analytics.
And you can have, let's say, one role that can bridge all the different needs that you need at that point.
Yeah, I'm going to answer your question with a sad story, right?
So what happens when a company hires a data analyst first, right?
Someone who's, there's no tech stack,
there's no data infrastructure.
They are just going to like pull some spreadsheets from places
and combine them and do some, like, Google Sheets or Excel wizardry, right?
The business loves it. People have numbers. How
exciting. There's no automation underlying it though. So every Monday they have to rerun the
executive report for the Tuesday meeting. And then there's another report that they build and it's
really exciting and it's a monster spreadsheet. They've got historical revenue information for all time that needs to go into the sales
VP meeting.
So now every Wednesday, they spend the whole day rerunning the sales VP revenue meeting
spreadsheet for the Thursday meeting.
And then Friday comes around and they realize they spent half of their week rerunning spreadsheets
and they didn't get anything done.
And this becomes a world where you have to continue
to throw data people at the problem because there's no automation, there's no systems,
there's nothing that lets that data person scale. And so over and over, what we actually see
is that when companies do this, two things happen. One, they get very frustrated with data and go back to the beginning.
Or those people develop the technical skills
to bring more engineering practices into their organization.
An example I'd point you to is Claire Carroll.
Claire Carroll is a product manager at Hex.
She was previously the dbt community manager.
And she'd tell you her career story is that she
was an Excel person who stepped into a data analyst role at a company where there wasn't a
ton of engineering support for her, and she learned things like Git and the command line and dbt and
SQL and all of that over time as she grew her career. And the end result is that
her influence in the company grew, right? But it's unfair for us to say like the only way for our data
analysts to be successful is if we force them to acquire more engineering skills, right? There is
a fundamentally different skill set in surfacing insights and being an analytics engineer, which is what Claire would tell you her career journey was.
And so I think we set our data orgs and our data people up for failure if we don't hire the right role.
And so to answer your question, I think that hiring an analytics engineer early on is in a lot of ways the best of both worlds.
You get a little bit of that more complex engineering skill set when that's the solution you need.
But you also get someone who's very comfortable working with your data, communicating with stakeholders, and is expected to also be able to surface insights to the business.
Okay.
And then why do we need data analysts?
Or when do we start needing data analysts?
Let's say we start with analytics engineers, then we get data engineers at some point to make sure that we automate the whole in and out of data. So when does the data analyst become a need for the team, for the data org?
When you're building your data team from scratch, there are two models that I've seen be particularly successful.
One is you want to take a divide and conquer approach early.
So you want to service, let's say, four different functions in the business or five different functions in the business.
And so in that case, you take the approach of hiring an analytics engineer who's going to be a business partner to each of those.
And they're going to build out the core modeling and they're going to be responsible for the insights, right? So if we were to think of, if we think of a spectrum where we have a zero to one,
this is a little bit hard for people to listen
and visualize here, but hopefully they'll indulge me.
If you think of a zero to one line
and we think of our data infrastructure
as having three parts,
where zero to 0.33 is moving data in.
We think of working with data in our ecosystem as 0.33 to 0.66.
And then we think of business insights as 0.66 to one, right?
An analytics engineer's mandate is not just that middle section.
It's the right two thirds.
And so we can focus on hiring them to do kind of the full-stackness of it.
Or you focus on a particular part of the business.
You have an analytics engineer focus on just that core modeling and you bring in a data
analyst who's now going to really focus on insights.
A lot of which approach is best is specific to your business. How well do people
already understand the data? What already exists? What numbers are people used to looking to?
Are you being driven by a particular change agent in your organization that's going to drive your
priority? And so it's a little hard to come in and say like, this is the right way to do it, right?
There is no one right way.
It's a lot of the context of your organization, but understanding the
trade-offs of each, I think is a great way to understand and make the
decisions specific for your organization.
Okay.
Okay.
That helped a lot to understand.
And have you ever seen, or you have experience of like what the result is of starting a data org and starting with data engineering?
No, I don't think I have.
I'm thinking about this, but I'd like to think I have swayed enough people to avoid that scenario,
but I don't know.
I know companies that have some weird titling going on that makes it hard
to really tell who they hired.
Right.
If you hire a BI engineer, what does that mean?
If you hire a LookML developer, what does that mean?
That is part of why I think it's so important that we centralize on these core four roles: people should be able to see "data engineer" and have a good understanding of what skill set is being asked of them.
Okay, the reason I asked that is because actually both Eric and I
have experience with that, and that's actually RudderStack.
Now, we're starting to share a little bit of embarrassing side information, Eric,
but I think that's fine.
I love it. I see jigs on the Data Stack Show.
Yeah, but okay, RudderStack, I mean, started as a company
because RudderStack itself is like, okay, it's a platform pipeline, right?
So it's mainly like people who work there that are like systems engineers
and data engineers.
So we had data engineers.
And when we had to start creating some kind of infrastructure
to collect some data, we started with the data engineers.
And the result is...
Wait, can I guess?
Of course.
You had a lot of data, but not a lot of insights.
Oh, yes.
But I'll give a little bit of even more embarrassing information.
And I would say that you end up with a Snowflake instance that has a database that is named EricDB.
It has come out publicly.
Now I know why Eric hired his rogue data person.
That's exactly right. But isn't that exactly it?
If you don't empower people with the data they need, they're going to
do whatever it takes for them to get it if they're good at their job.
Right.
And so that is exactly the problem.
But I think, you know, what I've seen is that engineers love to engineer.
That's why they're engineers, right?
And so they nerd out about CI and linters and all that kind of stuff.
And don't get me wrong.
I, too, was an engineer once upon a time, a mediocre one, but an engineer nonetheless, right?
And we cannot, I think one of my jobs as a manager has always been to help coach my team members.
I'm like, I do not care.
I mean, I care.
Don't get me wrong.
I'm not here for tech debt, but I don't care that much about how cool your engineering infrastructure is.
I care about the impact we're driving to the organization.
And you cannot lose sight of that no matter what you're doing.
Yeah. And I would say, you know, it's interesting, Kostas, like reflecting on that
good old EricDB. Like the funny thing is when that happened, it just wasn't a huge deal,
right? Like you're trying to this. And I think this is kind of what you're talking about. How do you keep sight of the longer term goal, Emilie? And when we were like building those
analytics use cases, there were just a couple of questions we needed to answer, right? Like we just
need to answer a couple of questions here, right? And that seems so innocuous, right? And then you
don't realize that on sort of the backside of things, like,
well, you're creating a lot of future tech debt, you know, which can be dealt with,
but there's always a cost to that, right? Like, okay, well, now you have to choose between
insights and tech debt, right? And the more you choose insights, the more the tech debt grows,
and, you know, you sort of eventually have to pay the piper. And so it's a very, it's a very slippery slope, right? And it's very easy to do that early on
in a company, especially if you sort of take the, like, well, there's an engineering solution to every
problem and we just need to answer a couple questions. Kostas, this has been cathartic.
I mean, I'm admitting all sorts of sort of data crimes publicly, which is freeing.
So thank you.
Yeah, this is actually a therapy session for you.
That's why we are doing it.
The data therapy show coming soon.
Yes, that's going to be called EricDB.
Yeah.
Well, that probably says something about how territorial you are when it comes to data. So, I don't know. There are deeper conversations that need to happen offline.
It's getting deep.
In retrospect, Eric, it sounds like you just needed a better code name. Like, rather than naming it Eric, you should have named it like some Disney movie you watched recently and throw
people off.
I didn't, I didn't name it. An engineer named it.
I worked at a place where our replica was called
Jakku, which is a Star Wars reference for anyone who didn't understand. But at the time I did not
understand. I had never seen Star Wars yet. I have today.
Things have been redeemed.
But I remember like, Jakku, this is such a weird name.
Why would anyone come up with this?
And they were like, wow, you have so much to learn about being an engineer.
That's so true.
And just about something like the conversation that we had is that one of the most common crimes that engineers do is overengineering.
And that's like extremely, becomes extremely obvious, like in an early stage startup or when you start something from scratch.
And that's exactly like what happened at RudderStack, right?
Like at the end, we had way too much data.
Like it, it was like extremely hard to separate like noise from signal there.
Because we just asked from the engineers, okay, guys, we need data.
Oh, sure.
Wait and you'll see.
I mean, we delivered, we gave you all the data that you will ever need.
And I think that's, that I think that's quite important.
And the lesson that I've learned is kind of the hard way
by transitioning from being like a software engineer myself
to getting other roles.
But over-engineering can be like a really hard,
let's say, thing to deal with.
Because more engineers does not mean a better solution.
Yeah.
But I think the other thing is,
and Emilie, would love your thoughts on this,
because I think one of the things
that makes that difficult, Kostas,
is that when we were going through that whole cycle,
we were a really early stage company,
you know, so scrappy.
And so many times in that
phase, like your engineers are sort of your de facto data engineers, analytics engineers are
sort of everything. Right. And so you get this dynamic of you ask for something simple and,
you know, the kitchen sink is sort of thrown at it. And that's really challenging, right? Because
you only have so many hours in the day and you're
still trying to figure out product market fit and all this sort of stuff, right?
So Emilie, we'd love your thoughts on that. How do you mitigate that? Because I'm sure we have
listeners who are in that environment where, I mean, look at RudderStack, things are great now.
We have such a great analytics setup, but we certainly over-engineered things early on.
We're now
using some of that data. So in many ways, it's like, oh, I'm actually glad we did that. We
probably should have done it a little bit differently, with a little bit more calculation.
But I would think that's really common, right? That happens all the time where to an engineer
who's building a product, they throw engineering as sort of the hammer and everything's a nail. And so you throw that at data and you end up with sort of over-engineered
things. But Emilie, would love your thoughts on that.
Yeah, I think I have seen it all the time.
And one, another way this manifests sometimes is engineers love their tooling. And so suddenly
they're using Prometheus for their business metrics, right? And I mean, it happens.
And so people know the tooling they know.
And it's unfair as a data practitioner to assume that like everyone is going to know the best practices of the modern data stack. There's this great blog post by Vicki Boykis called "You don't need Kafka" that
specifically picks on WeWork a little bit, which I am a big fan of doing as, as a
big fan of following the WeWork.
We did a little bit in the show prep, which was super
incredible.
Yeah.
And so I think about it as like, we, there's going to always be this natural, cool engineering tendency to want to over-engineer.
And the thing that is going to pull us back on that is just this, what is the thing that drives the biggest impact?
What is the simplest solution that drives the biggest impact?
And something I use as my own anchor often
was something I learned from the GitLab CEO.
We had some problem in front of me
and I was talking to him about it.
I was like, I'm not really sure what to do next.
And he said to me,
what can you do that moves the needle that you can ship
in the next hour? Not in the next day, not in the next week, in the next hour. And so that forces
you to really think like, what is the smallest change I can make that makes a difference to
this problem? And I come back to that as like, what can I ship in the next hour?
And I would ask my, the Netlify team will tell you, I would ask them that all the time. Yeah,
what's the one hour version of this? They're like, one hour, that's never enough time. I'm like,
yeah, but what is the one hour version? And that helps you scope to focus on impact.
So that's part of it. The other is, as you grow a company, part of what you're doing as you're
hiring is just like filling out gaps in your org skill set. And so it's okay that your engineers
got started with the engineering tools that they had, and they gave Eric his own database. Like,
please go run. And part of what you do when you're ready to hire a data leader is say,
data leader, you have to accept this technical debt that's already in place. And at the time,
we made a business decision around a trade-off between getting the information that we needed with the tooling and people we had in front of us versus doing it the right way and hiring a data person.
And I think that that's an okay trade-off for companies to make.
We just need to, every once in a while, look up and acknowledge that that's the trade-off we're making.
It makes total sense, I would say.
By the way, Eric, is EricDB still there?
Or has it been renamed?
Let's cover that in a future episode.
All right.
Yeah, that's part two of true crimes.
True crimes in data.
All right.
So, Emilie, we talked about the different roles and the core four roles, as you said.
And I'd like to ask you, can you help us identify the, let's say, the important traits that each one of these
roles has, like, let's say if you could create like an archetype for a data
engineer or for an analytics engineer, like what you would look when you would
be hiring for this, for these roles.
You mentioned at some point, for example, like communication skills, when we were
talking about data analysts, for example.
So I don't know that I have an archetype for each of those, but I will tell you something that has been core to my own hiring philosophy.
So I grew up inside of a Dunkin' Donuts.
My mom has been working at Dunkin' Donuts since 1999, so 20-plus years now. And I mean, when I say I grew up inside of a Dunkin'
Donuts, I mean, the emergency contact at school was the Dunkin' Donuts down the street, because
if anything happened, someone from the store could come pick me up. And so I remember and
I have these memories of like my mom walking across the dining room and seeing a straw wrapper on the floor and picking it up and putting it in her pocket.
And I think about that.
I'm like, you know, here my mom is the manager of the store.
Like someone was going to clean the dining room at some point in the next hour.
But she saw a problem and she just kind of fixed it.
Right. And she didn't solve it with the perfect solution. It's not like she took the straw wrapper
and she put it in the trash. She just put it in her pocket for later. So the next time she was
at a trash can, she dumped it. So I look for that quality when hiring. And another way to frame it
is floor sweepers. People who are going to see the mess
in front of them and clean it up. Or if they see that the trash is full, they're going to take out
the trash, right? When I'm hiring, I don't want people who are like, this is my job. This is the
boundary. This is what I do. And that's that. I want people who are going to see a problem and
fix a problem. They're driven and they're taking
initiative. They don't need the mandates issued to them. And I care much more about that than I do
about specific technologies you've worked with or companies you've worked at or your education
background. If you are a floor sweeper, then I can teach you all the rest, but I can't teach you to be the kind
of person who walks by the straw wrapper and puts it in your pocket. Okay. That's, that's some
amazing advice for hiring in general, I would say, not just like for data orgs. Okay. One last
question from my side, and then I'll give the microphone back to Eric.
All these roles we are talking about, they are pretty new, right?
I would assume that most universities out there, they probably don't even mention data
engineering or analytics engineering or the rest of the roles that we talked about.
What are the paths that people can get, especially like younger people who
are looking right now to figure out like what to do with their careers?
If they want, let's say, to become like a data engineer or like an analytics engineer
and or like an MLOps engineer or ML engineer.
So what are the paths there?
And do they ever cross also?
This is a hard question because I don't know anyone who works in this who would tell you their college education was particularly relevant.
And some of the best folks in data that I have ever worked with never went to college at all. So there's a little bit of like, we're going to
see how things shake out and what the next decade looks like. But today, I don't look at educational
backgrounds when I'm hiring and I make sure that education isn't a prerequisite for any of the
roles that I'm involved in hiring for. They shouldn't be; they don't necessarily move the needle.
Like if you're doing advanced statistical research, then you probably need a PhD.
But otherwise, if you're trying to calculate ARR and MRR, your education
background doesn't really matter.
Makes sense.
All right, Eric. All yours.
Okay, we are at the buzzer
unfortunately, so I have one last
question, but this may be the most important
question of the entire show.
What's your favorite kind
of donut?
All of them? I knew it was going to be
difficult because it's a difficult question
for almost everybody. I know some people know, but
it's hard to choose.
It's definitely like a mood influence thing.
Last night, a commercial came on for, and it had, but in the commercial was like strawberry
frosted with sprinkles. And I turned to my husband and I was like, that strawberry frosted looks so
good right now. And so sometimes the strawberry frosted, sometimes it's a blueberry cake,
sometimes it's a chocolate glaze, sometimes it's a Boston cream, sometimes it's a fresh
French cruller. They're still kind of warm. The glaze is still like not set in. All of the above
is the correct answer here.
Love it. Love it. Well, Emilie, this has been such a fun time on
the show. We've learned so much. Thank you for sharing your time and your insights.
I know it's been helpful for us and our listeners,
and we'd love to have you back sometime.
Thanks.
I'm here and ready.
Kostas, first of all,
Emilie just seems like such a fun person.
I had a great time on that show laughing.
And honestly, I feel way lighter for some reason
as a person after getting it out there that EricDB is something that exists.
But in all seriousness, I think one of my biggest takeaways was her recommendation on... EricDB, but, you know, that's sort of exhibit A of what can happen when you sort of have an engineering first approach to data without necessarily trying to establish the underlying questions around value in the organization.
Which sounds funny to say almost, right?
Like when you say engineering first approach to data,
I mean, that sounds very natural.
And in some ways it sounds like the correct thing.
You know, but when you're thinking about how to build a team,
it may not be.
And so that's going to stick with me.
I'm going to think about that a lot this week.
Yeah, I totally agree with you.
I think we have a great opportunity to talk about, first of all, the role of the data team inside, the data organization inside the company, but also the part where customers when we as engineers, we actually over-engineer the solutions that we provide to them, right? And we can, let's say, get exposed to what it means like to over-engineer
something without having very clear business objectives there. And that's what I'm keeping
from this conversation because it's like kind of like a realization that I also made during this
conversation. And it's, I don't know, it's super, super valuable.
Outside of obviously
of all the rest of like
the conversation that we had
with her about the roles,
the organizations,
and how to position
a data organization
inside the company and grow it.
It was a very, very
insightful conversation.
I agree.
All right.
Well, thanks for listening
to the Data Stack Show.
Tell a friend about it if you haven't. We love new listeners and we will catch you on the next one.
We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite
podcast app to get notified about new episodes every week. We'd also love your feedback. You
can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.