The Data Stack Show - 97: How To Build an Organization-Empowering Data Team with Emilie Schario of Amplify Partners

Episode Date: July 27, 2022

Highlights from this week’s conversation include:
Emilie’s background and career journey (3:00)
Hypergrowth at GitLab (5:23)
Being close to the money in data (9:50)
Big things taken from GitLab to Net...lify (13:00)
Defining “data organization” (17:53)
The first roles you should hire for (22:06)
Defining “analytics engineer” (23:44)
One role to bridge different needs (27:26)
Why data analysts are needed (30:51)
How to avoid a kitchen sink of data (40:20)
Data engineer archetype (45:48)
Data roles crossing over (48:09)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Hey Data Stack Show listeners, Brooks here. Usually, I'm behind the scenes keeping things rolling for the show, but today I'm coming out of hiding to share some exciting news. We have another live show coming up, and we want you to
Starting point is 00:00:37 join us for the recording. This time, we're bringing back Tristan from Continual and Willem from Tecton to talk about the future of machine learning. We'll record the show on August 10th at 2 o'clock Eastern, 11 o'clock Pacific. So mark your calendars and visit datastackshow.com slash live to register today. Welcome back to the Data Stack Show. Kostas, we had Paige from Netlify on the show a while back. And both you and I walked away from the show sort of enamored with the way that the data team seemed to operate, how efficient it was, their structure. There were just a number of things where we sort of said, man, that feels like best in class. And today we get to talk with Emilie, who really helped architect that team. And my burning question is, she was at GitLab before Netlify, hypergrowth phase.
Starting point is 00:01:38 You know, they added a thousand employees in a little over two years. And she was on the data team and then, you know, sort of went on to do a number of things beyond that. But my sense, my hypothesis is that she actually took a lot of those lessons from the experience at GitLab and that sort of exponential growth and exponential need for data.
Starting point is 00:02:01 And that really informed a lot of how she built what is now the preeminent Netlify data team. So I would love the backstory on that. And that's what I'm going to ask. How about you? Yeah, I'd love to chat with her about the role of the data inside the company and try to understand a little bit more about the people who work under this organization, like data engineers, data analysts, and actually figure out like, who are all these people that should be members of this organization? And I think we have the right person to have this exact conversation today. So let's go and do it. Let's do it.
Starting point is 00:02:39 Emilie, welcome to the Data Stack Show. We have been so excited to have you on the show, and we have so many things to talk about. But let's start where we always start. Give us a history of your work in data, and more if you would like, and what you're doing today. Thanks so much for having me, gentlemen. My name is Emilie Schario. I'm currently a data strategist in residence
Starting point is 00:03:02 at Amplify Partners, which is an early-stage VC fund focused on dev tools, data tools, and cloud infrastructure. I was previously director of data at Netlify. Before that, I was the first data analyst at GitLab, joined the company at 280 people, watched the company grow to over 1,300 in less than two and a half years, and spent my last year there as interim chief of staff to the CEO while we hired our chief of staff. And have had a couple of other data jobs along the way, but have spent most of my career in the modern data stack. I also helped admin a data community or a community of data practitioners called Locally Optimistic, which is a wonderful place for people to just talk about data problems and how they think about them and what tooling looks like and all of that
Starting point is 00:03:59 goodness that goes on when you have a mindshare. So that's me. I live in Columbus, Georgia, which is a couple hours outside of Atlanta. Very cool. Okay. First of all, Locally Optimistic, amazing. Could not recommend it enough. I'm sure a lot of our listeners are already involved, but if you're not, absolutely check it out. Amazing community. We learned so much from it. Let's talk about GitLab. So this has been so interesting to me. You know, being part of a hypergrowth phase in a company like you were a part of, you know, where let's say you add a thousand people in a little over two years, you know, almost every part of the organization is having to reinvent itself, right? And what's interesting to me is that when we think about using data to grow, it really touches every part of the organization, right? And so you were
Starting point is 00:04:55 involved in the data practice as every part of the organization was reinventing itself. And I'm sure that the impact of that on the data team was significant. Can you just tell us what that was like? Where did it start? You know, sort of, you know, in your first couple months on the job, what did the data team look like? And then I'd love to know, what were sort of the big milestones along the way in that sort of hypergrowth period? So when I joined the GitLab data team, we were three. So Taylor Murphy was our manager, I think was his title, but he led the team and he reported directly to the CFO. And then Thomas La Piana was our data engineer
Starting point is 00:05:40 and I was our data analyst. And today we would call what I was doing analytics engineering, but 2018 was both not that long ago and also a very, very long time ago in data timelines. And so I mentioned that we reported into the CFO because that meant that our priorities were financial. When the asks came to the data team, we were focused on finances, on FP&A and serving that part of the organization. I was going to ask about that because that caught me off guard right out of the gate, right? Because you don't typically see that. I don't know that I would say
Starting point is 00:06:21 you don't typically see it. I think we see a couple of places where data teams start. Excuse me. So we see data teams generally start under finance if they're enterprise-led, enterprise sales-led businesses. Yep. Under growth if they're a product-led growth business. Under product if they're still trying to figure that out, right? But that's kind of the three buckets I would put data orgs in, like, in terms of origin, right? And then the question becomes, all right, this is where the data team started.
Starting point is 00:07:00 How do we serve the whole company? There are different pressures at play in each of those contexts. And so data team reporting into the finance org, finance, for those who don't know a ton about how you manage a profit and loss statement, finance generally falls under G&A, right? General and administrative. And the two other categories you care about there are sales and marketing and R&D, research and development. And as a crude rule of thumb, product and engineering is going to go under R&D and sales and marketing is sales and marketing. And things like HR and the other things fall under G&A. And there is a big pressure for G&A to always be a tiny portion of your expenses.
Starting point is 00:07:47 So if your data team is under finance, your data team is probably going to find themselves underfunded. There are ways to mitigate this. For example, one model that I have found to be very successful is what I call centrally reporting but locally prioritized. So your data team should report into your data org. But if Sally is going to spend all her time working on marketing, then her headcount should be funded by the marketing department. And if Adam is going to spend all of his time working on product or growth or whatever the department is, their headcount should be funded by that respective department. Oh, interesting. Okay. And that allows, you know, we shouldn't think of like the team where the data org reports up into
Starting point is 00:08:37 should not drive their cost model because we don't want to handicap our team before it started. We should really think about how do we serve the business and then let's make sure we have the people and let the headcount budgeting allocation for that support our business goals. Yep. Super interesting. It's the classic, if your team is a cost line item, watch out because when things get tight, they're going to be coming for you. Well, okay. That is so helpful. Let me ask a question, zoom out a little bit.
Starting point is 00:09:13 So there's an old adage, follow the money, right? If you're close to the money, you're close to the problems and sort of the most important things, do you think that that is true and sort of beneficial, you know, sort of regardless of the headcount, you know, and sort of cost center considerations? How did being really close to the money influence the way that you thought about data, you know, for functions outside of finance, right? Sort of sales, marketing, engineering, even. Did that influence the way that you thought about that? Sort of having a financial lens on the work that you did? I think having the CFO's priorities drive our work is what really stuck out to me. So finance
Starting point is 00:09:57 was definitely a piece of it. And in fact, I think, you know, it's like I said, 2018 is both not that long ago and a very long time ago. A short eternity. Definitely. But it was ARR and MRR reporting and things like revenue retention and customer retention were a huge part of our early projects back then. I remember racking my brain and like, what are the edge cases of retention? And then at GitLab, not only do you have to think about it on a customer level, but customers roll up, right? And so if you sell to a company and then they have a parent company, how do you think about how many customers you have? So there was a lot of complexity there. And I think having those projects as the foundation on which everything else we worked on came from
Starting point is 00:10:53 really drove how I understood, I personally understood the company. More than that, though, I think the CFO's priorities driving the data team's work then meant that other parts of the organization felt underserved by data. And the consequence of that was that we saw miscellaneous data hires pop up in other parts of the org. Interesting. And so that's where this emphasis on separating headcount from reporting structure comes from. If you're hiring data people, they should get a manager who understands their job and can help them with their career development and can help shape their work independent of where their priorities are coming from. Yep. Yeah, that's super interesting. I mean, one thing that you said that sticks out to me is that, you know, just getting aligned on definitions and like having really sharp definitions is hard, but it sounds like that was a really big emphasis. But part of sort of the hidden cost of that
Starting point is 00:12:01 in the way that things were structured was that even though you had a really tight definition, you weren't necessarily, like the team wasn't structured in a way that that tight definition actually served other teams and sort of provided value to them by making things simpler or providing data. Fascinating. Okay. So let's, because I know Kostas has a ton of questions and I could keep going all day, would love to hear about maybe one or two more big things that you learned at GitLab. And then the main lessons that you took from GitLab to Netlify, you know, because we've had Netlify on the show. Like, I mean, amazing team, you know, sort of preeminent in the data space as operating, you know, so well and being a model.
Starting point is 00:12:44 And you were, you know, in many ways, an architect behind that. So I'd love to know, what were the big things that you took from GitLab to Netlify? Shout out to Paige, who spoke on the show. Paige is just absolutely incredible. I think one that I realized was I saw a lot of the consequences, pros and cons of having data people spread out throughout the organization. So one thing that I really brought with me when I joined Netlify was this centrally reporting locally prioritized. And so as the team grew, we were having other executives allocate their headcount to the data team, and we would have them fund them. But very much from a, okay, but you're going to establish a business partnership with that executive, and 80% of your time is allocated towards them, 10% to professional development,
Starting point is 00:13:46 and 10% to technical debt and helping maintain our data infrastructure. And that was a rule of thumb that overall averaged out to be true. We found that to work really well. Can I ask an org chart question here? I'm just thinking about our listeners who are managers and who are dealing with some of these things.
Starting point is 00:14:08 Because there are sort of two ways to do that, right? You have, you know, your actual boss is someone in the data org, right? The leader of the data org. And then you have a dotted line to the functional area. Yeah. Or the inverse, right? Where you actually operate under the functional area and you sort of, and I, you know, that gets tricky with evaluations and a number of things. Paige, you know, helped us break some of that down. What do you think the best way to do that is? I do think you should report into the data org. So that should be your, your direct line with a dotted line to the rest of the business or to whomever your business partner is. The thing that startup people hate is like
Starting point is 00:14:46 matrix org might as well be a curse word, right? Like if you say that too loud, the startup people come after you with their pitchforks and they're like, no, we're functional. And I'm like, data is cross-functional. Try again. And so like the only way a data org works really well is if it's a matrix org. And so the only way a data org can truly be effective is if we're matrix orgs. Because data, where it moves the needle for the company, isn't when you've just got product data talking to product data or when you've just got sales data talking to sales data, right? It's when these things come together, when you're looking at data that originates from Salesforce
Starting point is 00:15:32 and you're enriching it with product analytics to understand what drives conversion or what are predictors of upsell. Like those are the things where the data team really moves the needle for the business. And so if we think we're going to have functional data organizations, we're going to have 12 data teams within the company. And so I frame it in this centrally reporting locally prioritized way because data or startup people are allergic to matrix organizations.
Starting point is 00:16:06 But High Output Management, the good old great Andy Grove, talks about how matrix orgs are the way to go. And so that is what it is. Yeah, I love it. Okay, one more question then, Kostas, I'm handing the mic to you for a long time. And this is just sort of a selfish question. At GitLab, who was the first team to make a rogue data hire? Oh, you're asking me to throw someone under the bus. Oh, I am. That's so... Okay. Sorry. You don't have to answer that. You're totally right. I just had a suspicion that it was marketing because I work in marketing. And that seems like the kind of thing you would do.
Starting point is 00:16:45 Is that what you're saying? Yes. I'm not throwing someone under the bus. I'm trying to self-incriminate you. I mean, he did that. He did that in other sites as well. I've done this before. You don't have to answer the question.
Starting point is 00:17:00 Here is what I will say. Very rarely do rogue data people show up with data titles, right? So they might have like operations analyst or some variation of buzzwords that don't mean anything. Love it. A visualization engineer to really move the needle. My question is answered and my guilt is laid out for all to see. So thank you so much. Okay, Kostas.
Starting point is 00:17:31 Oh, thank you, Eric. Thank you. So, all right. You're talking, the two of you, for quite a while now about data orgs, right? And I think it's a great opportunity to start providing some definitions. So let's start with what a data organization is.
Starting point is 00:17:48 What's the mission of a data organization in a company? Good question. I'd say it's on every data leader who comes into an org to make sure your team has a mission. Our mission at Netlify was pretty straightforward. And I reference it a lot when I think about my work today, because, well, one, I wrote it. But two, I think it's a really good example of how having this thing to look towards and drive kind of as a, it's not a North Star metric, but it is like the light at the end of the tunnel to point people to.
Starting point is 00:18:25 So our data team mission at Netlify was that the data team exists to empower the entire organization to make the best decisions possible by providing accurate, timely, and useful insights. So it's really about making the best decisions possible. In terms of what is a data org, that's one of those touchy-feely, I know what it is when I see it sort of thing. But I tend to think of it as everyone in your company should be a data person. I asked someone once, what is your first data hire when you're starting a company? And the pushback I got was, when you're starting a company, all of your first hires have to be data people because it doesn't matter if they're
Starting point is 00:19:10 in marketing or sales or product. They just have to have this sort of data drivenness about them if that's the way you're building a company. And so I think about that a lot because all of your people have to be data people throughout the company. It doesn't matter what their job title is. Some fuzzy combination of operations analyst is fine. But beyond that, the data team are the people who are managing kind of your data stack and infrastructure and whose goals are to use those tools specifically to help drive the best decision making in the company. And so when we think about data teams, some people like to think of them or frame them as supportive functions. The data team doesn't always roll out the next marketing campaign, but they make sure marketing has the information they need to roll out the best campaign they can.
Starting point is 00:20:06 And so it's a bit of a squishy answer. I don't know that there's one that I could give you that would be better other than I know it when I see it. No, makes total sense. Okay. And then what would be your definition of, let's say, a minimum viable data organization? I think the easiest way to get started is probably a data warehouse,
Starting point is 00:20:39 some off-the-shelf ETL tools, some reverse ETL or data activation, and some easy way to access that data using SQL, whether it's a notebook or a BI tool or whatever it might be, or just like a way to download CSVs so you can put them in a spreadsheet where other teams can access it, where the rest of the company can access it. And that's why when I run through that list, I mentioned specifically data activation or reverse ETL or operational analytics, whatever people are calling it these days, but the general idea of let's take data that only exists in some systems and put them in other systems. Let's democratize and make this data accessible to other folks in our company. I think that is the most low-hanging or high ROI work that a data
Starting point is 00:21:42 team can tackle early on is really give people access to the data that they need to do their work well. Yeah, makes total sense. And in terms of roles, like let's say a company considers, like, starting the data org, right? Like what should be the first roles hired to build this organization?
Starting point is 00:22:03 Where we should start from? In 2022, if you're getting started with like a Snowflake data warehouse, then you can get started with an analytics engineer who's going to manage your full infrastructure pretty easily. You don't need a DBA. You don't need, you know, a lot of custom data engineering.
Starting point is 00:22:24 In a world where engineering is such a precious commodity, right? You talk to any engineering leader and they're like, hiring right now is so hard because it is. It's because there's much more demand for engineering time than there is engineering time. And if you're making that calculus, it almost always makes sense to buy versus build. And so when we look at one of the big advantages of the modern data stack, it's that you can go buy so many of the pieces
Starting point is 00:22:57 and have everything up and running in an afternoon. Mm-hmm. Yep. And, okay. My next question, because you used, like, the term analytics engineer, right? And, again, we will stick with definitions
Starting point is 00:23:13 because I think it's important. It really helps, like, people to understand because, you know, like, we could be using terms, but we don't spend, I think, like, enough time making the semantics around these terms, like, well communicated, right?
Starting point is 00:23:26 And that's important, especially like for people who are out there considering what the next step of their career should be, right? So, okay. Analytics engineer. What does this mean? What is an analytics engineer? Good question. So I think of data team roles as falling into four
Starting point is 00:23:47 buckets, and I call these the core four roles, because if you name it, it makes it marketing or something. You could ask Eric later. And, right, so, core four roles. Happy to. Data engineer. Data engineer moves data from outside of your ecosystem in. Analytics engineer works with data inside of your ecosystem. Data analyst focuses on surfacing insights to the business. Machine learning engineer builds and productionizes machine learning models. There are some wishy, squishy, soft gray boundaries here. Everyone needs to be able to push insights to the business. Everyone needs to be willing to do whatever it takes and solve the problem that's in front of them. And that's okay. That's called working. The general idea is if the bulk of your time is being spent in one of these categories,
Starting point is 00:24:52 your job title should be reflective of that. And you'll notice not included is data scientist, right? Because if you ask 10 people, what is a data scientist, even if they all have that title, they will give you 10 different answers. I know because I've tried. And so I think that when job titles don't mean anything, we should get rid of them, right? Like the language we use is so important. And so data engineers move data from outside of your ecosystem in. Analytics engineers work with data
Starting point is 00:25:29 within your ecosystem. Data analysts focus on surfacing insights to the business. Machine learning engineers focus on building and productionizing machine learning models. Mm-hmm. I love that. That's very, very clear and precise.
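The core four roles and the "bulk of your time" rule described above can be written down as a small sketch. This is an illustration only, not something presented in the episode: the `CoreRole` enum and the `title_for` helper are hypothetical names, and "the bulk of your time" is approximated here as whichever category has the most hours.

```python
from enum import Enum


class CoreRole(Enum):
    """The four roles as defined in the conversation (names are illustrative)."""
    DATA_ENGINEER = "moves data from outside of your ecosystem in"
    ANALYTICS_ENGINEER = "works with data inside of your ecosystem"
    DATA_ANALYST = "surfaces insights to the business"
    ML_ENGINEER = "builds and productionizes machine learning models"


def title_for(hours_by_role: dict[CoreRole, float]) -> CoreRole:
    """Return the role where most of the time is spent.

    Assumption: 'the bulk of your time' is taken to mean the single
    category with the largest number of hours.
    """
    return max(hours_by_role, key=hours_by_role.get)


# Hypothetical weekly split for one person: some pipeline work, mostly
# modeling inside the warehouse, some time surfacing insights.
week = {
    CoreRole.DATA_ENGINEER: 5,
    CoreRole.ANALYTICS_ENGINEER: 25,
    CoreRole.DATA_ANALYST: 10,
}
print(title_for(week).name)
```

The gray boundaries mentioned in the conversation show up here as hours spread across several categories; only the largest share decides the title.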
Starting point is 00:25:45 Thank you so much for that core four roles. I think, Eric, you should do something with that. A lot of marketing can happen on top of this. So, all right, that's great. So we started with analytics engineer, right? And then we have... Actually, that's interesting because I would expect to hear from you that started with data analyst, to be honest,
Starting point is 00:26:11 because I think that's also probably the most common thing that probably companies do, especially if they are not, let's say, I'll say that, very engineering-driven companies, right? Yeah. We also, I mean, one of the mistakes that many times we do is that we consider like every company out there is like a tech startup in Silicon Valley, right? Like we have way too many engineers to influence how we do things and how we think. And that's not actually the reality out there, right? So let's say you have a typical company that at some point wants to start leveraging data that they have.
Starting point is 00:26:49 And I would think that they will start with data analysts, right? But you said, no, you shouldn't do that. It's better to start with an analytics engineer. And my question is, is this because, let's say, when you start with an analytics engineer, you can have like a little bit of a data engineering together with some, let's say, capacity to do the actual analytics. And you can have, let's say, one role that can bridge all the different needs that you need at that point. Yeah, I'm going to answer your question with a sad story, right? So what happens when a company hires a data analyst first, right?
Starting point is 00:27:30 Someone who's, there's no tech stack, there's no data infrastructure. They are just going to like pull some spreadsheets from places and combine them and do some like Google Sheets or Excel wizardry, right? The business loves it. People have numbers. How exciting. There's no automation underlying it though. So every Monday they have to rerun the executive report for the Tuesday meeting. And then there's another report that they build and it's really exciting and it's a monster spreadsheet. They've got historical revenue information for all time that needs to go into the sales
Starting point is 00:28:06 VP meeting. So now every Wednesday, they spend the whole day rerunning the sales VP revenue meeting spreadsheet for the Thursday meeting. And then Friday comes around and they realize they spent half of their week rerunning spreadsheets and they didn't get anything done. And this becomes a world where you have to continue to throw data people at the problem because there's no automation, there's no systems, there's nothing that lets that data person scale. And so over and over, what we actually see
Starting point is 00:28:39 is that when companies do this, two things happen. One, they get very frustrated with data and go back to the beginning. Or those people develop the technical skills to bring more engineering practices into their organization. An example I'd point you to is Claire Carroll. Claire Carroll is a product manager at Hex. She was previously the dbt community manager. And she'd tell you her career story is that she was an Excel person who stepped into a data analyst role at a company where there wasn't a
Starting point is 00:29:14 ton of engineering support for her and she learned things like Git and the command line and dbt and SQL and all of that over time as she grew her career. And the end result is that her influence in the company grew, right? But it's unfair for us to say like the only way for our data analysts to be successful is if we force them to acquire more engineering skills, right? There is a fundamentally different skill set in surfacing insights and being an analytics engineer, which is what Claire would tell you her career journey was. And so I think we set people up, set our data orgs and our data people up for failure if we don't hire the right role. And so to answer your question, I think that hiring an analytics engineer early on is in a lot of ways the best of both worlds. You get a little bit of that more complex engineering skill set when that's the solution you need.
Starting point is 00:30:14 But you also get someone who's very comfortable working with your data, communicating with stakeholders, and is expected to also be able to surface insights to the business. Okay. And then why do we need data analysts? Or when do we start needing data analysts? If like we can, let's say we start with analytics engineers, then we get data engineers at some point to make sure that like we automate the whole like in and out of data. So when does the data analyst become like a need for the team, for the org, for the data org? When you're building your data team from scratch, there are two models that I've seen be particularly successful. One is you want to take a divide and conquer approach early. So you want to service, let's say, four different functions in the business or five different functions in the business.
Starting point is 00:31:11 And so in that case, you take the approach of hiring an analytics engineer who's going to be a business partner to each of those. And they're going to build out the core modeling and they're going to be responsible for the insights, right? So if we were to think of, if we think of a spectrum where we have a zero to one, this is a little bit hard for people to listen and visualize here, but hopefully they'll indulge me. If you think of a zero to one line and we think of our data infrastructure as having three parts, where zero to 0.33 is moving data in.
Starting point is 00:31:45 We think of working with data in our ecosystem as 0.33 to 0.66. And then we think of business insights as 0.66 to one, right? An analytics engineer's mandate is not just that middle section. It's the right two thirds. And so we can focus on hiring them to do kind of the full-stack-ness of it. Or you focus on a particular part of the business. You have an analytics engineer focus on just that core modeling and you bring in a data analyst who's now going to really focus on insights.
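Since the zero-to-one line is hard to picture in audio, here is a minimal sketch of it. The three segments and the analytics engineer's "right two thirds" come from the description above; the intervals assigned to the data engineer and data analyst are assumptions inferred from the role definitions given earlier, and all names (`SEGMENTS`, `MANDATES`, `segments_covered`) are hypothetical.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

# The three parts of the data infrastructure on a 0-to-1 line.
SEGMENTS = {
    "moving data in": (Fraction(0), THIRD),
    "working with data in the ecosystem": (THIRD, 2 * THIRD),
    "business insights": (2 * THIRD, Fraction(1)),
}

# Each role's mandate as an interval on the same line.
MANDATES = {
    "data engineer": (Fraction(0), THIRD),       # assumption: left third only
    "analytics engineer": (THIRD, Fraction(1)),  # "the right two thirds"
    "data analyst": (2 * THIRD, Fraction(1)),    # assumption: right third only
}


def segments_covered(role: str) -> list[str]:
    """List the segments that fall entirely inside a role's mandate."""
    lo, hi = MANDATES[role]
    return [
        name
        for name, (seg_lo, seg_hi) in SEGMENTS.items()
        if lo <= seg_lo and seg_hi <= hi
    ]


for role in MANDATES:
    print(f"{role}: {segments_covered(role)}")
```

Under these assumptions, the second hiring model amounts to narrowing the analytics engineer to the middle segment and handing the right third to a data analyst.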
Starting point is 00:32:20 A lot of the like which approach is best is specific to your business. How well do people already understand the data? What already exists? What numbers are people used to looking to? Are you being driven by a particular change agent in your organization that's going to drive your priority? And so it's a little hard to come in and say like, this is the right way to do it, right? There is no one right way. It's a lot of the context of your organization, but understanding the trade-offs of each, I think is a great way to understand and make the decisions specific for your organization.
Starting point is 00:32:58 Okay. Okay. That, that helped a lot to understand. And have you ever seen, or you have experience of like what the result is of starting a data org and starting with data engineering? No, I don't think I have. I'm thinking about this, but I'd like to think I have swayed enough people to avoid that scenario, but I don't know. I,
Starting point is 00:33:26 I don't, I don't know though. I know companies that have some weird title ledge going on that makes it hard to like really tell who they hired. Right. If you hire a BI engineer, what does that mean? If you hired a LookML developer,
Starting point is 00:33:46 what does that mean? That is part of why I think it's so important that we centralize on these core four roles, is that people should be able to see data engineer and have a good understanding of what the skill set being asked of them is. Okay, the reason I asked that is because actually both Eric and I have an experience with that, and that's actually RudderStack. Now, we're starting to share a little bit of embarrassing side information, Eric, but I think that's fine. I love it. I see jigs on the Data Stack Show. Yeah, but okay, RudderStack, I mean, started as a company because RudderStack itself is like, okay, it's a platform pipeline, right?
Starting point is 00:34:31 So it's mainly like people who work there that are like systems engineers and data engineers. So we had data engineers. And when we had to start creating some kind of infrastructure to collect some data, we started with the data engineers. And the result is... Wait, can I guess? Of course.
Starting point is 00:34:55 You had a lot of data, but not a lot of insights. Oh, yes. But I'll give a little bit of even more embarrassing information. And I would say that you end up with a Snowflake instance that has a database that is named EricDB. It has come out publicly. Now I know why Eric hired his rogue data person. That's exactly right. But, but isn't that exactly it? If you don't, if you don't empower people with the data they need, they're going to
Starting point is 00:35:32 do whatever it takes for them to get it if they're good at their job. Right. And so, so that is exactly the problem. But I, I think, you know, what I've seen, you know, engineers love to engineer. That's why they're engineers, right? And so they nerd out about CI and linters and all that kind of stuff. And don't get me wrong. I, too, was an engineer once upon a time, a mediocre one, but an engineer nonetheless, right?
Starting point is 00:36:02 And we cannot, I think one of my jobs as a manager has always been to help coach my team members. I'm like, I do not care. I mean, I care. Don't get me wrong. I'm not here for tech debt, but I don't care that much about how cool your engineering infrastructure is. I care about the impact we're driving to the organization. And you cannot lose sight of that no matter what you're doing. Yeah. And I would say, you know, it's interesting, Kostas, like reflecting on that
Starting point is 00:36:34 good old EricDB. Like, the funny thing is, when that happened, it just wasn't a huge deal, right? Like you're trying to this. And I think this is kind of what you're talking about. How do you keep sight of the longer term goal, Emily? And when we were like building those analytics use cases, there were just a couple of questions we needed to answer, right? Like we just need to answer a couple of questions here, right? And that seems so innocuous, right? And then you don't realize that on sort of the backside of things, like, well, you're creating a lot of future tech debt, you know, which can be dealt with, but there's always a cost to that, right? Like, okay, well, now you have to choose between insights and tech debt, right? And the more you choose insights, the more the tech debt grows,
Starting point is 00:37:21 and, you know, you sort of eventually have to pay the piper. And so it's a very, it's a very slippery slope, right? And it's very easy to do that early on in a company, especially if you sort of take the, like, well, there's an engineering solution to every problem and we just need to answer a couple questions. Costas, this has been cathartic. I mean, I'm, I'm admitting all sorts of sort of data crimes publicly, which is freeing. So thank you. Yeah, this is actually a therapy session for you. That's why we are doing it. The data therapy show coming soon.
Starting point is 00:37:56 Yes, that's going to be called EricDB. Yeah. Well, that probably says something about how territorial you are when it comes to data. So, I don't know. There are deeper conversations that need to happen offline. It's getting deep. In retrospect, Eric, it sounds like you just needed a better code name. Like, rather than naming it Eric, you should have named it like some Disney movie you watched recently and throw people off. I didn't, I didn't name it. An engineer named it. I worked at a place where our replica was called Jakku, which is a Star Wars reference for anyone who didn't understand. But at the time I did not understand. I had never seen Star Wars yet. I have today.
Starting point is 00:38:45 Things have been redeemed. But I remember like, Jakku, this is such a weird name. Why would anyone come up with this? And they were like, wow, you have so much to learn about being an engineer. That's so true. And just about something like the conversation that we had is that one of the most common crimes that engineers do is overengineering. And that becomes extremely obvious, like in an early stage startup or when you start something from scratch. And that's exactly like what happened at RudderStack, right?
Starting point is 00:39:23 Like at the end, we had way too much data. Like it, it was like extremely hard to separate like noise from signal there. Because we just asked from the engineers, okay, guys, we need data. Oh, sure. Wait and you'll see. I mean, we delivered, we gave you all the data that you will ever need. And I think that's, that I think that's quite important. And the lesson that I've learned is kind of the hard way
Starting point is 00:39:50 by transitioning from being like a software engineer myself to getting other roles. But over-engineering can be like a really hard, let's say, thing to deal with. Because more engineers does not mean a better solution. Yeah. But I think the other thing is, and Emily would love your thoughts on this,
Starting point is 00:40:12 because I think one of the things that makes that difficult, Kostas, is that when we were going through that whole cycle, we were a really early stage company, you know, so scrappy. And so many times in that phase, like your engineers are sort of your de facto data engineers, analytics engineers are sort of everything. Right. And so you get this dynamic of you ask for something simple and,
Starting point is 00:40:38 you know, the kitchen sink is sort of thrown at it. And that's really challenging, right? Because you only have so many hours in the day and you're still trying to figure out product market fit and all this sort of stuff, right? So Emily, we'd love your thoughts on that. How do you mitigate that? Because I'm sure we have listeners who are in that environment where, I mean, look at RudderStack, things are great now. We have such a great analytics setup, but we certainly over-engineered things early on. We're now using some of that data. So in many ways, it's like, oh, I'm actually glad we did that. We
Starting point is 00:41:08 probably should have done it a little bit differently, with a little bit more calculation. But I would think that's really common, right? That happens all the time where to an engineer who's building a product, they throw engineering as sort of the hammer and everything's a nail. And so you throw that at data and you end up with sort of over-engineered things. But Emily, would love your thoughts on that. Yeah, I think I have seen it all the time. And one, another way this manifests sometimes is engineers love their tooling. And so suddenly they're using Prometheus for their business metrics, right? And I mean, it happens. And so people know the tooling they know.
Starting point is 00:41:51 And it's unfair as a data practitioner to assume that like everyone is going to know the best practices of the modern data stack. There's this great blog post by Vicki Boykis called "You don't need Kafka" that specifically picks on WeWork a little bit, which I am a big fan of doing as, as a big fan of following the WeWork. We did a little bit in the show prep, which was super incredible. Yeah. And so I think about it as like, we, there's going to always be this natural, cool engineering tendency to want to over-engineer. And the thing that is going to pull us back on that is just this, what is the thing that drives the biggest impact?
Starting point is 00:42:42 What is the simplest solution that drives the biggest impact? And something I use as my own anchor often was something I learned from the GitLab CEO. We had some problem in front of me and I was talking to him about it. I was like, I'm not really sure what to do next. And he said to me, what can you do that moves the needle that you can ship
Starting point is 00:43:07 in the next hour? Not in the next day, not in the next week, in the next hour. And so that forces you to really think like, what is the smallest change I can make that makes a difference to this problem? And I come back to that as like, what can I ship in the next hour? And I would ask my, the Netlify team will tell you, I would ask them that all the time. Yeah, what's the one hour version of this? They're like, one hour, that's never enough time. I'm like, yeah, but what is the one hour version? And that helps you scope to focus on impact. So that's part of it. The other is, as you grow a company, part of what you're doing as you're hiring is just like filling out gaps in your org skill set. And so it's okay that your engineers
Starting point is 00:43:59 got started with the engineering tools that they had, and they gave Eric his own database. Like, please go run. And part of what you do when you're ready to hire a data leader is say, data leader, you have to accept this technical debt that's already in place. And at the time, we made a business decision around a trade-off between getting the information that we needed with the tooling and people we had in front of us versus doing it the right way and hiring a data person. And I think that that's an okay trade-off for companies to make. We just need to, every once in a while, look up and acknowledge that that's the trade-off we're making. It makes total sense, I would say. By the way, Eric, is EricDB still there?
Starting point is 00:44:55 Or it has been renamed? Let's cover that in a future episode. All right. Yeah, that's part two of true crimes. True crimes in data. All right. So, Emily, we talked about the different roles and the core four roles, as you said. And I'd like to ask you, can you help us identify the, let's say, the important traits that each one of these
Starting point is 00:45:26 roles has, like, let's say if you could create like an archetype for a data engineer or for an analytics engineer, like what you would look when you would be hiring for this, for these roles. You mentioned at some point, for example, like communication skills, when we were talking about data analysts, for example. So I don't know that I have an archetype for each of those, but I will tell you something that has been core to my own hiring philosophy. So I grew up inside of a Dunkin' Donuts. My mom has been working at Dunkin' Donuts since 1999, so 20 plus years now. And I mean, when I say I grew up inside of a Dunkin'
Starting point is 00:46:09 Donuts, I mean, the emergency contact at school was the Dunkin' Donuts down the street, because if anything happened, someone from the store could come pick me up. And so I remember and I have these memories of like my mom walking across the dining room and seeing a straw wrapper on the floor and picking it up and putting it in her pocket. And I think about that. I'm like, you know, here my mom is the manager of the store. Like someone was going to clean the dining room at some point in the next hour. But she saw a problem and she just kind of fixed it. Right. And she didn't solve it with the perfect solution. It's not like she took the straw wrapper
Starting point is 00:46:50 and she put it in the trash. She just put it in her pocket for later. So the next time she was out of trash, she dumped it. So I look for that quality when hiring. And another way to frame it is floor sweepers. People who are going to see the mess in front of them and clean it up. Or if they see that the trash is full, they're going to take out the trash, right? When I'm hiring, I don't want people who are like, this is my job. This is the boundary. This is what I do. And that's that. I want people who are going to see a problem and fix a problem. They're driven and they're taking initiative. They don't need the mandates issued to them. And I care much more about that than I do
Starting point is 00:47:32 about specific technologies you've worked with or companies you've worked at or your education background. If you are a floor sweeper, then I can teach you all the rest, but I can't teach you to be the kind of person who walks by the straw wrapper and puts it in your pocket. Okay. That's, that's some amazing advice for hiring in general, I would say, not just like for data orgs. Okay. One last question from my side, and then I'll give the microphone back to Eric. All these roles we are talking about, they are pretty new, right? I would assume that most universities out there, they probably don't even mention data engineering or engineering or the rest of the roles that we talked about.
Starting point is 00:48:21 What are the paths that people can get, especially like younger people who are looking right now to figure out like what to do with their careers? If they want, let's say, to become like a data engineer or like an analytics engineer and or like an MLOps engineer or ML engineer. So what are the paths there? And do they ever cross also? This is a hard question because I don't know anyone who works in this who would tell you their college education was particularly relevant. And some of the best folks in data that I have ever worked with never went to college at all. So there's a little bit of like, we're going to
Starting point is 00:49:06 see how things shake out and what the next decade looks like. But today, I don't look at educational backgrounds when I'm hiring and I make sure that education isn't a prerequisite for any of the roles that I'm involved in hiring for. It shouldn't be; it doesn't necessarily move the needle. Like if you're doing advanced statistical research, then you probably need a PhD. But otherwise, if you're trying to calculate ARR and MRR, your education background doesn't really matter. Makes sense. All right, Eric. All yours.
Starting point is 00:49:46 Okay, we are at the buzzer unfortunately, so I have one last question, but this may be the most important question of the entire show. What's your favorite kind of donut? All of them? I knew it was going to be difficult because it's a difficult question
Starting point is 00:50:01 for almost everybody. I know some people know, but it's hard to choose. It's definitely like a mood influence thing. Last night, a commercial came on for, and it had, but in the commercial was like strawberry frosted with sprinkles. And I turned to my husband and I was like, that strawberry frosted looks so good right now. And so sometimes the strawberry frosted, sometimes it's a blueberry cake, sometimes it's a chocolate glaze, sometimes it's a Boston cream, sometimes it's a fresh French cruller. They're still kind of warm. The glaze is still like not set in. All of the above is the correct answer here. Love it. Love it. Well, Emily, this has been such a fun time in
Starting point is 00:50:42 the show. We've learned so much. Thank you for sharing your time and your insights. I know it's been helpful for us and our listeners, and we'd love to have you back sometime. Thanks. I'm here and ready. Costas, first of all, Emily just seems like such a fun person. I had a great time on that show laughing.
Starting point is 00:50:59 And honestly, I feel way lighter for some reason as a person after getting it out there that Eric DB is something that exists. But in all seriousness, I think one of my biggest takeaways was her recommendation on.B., but, you know, that's sort of exhibit A of what can happen when you sort of have an engineering first approach to data without necessarily trying to establish the underlying questions around value in the organization. Which sounds funny to say almost, right? Like when you say engineering first approach to data, I mean, that sounds very natural. And in some ways it sounds like the correct thing. You know, but when you're thinking about how to build a team,
Starting point is 00:51:56 it may not be. And so that's going to stick with me. I'm going to think about that a lot this week. Yeah, I totally agree with you. I think we have a great opportunity to talk about, first of all, the role of the data team inside, the data organization inside the company, but also the part where customers when we as engineers, we actually over-engineer the solutions that we provide to them, right? And we can, let's say, get exposed to what it means like to over-engineer something without having very clear business objectives there. And that's what I'm keeping from this conversation because it's like kind of like a realization that I also made during this conversation. And it's, I don't know, it's super, super valuable.
Starting point is 00:53:05 Outside, obviously, of all the rest of like the conversation that we had with her about the roles, the organizations, and how to position a data organization inside the company and grow it.
Starting point is 00:53:18 It was a very, very insightful conversation. I agree. All right. Well, thanks for listening to today's Data Stack Show. Tell a friend about it if you haven't. We love new listeners and we will catch you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite
Starting point is 00:53:34 podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
