The Data Stack Show - 230: The Cynical Data Guy: Data Tech Debt, Data Mesh, and Dashboard Directives

Episode Date: February 26, 2025

Highlights from this week’s conversation include:
The Return of the Cynical Data Guy (0:14)
Risks of SQL Complexity (2:16)
Technical Debt in Data (4:34)
Data Mesh Critique (6:38)
Governance vs. Decentralization (9:55)
Never Let a Stakeholder Tell You They Need a Dashboard (12:05)
Dashboard vs. Table (13:34)
Organizational Dynamics in Data Requests (16:35)
AI and Prompt Writing (19:43)
Search Techniques and User Behavior (21:20)
Discussion on Code Optimization Tools (23:19)
Final Thoughts and Takeaways (24:47)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the Data Stack Show for one of our favorite monthly installments, our time with the Cynical Data Guy, Matt.
Starting point is 00:00:36 Welcome back to the show. Thanks for having me. From the bowels of corporate data America, we are going to do three rounds today, possibly with a fourth round. I have some tasty LinkedIn selections here for us to get through. As always, we will only mention the name when appropriate. And this first one actually is from our good friend,
Starting point is 00:00:59 Ben Rogojan, who's been on the show multiple times. Great guy, great thinker. But Ben, we're gonna put you on the docket here. Data teams can often portray the illusion of high productivity while accumulating a devastating amount of technical debt. Just write another dozen dbt models or build a few new Tableau dashboards
Starting point is 00:01:20 with internally calculated metrics that only a single analyst knows how or why they were developed. Speed is the goal, right? Just because the number of data pipelines your team manages is growing and your data team's headcount is getting larger doesn't mean you're more impactful or even providing high-quality data products. The Cynical Data Guy. Yeah, be careful, especially if it's this core thing, like the more people I have, the more important I am. That is not the case if you're a data engineering team.
Starting point is 00:01:50 You are setting yourself up for some pain if you're not careful on that one. There is some, you're setting yourself up really bad. But it's not just data pipelines either. I have known people who thought that if they wrote really long SQL queries, it meant they were really smart, and not just horribly inefficient and obscuring everything so no one else can try to take their job. Yeah, I've seen some of those queries, and I think it's hard for me to understand, like, is it intentional?
Starting point is 00:02:25 Like, there's a very long query, any sort of SQL, of like, okay, they're just doing the figuring out, they're learning, and it's just really long. And then I guess there's another version of like, I want to be fancy. I want to do this in a cool way. That just ends up with that crazy amount of complexity. I actually worked at a place where we had someone, and there were two opinions on this person. The people who worked with him were like, oh, do you see his
Starting point is 00:02:52 SQL? It's just amazing. And everybody else who had to deal with it was like, this guy's an idiot. This is not what he's doing. It's completely complicated. There's no reason. So I had a conversation this morning that I thought really applies here, where we were talking about sales commissions. If you've ever done any reporting on sales commissions, it's scary stuff. You don't want to be the one to make the mistake. Yes. So they got to telling me, I've got my sales commission query, and in the query there is more English writing, explaining what each section is doing
Starting point is 00:03:30 to justify each business rule inline, than there is actual code. Overline, iteration, up to iteration. And to the analyst I was like, I think that's the right thing. I think you're doing it right. Yeah. Yeah. Even it's. Yeah. Yeah.
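To make the commenting style described above concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table, columns, business rules, and the flat 5% rate are all invented for illustration; nothing here comes from the analyst's actual query. The point is only that each rule is stated in plain English next to the clause that enforces it.

```python
import sqlite3

# Hypothetical schema and business rules, invented purely for illustration.
COMMISSION_SQL = """
-- Rule 1: only closed-won deals earn commission; open or lost deals are excluded.
-- Rule 2: refunded deals are excluded, because commission is paid on kept revenue.
-- Rule 3: the rate is a flat 5% of the deal amount (an assumption; real plans
--         are often tiered and would need their own explanation here).
SELECT
    rep,
    ROUND(SUM(amount) * 0.05, 2) AS commission
FROM deals
WHERE status = 'closed_won'   -- Rule 1
  AND refunded = 0            -- Rule 2
GROUP BY rep
ORDER BY rep;
"""


def commissions(rows):
    """Load deal rows into an in-memory table and run the commented query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deals (rep TEXT, amount REAL, status TEXT, refunded INTEGER)"
    )
    conn.executemany("INSERT INTO deals VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(COMMISSION_SQL).fetchall()
    conn.close()
    return result


SAMPLE_DEALS = [
    ("ana", 1000.0, "closed_won", 0),
    ("ana", 500.0, "closed_won", 1),   # refunded, so excluded by Rule 2
    ("ben", 2000.0, "open", 0),        # not closed, so excluded by Rule 1
    ("ben", 400.0, "closed_won", 0),
]

if __name__ == "__main__":
    for rep, commission in commissions(SAMPLE_DEALS):
        print(rep, commission)
```

With the comments sitting beside the WHERE clauses, someone reviewing a disputed payout can check the rule and the code in one place.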
Starting point is 00:03:45 It's so painful. Yeah. Well, and the number of data pipelines, going back specifically to this, realistically, the number you have should never be what you're focused on. Right. Another quick story: I worked with someone else where they were making the transition from on-prem to cloud, and they announced it. Everyone knew about it. At the last minute,
Starting point is 00:04:05 one of the Power BI reporters was like, you can't do this. I have 120 queries. I don't have time to do that. We all just heard that like, wait, you have how many queries? And it turned out that every time he needed to add another column,
Starting point is 00:04:19 he built another query and model in Power BI. So it was really only three queries he needed. So bloat is a sign you're probably doing something wrong. Yeah. One thing that's interesting about this is, data is exploratory. So I think one of the things is planning on how you deal with that technical debt. Because if you think about software, I mean, whenever you're working in a technical environment,
Starting point is 00:04:55 you have technical debt. I mean, there is literally just no way around that, right? But there should actually be a lot of technical debt in data because you're trying to figure out the best way to solve something or things change, whatever. It's not like that doesn't happen in software, but you can't build technical specifications for exploratory problems in data quite like you can if you are doing a very detailed spec on certain things in software, which is interesting.
Starting point is 00:05:26 And I was thinking about like as a data leader, like a really good data leader is probably like planning on, okay, how are we actually going to deal with this? Because it's inevitable number one, but also I would argue like, if you are trying to produce a lot of value, you're going to create a lot of technical debt, you know, with data. So are they planning on how to deal with it? Or is that just a lie we tell ourselves to make our six-week long title? I think there's a product under here. It's like what is the equivalent feature
Starting point is 00:05:56 flag on start point? Because that's a way in a product that you can keep a lot of the complexity lower and it'll lead from on your own set component for set. Yeah. Like customer. That's called comments. That's called more comments than dash dash. No, actually I was looking at Datafold recently and it looks like they're focusing on data migrations. Maybe we should
Starting point is 00:06:25 get them back on to talk about that because it's not quite feasible. Okay, that was round one. Let me pull up the next one. Oh man, this is a good one. I'm so excited about this. I'm just going to read it. I'm just going to read it. Data mesh is dead. Here's why. The last couple years, data mesh has been the next big thing. It promised to fix bottlenecks, decentralize ownership, and scale data-driven organizations. But today, most companies failed to make it work. What went wrong? Decentralization led to chaos. Governance has become a mess. More silos, not fewer, too complex to maintain. What actually works? A hybrid model, real accountability, practical governance. The
Starting point is 00:07:11 takeaway? Data mesh sounded great in theory. In reality, few made it work. Time to focus on scalable, manageable data strategies. What do you think? Was data mesh always flawed or did we just implement it wrong? The communism of data. I think if you take data mesh on its own terms, so regardless of what I think about it, I think there's a point here that there's an operational model that needs to be different for it. But on its own terms, we have two things that are competing against each other that I just don't see a solution to. One is it only makes sense if you're huge. It doesn't make sense if you're just a small team. But it's a way of life or a way of thinking or something like that. So you have to completely have a cultural revolution and operational
Starting point is 00:08:04 model changes and stuff like that for it to work, and who is really bad at that? Oh, giant organizations. So you've got these two forces that are just going to compete against each other, and I don't know that anyone's ever going to really square that circle. Yeah, it is such an odd fact of life of, all right, who can benefit most from extra layers of abstraction? It's when I have more people and more teams in a larger context. And who's the least likely to change their processes to take advantage of extra layers of abstraction? Yeah. The same people.
Starting point is 00:08:45 So, cause like the people most likely could implement what's a SQL mesh, but small like smallest company and people that don't probably need SQL mesh like a smallish company. Where we're just gonna like pile and crap and go ahead and learn what we're doing. Right.
Starting point is 00:09:00 We had a conversation before the podcast today about, like, code pieces that don't really need even a dbt-like layering. It's just not that much data. There's like one person. What I was saying was, coming in, working with one person in accounting, who I actually think will be very diligent with whatever process we put in place, they can get away with a couple of views in a database. And essentially that's what they'll pull data from. That does not deserve an entire abstraction and, like,
Starting point is 00:09:29 pipeline of builds and tests and all these things. Well, it's also its size. You don't have to get to like multi-matching and things like that. It's hierarchical. We haven't figured out how to organize humans at that scale in a decentralized way. I don't think you probably can do it and have it all going in the same direction. So to be like, well, if you're really big, if you just decentralize, it'll work great. It's like, maybe, I don't know, you've got a couple thousand years of human history fighting. Well, and you have the leaders that are concerned about data governance, security, et cetera, et cetera. And it's like, sure, to some extent, the data sharing problem is a problem,
Starting point is 00:10:12 but we also want central governance and all these other things. Mm-hmm. Oh, and there's also inefficiencies, and the CFO is going to come down on you for that. Why do we have 12 different vendors to do the same thing? Yeah. I mean, if you why does Microsoft work for us for the solution? What's your first question? Yeah, yes. I mean, if you are a chief data officer, and you're like, I'm going to build my clout by building the number of people under me, this is probably a great way to do that. But it's probably going to be temporary if I'm crashing down. My hot take on this is that I don't think that data mesh was ever alive to me. What's that?
Starting point is 00:10:52 And I'll tell you the reason. Was it like Schrödinger's? Well, okay, so I know this from the show. I remember when data mesh started to become a hot topic. And we had multiple discussions about it on the show. And it was really hard for multiple, very smart people to really pin down what it was and what it meant practically for day-to-day data teams. And after going through that multiple times, I was like, okay, if we have smart people trying to parse through it and they're like,
Starting point is 00:11:37 I don't really know what this is. Right. And so that to me was a major warning sign from the very beginning. You just end up interviewing, and you're like, I just have one more question. What is data mesh? I mean, that was kind of... Yeah, that's what I say. It's like the communism of data. In a sense, it sounds good, but when you really get into the practical, how is it going to work?
Starting point is 00:12:01 You're like, oh wait, no, this is never going to work where I am. Yeah, that's been an interesting one. I'm sure there's some great takes. Okay. Round number three: never let a stakeholder tell you they need a dashboard. I almost just want to stop there because I'm running. That's a banger, but I'll read the whole thing. We'll start over.
Starting point is 00:12:24 Never let a stakeholder tell you they need a dashboard. They shouldn't even be asking. A dashboard is a solution and we, the data professionals, need to be the ones to determine the best solution for a specific problem. It's our job, not theirs. If a stakeholder is coming to you asking for a dashboard, they are not coming to you with a problem. They are coming to you with a solution. Don't let them do this.
Starting point is 00:12:49 Stand up for yourself. It is the domain of the data expert to determine if a dashboard is the correct solution, not the stakeholder. After all, who is the data expert here? Do you agree or am I totally wrong? This feels like you're getting up in the morning: I am a strong, powerful data professional. But I read that and kind of my first thought is like,
Starting point is 00:13:11 aw, sweet summer child you are. Like, I don't know. Yeah, they come, they say they want a dashboard. Maybe they actually need it. You could just ask follow-up questions and then go like, oh, well, you really just need a report for the next two months. Well, we're good.
Starting point is 00:13:30 We don't need to be putting our foot down on this. Yeah, my take on it is sure, maybe they'll need a dashboard, but for the average user, it's like, here's our menu, we're going to make a dashboard. We're going to send you that dashboard on a regular basis and it's not going to have any graphics in it. Is that okay?
Starting point is 00:13:53 Yeah, that's what I wanted. A dashboard. But it looks like a table.
Starting point is 00:14:01 Like, it looked like a table. I seriously had that conversation. I've probably had it half a dozen times over the years. That's what I wanted, like a table. This is almost like product discovery.
Starting point is 00:14:14 In a physical Excel file. Yeah, right. Exactly. Do we need to do it in Power BI? Just put it in Word. But this is part of the product discovery that you just do, I think. I mean, also, I remember I was the one data person
Starting point is 00:14:32 on the marketing team and our data wasn't great. And I would meet with people on the team and they would have this, well, what do I want? And these outlandish things. And what do you say? You go, okay, that's great. So what are we gonna read out of this, out or that? Okay, so we can't do X. And here's the reasons why. But I can give you Y and Z until you go talk to them about how to make this work. Like there's
Starting point is 00:14:58 ways of kind of doing this without kind of being like, no, you don't say dashboard. I say dashboard. Yeah. I just read this and I think how long would you last with that attitude? Yeah. You know, I know, I know it's meant to be a good one. I would say no, not in here because I've seen this like, no, we're not doing that. It doesn't last a year. Yeah. Yeah. Well, and it gets worse, right?
Starting point is 00:15:29 Because now the more options there are, if this is your mindset, you're like, I'm a data professional, I will recommend the solution. Then you come up with solutions that are like, you don't even have a dashboard, you only have a report, you need an LLM to talk to about, you know, this thing, you know,
Starting point is 00:15:46 like, it's just going to get worse because the solutions will be even wider. Well, it also raises a good point there, the idea of, yes, you may know something about how it looks best when you're going to do it, but you need to understand how they're going to be consuming it too. Right. So if you're like, I have a better idea, we're going to do this thing over here. Or the other one, which is like someone asks for a report, and people decide they're going to make a dashboard. And it's like, no, they live in Excel. That's all they ever look at. You want to put them
Starting point is 00:16:14 in another system they have to go log into and check. They're never going to use it, versus something that just emails them an Excel workbook every week or whatever. So there is a give and take to that. There's what's going to work best from the data professional standpoint, but there's also how are they using it? What are they going to use it for? Right. The one other thing is that this, and again, I understand it's a hot take on LinkedIn, sort of applies this rigid rule to everyone,
Starting point is 00:16:47 but man, it really depends on who's asking. I'm just like, okay, someone who's pretty low on the org chart, it's like, yeah, you can say, no, we have other priorities. If your boss says they need a dashboard, you make them a dashboard. You can have that discussion, but if the CFO says they want a dashboard, right? Exactly. They're making a dashboard. Exactly. Yeah. I was going to say it is so
Starting point is 00:17:19 organization dependent because one, like if there's a really tight prioritization process and everything gets daven, which is not very many companies, but some companies, then it really ends like, then that kind of shifts the power of the technical realm. We get to dictate what happens because we have this very strict method of prioritizing and whatever. We use business council meetings and stuff to set like a strategy for those companies. Like there's a lot more kind of power swing of like IT or technical or data people get to do it the way they want to do it. And the other way, if it's like a really sales
Starting point is 00:17:54 driven organization or marketing-driven organization, everything's there and everybody else is there to, like, serve their purposes so they can hit their goals. Even Ops. Or Ops, yeah. Or Ops. Well, and then you can see people who try to do stuff like, everything must be submitted in the format of a user story, as a blank. Right. And it's like, right, I don't want to do that. I can't do that. Right. But now your boss is complaining.
Starting point is 00:18:18 Yeah. But I think both of those, if it's like, you know, essentially, all right, sales has this audacious goal to grow to X number of dollars this year, everybody else better support them. And if you don't support them, you're going to get blamed for them not hitting the goal. Right? Like that's one culture versus like the opposite. Yeah.
Starting point is 00:18:37 Yeah. Yeah. Yeah. I was just thinking about a user story of as your boss, I want a dash. That's the whole story. Yeah, that's the... There is no dash. I would like a dash. I've got a user story for you.
Starting point is 00:18:55 Adam. Okay, let me ask Rich, do we have time for an AI bonus round? Yes, we have time for an AI bonus round. Okay, this is hilarious. Okay, this is someone quoting a post on X. And I'll read the commentary that they added and then I'll read the post that they quoted. So Matt Novak, we're going to give you a shout out because this is gold, says AI folks have now discovered thinking quote unquote. And the post that he puts is, sometimes in the process
Starting point is 00:19:30 of writing a good enough prompt for ChatGPT, I end up solving my own problem without even needing to submit. I mean, that was just pure gold. What's the rubber duck? Yeah, that's right. It's rubber ducking. It's been a good strategy for a long time. You always have little boxes to pipe them. Yes. That's exactly right.
Starting point is 00:19:53 Matt, I love it. Thank you for that wonderful commentary. I'm actually interested to know, have either of you had that experience writing prompts? No. No. I don't think I've ever typed something out and gone, oh, I know what to do. Usually I could, you should have asked me for like something factual or like to summarize something or something. I don't really need it at that point. I understand the general thought, and I do think it's valuable to have a thought partner and, like, shoot something back and forth.
Starting point is 00:20:26 Hey, ask me questions about this. Hey, I'll be clarifying that. I think that's what you, yeah. But you might as well do it in an interactive format. I don't think I've ever put enough effort into a really large initial prompt for that to happen. But I mean, it happened. That's true. That feels a little bit like I'm going to write my entire for the app and yeah, I'll do it. Right now, I'm just going to throw something in there and then kind of work from it. Right.
Starting point is 00:20:54 It's by nature. Like very iterative. Right. Yeah. Yeah. Super interesting. That could also be a person who tries to figure out the perfect Google search terms for the only actions that never met that person before.
Starting point is 00:21:10 Well, that's like all of the options you have when Googling. I mean, you can build some pretty robust queries. I mean, it's so powerful, but almost no one uses them. Yeah. Yeah. Yeah. Like the operator terms. Yeah, totally. Or my favorite, like, search for PDFs. Oh, there's some real, totally. Well, and like putting quotes around, like, one word. I mean, there are all sorts of really helpful things that you can do. You know, my question is, how long's Google been around, like 20 years at least? Yeah. Yeah, but what
Starting point is 00:21:42 component of, like, GPT or AI is going to be that way, where essentially, out in the open, you've been able to do this for like 20 years and no one knows it? Yeah. Yeah. For sure. Well, it's also like, as their search got better at handling natural language, doing it became less necessary. Yeah. Yes. Very true. Yeah. And yeah, but they leave the features in there for when you need them. Yeah. If you can even remember them. Yes. Which you can. Which you have to Google with the Bible. You totally do. Google. You Google, you know, Google search. I do that with ChatGPT, where I'm like, how best should I ask you to do this? Totally. Yeah. Yeah.
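The search operators mentioned above (quotes around a word or phrase, restricting results to PDFs) can be sketched as a small query builder. This is only an illustration: `build_search_query` and `search_url` are hypothetical helper names, and the sketch assumes the long-standing Google operators for exact phrases (`"..."`), `filetype:`, `site:`, and term exclusion (`-term`).

```python
from urllib.parse import quote_plus


def build_search_query(terms, exact_phrase=None, filetype=None, site=None, exclude=()):
    """Combine plain terms with common search operators into one query string."""
    parts = list(terms)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')      # quotes force an exact match
    if filetype:
        parts.append(f"filetype:{filetype}")   # e.g. only PDFs
    if site:
        parts.append(f"site:{site}")           # restrict to one domain
    parts.extend(f"-{word}" for word in exclude)  # leading minus excludes a term
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for use in a browser address bar."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = build_search_query(
        ["data", "mesh"], exact_phrase="technical debt", filetype="pdf"
    )
    print(query)
    print(search_url(query))
```

As the speakers note, natural-language handling has made these operators less necessary, but they still work when a search needs to be narrowed precisely.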
Starting point is 00:22:26 Which feels weird, but is really awful. It seems very logical at the same time. Yeah. Yeah. Or open up a new prompt and be, you gave me this error render. Yeah. Yeah. Actually, I do have an AI question related to the first post about technical debt. Do you think that AI will help in those scenarios
Starting point is 00:22:52 you mentioned where there's, you know, a gnarly query or whatever? I mean, it is really good at, like, iterating. You could say, you know, make this more efficient. It is actually very helpful for that. I mean, I know people probably aren't going to just write production code and deploy it. Right. But at the same time, I don't know. I mean, do you think it's going to have it? Do you think it's going to be a core way that people write SQL? Well, we had somebody on the show a couple of weeks ago that essentially, they were giving us like code AI space. Misha. Yeah.
Starting point is 00:23:28 Yeah. Ratchet AI. Yeah. Which I really liked, how they're thinking about it. And you can apply it to the SQL stuff too. Where essentially, if we can be focused on some of the, quote, tech debt, we can point this AI at our SQL code base,
Starting point is 00:23:47 for example, and then create pull requests of optimizations for humans to review. And if they're good enough to where, let's say, 80% or 70% of the pull requests get merged, that's value add. Yeah. But if like 40% gets merged, then it's like, this is going to be a mess, right? Yep. So that was a solution they're working on. That makes sense to me for SQL as well. And then I've seen startups that'll analyze every SQL query that runs against your database and, like, optimize for whatever database engine. That makes sense. I think we're seeing more of that.
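The rule of thumb above, that AI-generated pull requests are worth it at roughly a 70 to 80% merge rate and a mess at around 40%, can be written down as a small check. This is a sketch under stated assumptions: the `PullRequest` type and function names are hypothetical, the two thresholds are taken loosely from the conversation, and the "unclear" middle band is an added assumption because the speakers do not say what happens between 40% and 70%.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical record of one AI-generated pull request."""
    title: str
    merged: bool


def merge_rate(prs):
    """Fraction of pull requests that reviewers actually merged."""
    if not prs:
        return 0.0
    return sum(pr.merged for pr in prs) / len(prs)


def assess(rate, worthwhile_at=0.70, noisy_at=0.40):
    """Classify a merge rate using the rough thresholds from the discussion."""
    if rate >= worthwhile_at:
        return "value add"
    if rate <= noisy_at:
        return "mess"
    return "unclear"  # assumption: the conversation does not cover this band


if __name__ == "__main__":
    prs = [PullRequest(f"optimize query {i}", merged=(i < 8)) for i in range(10)]
    rate = merge_rate(prs)
    print(f"{rate:.0%} merged: {assess(rate)}")
```

Tracking the rate per repository or per type of suggestion would show where human review time is being well spent, though that breakdown goes beyond what was said on the show.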
Starting point is 00:24:22 Yeah, for sure. I mean, I can see that just in the editor, where you write something and it can kind of be like, yeah, real quick, do you really want to do that? Totally. Totally, yeah. It's like, well, you know, you're a vincere, you say, select star from table, limit five,
Starting point is 00:24:40 because there are 130 columns, and that's maybe just one. Right. Yeah. Yeah. All righty. Well, we are at the buzzer. That concludes this month's Cynical Data Guy, Matt.
Starting point is 00:24:53 As always, thank you for joining us and we will catch you next month. Stay cynical. The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
