The Data Stack Show - The PRQL: The Data Supply Chain with Chad Sanderson of Gable.ai
Episode Date: March 25, 2024
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show prequel.
This is a short bonus episode where we preview the upcoming show.
You'll get to meet our guest and hear about the topics we're going to cover.
If they're interesting to you, you can catch the full-length show when it drops on Wednesday.
We are here with Chad Sanderson. Chad, you have a really long history working in data quality and have actually even founded a company, Gable.ai. So we have so much to talk about, but of course, we want to start at the beginning. Tell us how you got into data.

Yeah, well, great to be here with you folks. Thanks for having me on again. It's been a while, but I really enjoyed the last conversation.
And in terms of where I got started in data, I've been doing this for a pretty long time.
Started as an analyst and working at a very small company in Northern Georgia that produced grow parts,
and then ended up working as a data scientist within Oracle. And then from there, I kind of
fell in love with the infrastructure side of the house. I felt like building things for other
people to use was more validating and rewarding than trying to be a smart scientist myself and ended up doing that at a few big companies.
I worked on the data platform team at Sephora and Subway, the AI platform team over at Microsoft.
And then most recently, I led data infrastructure for a great tech company called Convoy.
That's awesome.
By the way, it's not the first time that we have you here, Chad.
So I'm very excited to continue the conversation
from where we left and many things happened since then.
But one of the things that I really want to talk with you about
is the supply chain around data and data infrastructure.
There's always a lot of focus, either on the people who are managing the infrastructure
or the people who are the downstream consumers, right?
Like the people who are the analysts or the data scientists.
But one of the parts of the supply chain that we don't talk about that much is further upstream, where the data is actually captured, generated, and transferred into the data infrastructure. And apparently many of the issues that we deal with stem from that.
There are organizational issues.
We're talking about very different
engineering teams involved there
with different goals and needs.
But at the end, all these people and these systems, they need to work together
if we want to have data that we can rely on.
So I'd love to get a little bit deeper into that and spend some time together
to talk about the importance of this, the issues there, and what we can do to
make things better, right? So that's one of the things that I'd love to hear your thoughts on.
What's on your mind? What would you like to talk about?
Well, I think that's a great topic, first of all, and it's very timely and topical. The modern data stack is still, I think, on the tip of everybody's tongue, but it's become a bit of a sour word these days. The promise, maybe five to eight years ago, was that by adopting the modern data stack, you would be able to get
all of this utility and value from data. And I think to some degree that was true. The modern
data stack did allow teams to get started with their data implementations very quickly, to move
off of their old legacy infrastructure very quickly, to get a dashboard spun up fast to answer some
questions about their product. But maintaining the system over time became challenging. And that's
where the phrase that you used, which is data supply chain, comes into play. This idea that
data is not just a pipeline, it's also people. And it's people focusing on different aspects of the data. An application developer who is emitting events to a transactional database is using data for one thing. A data engineering team that is extracting that data and potentially transforming it into some core table in the warehouse is using it for something different.
A front end engineer who is using, you know, rudder stack to emit events is doing something
totally different.
An analyst is doing something totally different.
And yet all of these people are fundamentally interconnected with each other.
And that is a supply chain. And this is very different, I think, to the way that
software engineers on the application side think about their work. In fact, they try to become as
modular and as decoupled from the rest of the organization as possible so that they can move
faster. Whereas in the data world, if you take this supply chain view, decoupling is actually
impossible. It's just not actually feasible to do because we're so reliant on transformations by other
people within the company.
And if you start looking at the pipeline as more of a supply chain, then you can begin
to make comparisons to other supply chains in the real world and see where they put their
focus.
So as a very quick example, McDonald's obviously runs a massive supply chain, and they've spent billions of dollars on quality at the producers, not the consumers. Meaning if you're a manufacturer of the beef patties that are used in their sandwiches,
you are the one that's doing quality at the sort of patty creation layer.
It's not the responsibility of the individual retailers and the stores that are putting
the patties on the buns to individually inspect every patty for quality.
You can imagine the type of cost and inefficiency issues that would lead to when the focus is speed.
And so the patty suppliers and the stores and McDonald's corporate have to be in a really
tight feedback loop with each other, communicating about compliance and regulations and governance and quality so that the end retailer doesn't
have to worry about a lot of these issues.
And the last thing I'll say about McDonald's, because I think it's such a fascinating use
case, is that the suppliers actually track, on their own, the patty needs, the volume requirements for each individual store. So when those numbers get low, they can automatically push more patties to each store as needed.
So it's a very different way of doing things,
having these tight feedback loops,
versus the way that I think most data teams operate today.
Yeah, yeah, makes sense.
Okay.
I think we have like a lot to talk about.
Eric, what do you think?
Let's do it.
Let's do it.
All right.
That's a wrap for the prequel.
The full length episode will drop Wednesday morning.
Subscribe now so you don't miss it.