The Data Stack Show - 168: Decoding Data Mesh: Principles, Practices, and Real-World Applications Featuring Paolo Platter, Zhamak Dehghani, and Melissa Logan

Episode Date: December 13, 2023

Highlights from this week’s conversation include:

- Defining data mesh (6:37)
- Addressing the scale of organizational complexity and usage (9:04)
- The shift from monolithic to microservices (12:24)
- The sociological structure in data mesh (13:59)
- Data product generation and sharing in data mesh (17:27)
- Data Mesh: Simplifying Data Work (24:09)
- Getting Started with Data Mesh (29:14)
- Building products for Data Mesh (36:42)
- Building a customizable and extensible platform to shape data practice (39:28)
- The characteristics of a data product (48:40)
- Defining what a data product is not (50:45)
- The origin of the term "mesh" in data mesh (53:32)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Kostas, we love covering topics that we have not covered on the show before. We've done over 150 shows at this point. And it's kind of crazy that we haven't covered the topic of data mesh. But at Coalesce, dbt's conference, they announced a data mesh product.
Starting point is 00:00:52 And so we worked with Brooks to get literally the author of the data mesh book on the show so that we can get the straight-line story on what data mesh is, which is great. I mean, this is a topic that a lot of people are talking about. There's certainly a lot of conversation around it. And I think what we need to do, at least what I'm going to try and do, is just put a sharp definition on it. Data mesh means a lot of things to a lot of people. And so if we have the author of the book, the person who coined the term, we can sort of level set on what data mesh means.
Starting point is 00:01:24 That's what I'm going to do. But I'm sure you're going to have technical questions because data mesh is fundamentally technical. So what are you going to ask about? Yeah, I want to ask about like the products that support the data mesh. Data mesh has like similar, let's say, movements in like how we change, let's say change the way that we are thinking and operating in the business environment. It has many parts, right? It has the people part and also the technology part.
Starting point is 00:01:56 I think there has been a lot of focus on the people part and the change of culture and the way that organizations need to change and all these things for the data mesh to be implemented and deliver its value. But we have also the right people today to talk about products, like what kind of products someone could use that either support or enable the implementation of a data mesh. And that's what I would like to focus. And also, probably, if we have the time, I'd like to focus. And also probably if we have the time, like to focus a little bit more also on some terminology, like what's a data product, for example, and how do you implement a data product and these kinds of things that are like very fundamental
Starting point is 00:02:35 as part of like the data mess as an architecture. So we have the right people to understand data mess from both the organizational, the people aspect, and also the technology aspect. So I'm very excited. Let's go and chat with them. Let's do it. Welcome back to the Data Sack Show. What an exciting episode we have because we're going to dig into a topic that is really important
Starting point is 00:03:00 in the data industry, but that we really haven't covered in depth on the show yet. But we're going to solve that today. We're going to talk about data mesh, and we have an amazing lineup of guests here. So let's see. Zemach, why don't you start with an intro and a quick background, and then we'll go down the line. All right. I'm Zemach. I'm the of Data Mesh, founder and CEO of NextData, which is a Data Mesh technology startup. Excited to be here. Great. Thank you so much. Paola, how about you? Hi, it's everybody. I'm Paolo Plattler, CTO and co-founder of Agile Lab and in the Data Mesh story since the beginning. Wonderful.
Starting point is 00:03:46 And Melissa. Hi, everyone. I'm Melissa Logan. I'm the director of the data mesh learning community. Very excited to be here. Wonderful. Well, Melissa, why don't we start with you? Because data mesh, I think in some ways is a big topic, but in many other ways is a simple
Starting point is 00:04:03 topic in terms of what we're trying to accomplish. Can you tell us about the community? Because I think for our listeners, for anyone who wants to learn more about any of the topics we cover, the community is going to be the go-to place. Can you tell us about the community and how to engage just so our listeners have sort of orientation on where to go next if they have questions. Yeah, absolutely. Happy to. The data mesh learning community is a group of over 8,000 data pros who are somewhere in their data mesh journey, whether they're just getting started or if they've been at
Starting point is 00:04:37 this for a number of years. We have some of the pioneers of data mesh who are in the community answering questions and sharing their insights. Our mission for the community is to share resources, increase awareness for Data Mesh, help people understand how to get started. We have a website at datameshlearning.com. We've got a bunch of use cases on there, case studies, articles, podcasts. There is a Slack channel where we have conversations about sharing insights about data mesh. People ask all kinds of questions on there, and we have so many great folks who are willing to share their experiences.
Starting point is 00:05:19 We host a range of different virtual events. We have, I think they're even happening weekly right now. It's at least monthly, but there's quite a few topics we have on the list, a range of different topics. We held our first in-person event at Big Data London very recently in September. We shared the results from our very first community survey about getting buy-in for data mesh. There we had some great presenters. We also host virtual half-day events called data mesh days. We had one focused on the life sciences vertical earlier this year. We have one focused on the financial services vertical in Q1 of 2024. So quite a lot of different resources for the community. The survey we recently ran about buy-in, we have a white paper coming out about that soon. So truly all resources buy-in for the community. We really exist to help people as they go on their data mesh journey. Awesome. Well, if you have any questions, definitely check out the learning community and Melissa and team will be there to support. I'd love to actually start, you know, in the show,
Starting point is 00:06:31 we love to sort of get down to the root of things and define what data mesh means. Jamak, would you help us define what data mesh means? I think a lot of people have a lot of different ideas of what it is, what it could be. And I think a lot of people have a lot of different ideas of what it is, what it could be. And I think a lot of people sort of interpret it in their own context, but give us a level set. What is data mesh? Sure. You know, there are two hard problems in technology. One is naming things and the second one is defining them. So I'll have a go at it. I included the definition in the book, and I'm just going to say it word by word, perhaps, and then we can dive into what's really behind those words. So data mesh is a decentralized, socio-technical approach in managing, accessing, sharing data at scale for analytical and ML workloads.
Starting point is 00:07:28 There are a few things to unpack, perhaps, in that definition that might be worth just double-clicking into. One is that I coined this term and I called it socio-technical know for a shift to happen in how we share data how we discover use produce data we not only need to change in a decentralized way in a distributed way we not only need to change technology but we also need to change behavior relate a relationship with data relationship among teams so hence it's a both's a paradigm. It's an approach that includes technological changes as well as, I guess, the social changes in organizations. There is a word there at scale.
Starting point is 00:08:14 So that's, I think it's really key that we know that, you know, data warehouse, maybe data lake, the approaches we've had so far really address the problem of technical scale. You know, over the course of the last 10, 15 years, we have addressed the scale of volume of the data. We distribute the disk and distribute the storage, parallel processing of the data. We've addressed the scale of the velocity of the data with the streaming backbones. We've addressed the, you know, scale of diversity of the data with all sorts of vector databases to time series and the others. What we haven't addressed is really the scale of organizational complexity and scale of
Starting point is 00:08:55 usage and use cases and the scale of diversity of the sources of the data that really put a pressure point on this centralized approach in managing data. So it's really, we want to get to the point that the data sharing, the same way that application API sharing can happen at a global scale, DataMachines tries to push the data sharing method for analytics in a way that can scale out to the global network of analytical data sharing. And the last point is, with operational transactional databases, the data sharing problem largely is solved through transactional APIs
Starting point is 00:09:37 that services or microservices expose and enable applications, share changes to the data or the current state of the data that is suitable for transactional applications. What we hadn't solved was really data sharing at scale where you need to train machine learning model across multiple dimensions of data from many sources or running statistical or know, kind of statistical or other kind of analytics over a large volume of data and correlating. So that adds an interesting twist in how data can be managed at scale. And that's why all those kind of components existing in the definition of the data mesh and of course, behind that, then following that definition, there are a set of first
Starting point is 00:10:23 principles and then, you know, a set of technologies to support them. And we can get into that later. As you think about data mesh as a manifestation of socio-technological shift, as you look back over technology over the last several decades, is there another major socio-technological shift that you would say was industry-defining in the way that you believe data mesh is also potentially industry defining? Yeah, absolutely. And in fact, I took inspirations from it. I wish I could claim that I was this creative person that made this all up out of thin air. But it was really the observation of how we went from monolithic, single stack, big tech application development to microservices, domain-oriented, to pizza teams, only integrating applications and solutions through API. So I think that the migration or transformation in digital organizations from monolithic,
Starting point is 00:11:40 big IT team application development to microservices domain-oriented and smaller domain-oriented team services and API-oriented capability development. I think that's absolutely a parallel scenario. And in fact, my hope was that Data Mesh can piggyback on that transformation that's already happened in a lot of digital native organizations and follow that trend and continue with data. So we kind of had these DevOps teams. We want to get to these DevOps data teams and follow that trend. Let's pull on that thread a little bit. So if we think about sort of the monolithic IT team, right? And let's, you know, of course, we're generalizing here. So no offense to anyone who, you know, their job title is DBA, but let's just talk about a DBA as a job title, right? You sort of have, you of have this concept of a gatekeeper who
Starting point is 00:12:45 owns sort of a monolithic system. And a gatekeeper, not in terms of they don't want to help people, but just the technology means that there's sort of maybe a single pointer of a couple points of ingress and egress. And so it tends to bottleneck when people need data around the company, right? And so that's sort of the monolithic state. And then you move to microservices. And so then maybe you have a team who's responsible for building an API that delivers a team a certain type of data. And so let's just say a couple of examples could be maybe like sort of a core KPI dashboard for executives would be one example. It maybe could be some sort of marketing performance metrics for a CMO.
Starting point is 00:13:28 And there are various flavors of this, right? But you have a team that sort of wraps around that. They could be a data team. It could be an ops team. Those lines are blurred. So that's sort of the first like socio-technological shift. Can you walk us down that path further? Okay, so now we have a team and you choose the example. It could be the BI dashboard for the exec, performance dashboard for the CMO, but walk us further down the path to data mesh. What does the sociological aspect of that look like from a team perspective when we go from monolith to microservices and dedicated team to data mesh? What does the sociological structure look like
Starting point is 00:14:12 inside the org? Wonderful question. So yeah, so let's imagine we are at the point in time that an organization has gone through that initial transformation of domain-oriented teams, and you have an organization that has, let's say you're a retailer and you have various teams, you have an e-commerce team, takes care of e-commerce app. I'm sorry, I'm diverting from your exact example. No, no, this is totally, this is great. Yeah. So you have an e-commerce team, it's taking care of a bunch of e-commerce applications and services. And, you know, it has the transactional database for that application that captures basically all the events that happen on top of the, you know, as the user interfaces with the digital channel. You might have a logistics and routing team that is job
Starting point is 00:14:58 is optimizing how the items get across the different warehouses, right? Or from shipping from the warehouse to the store. And so when you have a sales team, right, taking care of the actual sales transactions or credit card process, so you've got all these domain-oriented teams. They have their own services. They are providing essentially application-oriented or transaction-oriented APIs to the rest of the organization.
Starting point is 00:15:21 But when it comes to the data for analytics and ML, that's where things actually don't quite look the same as the rest of the organization. So what has happened is at this point in time, let's say pre-date mesh, all of those teams, let's say they're actually quite advanced, modern in their data stack.
Starting point is 00:15:41 What they do is that they basically, those teams, e-commerce, finance, logistics, they provide or somehow externalize their data. Let's say they're pretty modern as domain events and some sort of streaming backbone that lands that data into the warehouse. And in the middle between those teams and then the rest of the organization that wants to do analytics and ML sits a data team that's often ingesting those events and then try to model them or semi-structure them and put them into a warehouse and lake and a few other places. And then, you know, define another team sitting there.
Starting point is 00:16:16 The data team tried to define the actual semantic on top of it, provide condensates and access control. There is a governance team is probably sitting at the corner defining some sort of a taxonomy over this data. So they see the whole machinery sitting in the middle, try to turn those domains, events that came from upstream that they have no visibility to get their head around it, arm around it, and then store it in a way that is suitable for analytics. And then you have sometimes business domains on the other side of this pipe that, or centralized data scientist teams that are being kind of borrowed by those domains to be able to generate
Starting point is 00:16:56 value from that data, whether that value is as simple as some dashboards that help the CFO make some decisions, so CMO make decisions, or the more sophisticated turn into machine learning models that then get embedded into those applications, right, to make recommendations. But nonetheless, you've got this centralized team and a centralized set of responsibilities to make meaning of the upstream data that they have no control over, structuring it, making it available in a way that is suitable for those type of workloads that we just described, right? So when data mesh happens incrementally, that middleman goes away. The responsibility of sharing data in a way that can be discovered, understood, trusted, and accessed,
Starting point is 00:17:40 and used by dashboards and by machine learning training models and pipe nines by analysts and data scientists is shared in a way right from the source or maybe some newly team domain oriented teams that are formed in a way that in a peer-to-peer fashion you can now have consumers of analytical consumers of the data talking to the producers directly, and they're sharing data through this concept of a data product that we can get into the definition of it later. And there is no middleman to ingest somebody else's data, produce data for somebody else who is not part of a particular business domain. So that leads to existing business domains, taking new responsibilities around data product generation and sharing. That leads to perhaps creating new domains that didn't exist before,
Starting point is 00:18:41 that the job is purely providing data products for those newly formed domains. Let's say the recommendation domain might be just the data product domain because they're just providing recommendation data. But here you don't have the centralized data team that is under immense amounts of pressure consuming data that they have no control over and providing data that they don't really understand the use case for. And they're just mechanical kind of tors in the middle. They're trying to kind of
Starting point is 00:19:06 just move data on and, you know, then without really having the knowledge. And that's really, really tall ask to ask any centralized team to do. And I think that's where complexity of the organizations
Starting point is 00:19:19 reach a pivot point or a tipping point that model fails to perform and you've got to kind of stop making the shift. Yeah, what a great... Can I add one thing? Because you mentioned a really important topic, so the gatekeeper. So in the old paradigm, obviously, there were multiple gatekeepers. In the decentralization process, if we want to make these domains really autonomous, it's super important to introduce self-service capabilities. So all the duties that were managed by the gatekeeper in the decentralization process must be provided as a service.
Starting point is 00:20:08 So in this picture, we need to insert a platform that is providing service capabilities. And you mentioned also about previous transformation where the data mesh has been inspired, I guess also platform engineering practice is something that is present in the data mesh concept. So the concept of having a platform team that is taking care of providing services to all the other teams is crucial for the data mesh. This is coming from platform engineering and team topologies that is defining the different kind of interaction among teams. So who is value stream aligned and who is providing collaboration as a service or full
Starting point is 00:21:06 service. Paolo, let's dig into that a little bit because one thing, and I'm sure that our listeners have varying thoughts on this, but I think one of the questions, Jamak, as you talked about that, I think one of the big questions that comes to people's mind is, I'm just going to use the term data literacy. One of the roles of a central team, even though, of course, you described the challenges with the bottleneck, is that they serve as somewhat of a data literacy translator, right? And so, and Paolo or Jamak, help us understand, are there different requirements around data literacy when you move to a mesh model? And specifically what I mean by that is we're sort of, there's a very strong sense of democratization. But at the same time, anyone who's working in an enterprise that's a downstream consumer of data knows very clearly that it can be pretty hard to become
Starting point is 00:22:16 data literate when you try to go upstream, simply because that's not your area of expertise. And so can you explain this sort of the, if someone wants to move to a more data mesh type team structure and technological architecture, what does that look like from a data literacy standpoint? And is that the sociological aspect of what we're talking about? Yeah, I think I can have a stab at it and probably jump in. I think when you have a stab at it and probably jump in.
Starting point is 00:22:46 I think when you say data literacy, three things occur for me. And I think there are very different kind of aspects of data literacy. One is data infrastructure literacy, like the things the data engineers know. In fact, the engineers are very good at, you know, knowing that data platform
Starting point is 00:23:03 and tooling that is available to them to do, you know, ingestion and processing and cleansing and all of that. The other aspect is actually understanding the domain data. say you're a pharmaceutical company and you're doing drug research, understanding what is disease, what are different kinds of diseases and how you actually discover medicine for these kinds of diseases, what's considered a clinical research, understanding the domains of data, that knowledge and literacy exists in the organization and in fact exists in the domains of business.
Starting point is 00:23:43 And the third one is how the data can be used to in all possible scenarios from, you know, generative AI to all, you know, good old ML or statistical model to, you know, dashboards and various kind of analytical usage. So these are three, I think, classes of perhaps aspects of literacy that we can look at. And then if you think about like how data mesh relates to that, data mesh, as Fela mentioned, tries to remove the requirement for having three PhDs in data engineering and data infrastructure
Starting point is 00:24:22 before you can work and share with your data. So that's where the concept of self-serve platform or a new set of tools that remove the complexity of infrastructure out of the way from the cognitive load of a domain data person so that they can just work on, you know, kind of the data work that is related to that domain, right? Discovering medicine, looking at that variety of genes and variety of clinical trials with the tools that are very just suited for doing that data work. The tools are not necessarily tools of moving the data around or storing data on a large
Starting point is 00:24:59 scale. That should be, that's the level of complexity of literacy has to be pushed down to the platform team. And they take care of that and provide a, you know, kind of a nicer developer experience to the data folks in the domain. I think that what DataMesh tries to do is actually embrace the fact that people that are closest to the data in domain are best suited to be responsible for that data. Because they know about drug discovery better than anybody else. They know about what constitutes disease better than any other data engineer. So that literacy remains in the domain and it is embraced by data mission. And then the last part of it is, you know, the range of kind of analytical usage
Starting point is 00:25:45 and application of data from sophisticated machine learning to maybe more basic statistical analysis. Again, data mesh tries to, and I think I've been actually orthogonal to data mesh that has traditionally been in the domain because for you to do that sort of work, you really need to understand the business domain. And again, DataMesh embraces that and it doesn't impact me. I think it's an orthogonal concern. But again, it's more aligned with keeping people that are getting usage
Starting point is 00:26:18 out of the data and understand how to use the data within the domain as close as to the business to be able to innovate as fast as the business, right? As fast as the market goes, not as fast as the centralized data team goes. I hope that that helps demystify that kind of data literacy debate, but I'm curious what Paolo thinks. Paolo Zanetti No, you're super right. Anyway, I think that some kind of shift in terms of data literacies or competencies is needed. It's a different paradigm than a decentralized one. So some skill and competence must be moved towards the domain and embedded into the domain. These are obviously depending of the level of automation and abstraction that you are able to obtain.
Starting point is 00:27:11 It could be different. In some cases, it's a huge shift. In some cases, it could be less. Obviously, every company should try to minimize the shift of these competencies because otherwise it's becoming a huge transformation. And anyway, I usually say data mesh is a huge transformation. We should not minimize this because it's a journey like it has been the digital transformation 20 years ago.
Starting point is 00:27:45 It's a journey. it's a discovery journey. It's not only a matter of skills, it's also a matter of mindset and culture regarding on how do we manage and trade the data. All right. So I have a question about both the technical side of things when it comes to the data. All right. So I have a question about like both the technical side of things when it comes to like to do the data mesh. And I'd like to start with that and then like talk a little bit more about like the people side, right?
Starting point is 00:28:15 Because it is clear, I think that you can't have like a data mesh only with one. You need both of them there. And my first question, and I'd love to hear an answer to that from all three of you, is what should come first? Like in this transition, is the technology that needs to come into and become the spark of
Starting point is 00:28:43 starting the change? or people need to change first and then introduce the technology. So, and Melissa, I'd like to start from you, actually. What do you think? Because you're coming like from the, let's say, more of like the pure people side of things because of the community there. And then I'll finish with Paolo just because he's the engineer here. So hopefully he's going to talk about the machines and that machine rule and all that
Starting point is 00:29:11 stuff. But let's start with you first. Yeah, no, it's a good question. And it's something that we get asked quite a bit, which is where do you start? How do you kind of get started with data mesh? And in fact, we just ran a survey in the community to ask these types of questions to say, okay, you all have been there, done that. Where did you start? What would you recommend for people who are kind of starting this process?
Starting point is 00:29:35 And there's kind of two parts of it. So one, there are four pillars of data mesh. If you haven't read the book, it explains all the pillars of data mesh. And what folks say is don't start with all of them all at once. Understand it, really try to dig in and see how this is going to work for your organization. But in the survey, it was resoundingly clear that people said do incremental adoption in stages, don't do some kind of big bang thing and try to do it all at once and start with like proof of concept in a specific area. So essentially start small and grow with data meshes, what the recommendation was from community members and of the four pillars, the, what folks recommended or what they typically start with is data as a product followed then by domain ownership and then the rest. But those are the, that's where the entry point for a lot of people is. It's part technical, part social. It's a mix of all of the socio-technical bits,
Starting point is 00:30:33 but that's the typical recommendation from the community members. Okay. And let's move to Jean-Marc. I'm going to answer without answering your question. Like what Melissa said, start with moving. move to Jean-Marc? I'm going to answer without answering your question. Start, like what Melissa said, start with moving and then do both
Starting point is 00:30:51 technology and people side at the same time. There is a notion of, I think there is a notion of movement-based change which was defined or introduced by folks at IDEO which actually looks at historical kind of social revolutions that have movement actually creates change at scale.
Starting point is 00:31:11 As Melissa said, I think starting with kind of understanding that first of all, your ability to do data, are you ready? Do a readiness assessment and really understand, is this the right thing for you at this moment in time? Given the maturity of the technology, given the maturity of understanding the industry as a whole, you're still in the early days.
Starting point is 00:31:37 So once you do that assessment, I think start moving by finding a particular domain or a particular area that could move toward this pattern and, you know, making shifts in the people side, on the social side, as in have the actual domain people engaged in, even if it's just producing simple data products in the conversation and empower them and provide the tooling at the same time. Of course, you can't just start telling people to change behavior and do something different
Starting point is 00:32:15 without having the tools, tools reshape behavior. And we can't just throw technology at people without having their incentives aligned and having them part of the conversation. So it's really hard to say, do this first, then the other. It's really both at the same time. I've been part of many transformations, or at least a few that I can talk about, that really started with not considering domains first. So in some ways, I understand kind of survey that Melissa did
Starting point is 00:32:46 and the point of view that people have in, you know, the domain ownership is the hard part and maybe don't start there just yet, but I've been part of many transformations that didn't include the domains. It came from centralized data team trying to think about a platform that they can throw later at the domains. And guess what? It just stayed within the data team. You have a sophisticated data mesh run by the? It just stayed within the data team.
Starting point is 00:33:06 You have a sophisticated data mesh run by the data team, done by the data team. So we finally didn't change anything. And on the other hand, you engage domains. Like let's say you engage analysts or folks in the domains, but you have no technology to actually support them to generate data products. So it's kind of both needed, but we've got to start moving at an iterate over both angles.
Starting point is 00:33:30 Yeah, but makes a lot of sense. And okay, Paolo, you're my last chance here to say that it's all about the computers. Like, tell me. No, I will not go. I will not talk about technology. So the data mesh adoption is a data strategy. And it must be part of a data strategy. So like all the data strategy, you must start defining your goals, assessing your, your SEs and the risk that you, that you are, that you are planning in adopting the data mesh, because data mesh anyway, is a practice and has all the practice has pros and cons, and they must be evaluated. After that, you go in the data strategy planning. So you need to take care of people, processes, planning, budget,
Starting point is 00:34:40 take care of adoption since the beginning. And finally, you can jump into the architecture and when I talk about architecture I'm not talking about specific technologies so I'm talking about standards how do we decouple layers capability technology and how do we plan to create a platform that is enabling all the principles of the data mesh? So how do we boost the cross-functionality among technical people and business people? So it must be a platform that is capable to onboard all the different skills and level and background that we have in the company. Otherwise, adoption will never grow. Then after that, you can start to implement something. But it must be taken really carefully.
Starting point is 00:35:39 Okay, that makes total sense. And okay. If you are, let's say someone who sees like the data mess as an opportunity to go and build a company around that. Right. And okay. And we've had like both you, Paolo and like Jean-Marc here, like trying to build products right around like data mess. How you can do that? I mean, by definition,
Starting point is 00:36:07 building a product requires something that has value, obviously, but it's also repeatable. You don't go and implement it differently everywhere because that ends up being a service, not a product. So how do you productize the data mesh? From the point of view of a vendor, right?
Starting point is 00:36:30 Like now I'm not talking about like the user who wants to go and implement it, which is fine. It's great. But as we've seen like in other, let's say like similar, like let's say cases, like we had had Agile. Again, it started more as a way to structure your work and go and create deliverables and create value.
Starting point is 00:36:55 But there's plenty of tools that were built at the end to support the implementation of Agile. Of course, you would never have Agile just by having the tools. You needed the people to go and be Agile. And correct if i'm wrong obviously it's not the same thing but there are like some similarities in the socio-technical side of things there right how you need like both there so please tell me a little bit more about that like how do you build your products right and give also you also ideas out there to people who are builders
Starting point is 00:37:28 and would love to go and build something. Do you want us to share a secret sauce? Is that what you're asking? The short answer is you don't. There is no such a thing as database product or database in a box. I know that a lot of vendors, and rightly so, and as a new paradigm comes, as vendors think about, okay, how am I going to enable these paradigms? And data mesh became a feature or an addition of an existing or a new product line of an existing. So as you said, I think just like Agile,
Starting point is 00:38:06 there is no Agile in a box or Agile as a product, and there is no data mesh as a product. I think every vendor has to choose their own battles and think what is relevant to them and what problem, what angle of enablement or removing friction or removing bottlenecks for getting to data mesh they choose to fight for and they choose to remove, right? Who are the users they're trying to enable?
Starting point is 00:38:37 What shifts they're trying to create? What is the before and after would look like given their products? So I think Paula and I both have probably our own perspective as what we want to do and who we want to enable and how we want to enable the movement towards data mesh. But I don't think neither of us building a data mesh product. So that's a funny way of kind of framing it.
Starting point is 00:39:03 Yeah, yeah, 100%, 100%. And I totally get that. But yeah, let's talk about what you're building, right? Just to reverse, let's say, engineer. Let's see how a tool looks like that promotes and supports the concept of the data mesh instead of creating the opposite, right? Like hurting, like making it harder for the data mesh to be implemented. If you want, we can make another analogy.
Starting point is 00:39:31 So maybe with DevOps and GitOps, that they are practice as well. So you can't buy DevOps or GitOps, but anyway, GitLab or GitHub are products that are enabling the creation of such practice in a company. So, for example, my vision is to build a platform that is customizable, extensible, and is helping companies to define their standards, their best practice, and to shape their data practice. In the specific, it's not only data mesh, it can be whatever data practice you want to adopt, but basically it's helping you to shape, define it, not just providing guidelines that maybe someone will follow because, you know, even if you think about branching model, it's very easy, but
Starting point is 00:40:34 nobody is going to follow it if you don't enforce in some way. So the platform that we are building is helping companies to shape their practice with a better time to market without reinventing the wheel every time. And it's creating a place where different data practitioners can exchange value, like it happens in GitLab, for example. You have who is taking care of the pipeline, who is taking care of the code, who is writing the issues. So different personas have a single place where they can interact, exchange value,
Starting point is 00:41:21 and co-create value that in the end is the artifact, the product that does the vision. Okay. That makes a lot of sense. So, Melissa, I'll go back to you because, you know, for some reason in my mind, you're like the voice of the people out there. That's why I keep asking you these things. So, from your experience,
Starting point is 00:41:45 like with, okay, all these like thousands of people like in the data mess community, right? Like, do you see people associating, let's say like the data mess with specific like vendors or like technologies out there? Or they tend like to focus primarily or only, right?
Starting point is 00:42:01 Like in more of like the organizational side of things, like and the strategy and the design. That's a good question. I'm not sure I can quantify that. We've never done, I hear people talk about some of the things that they use to implement data mesh, but a lot of the conversations, they're in fact less, a lot less about technology in the community it's all about how did you get buy-in or how did you think about federated governance or things like this we do get asked sometimes what vendors can i turn to what consultancies can i turn to and we plan to add a landscape page to our website that kind of showcases you know here are some of the people that you can turn to but we haven't done it yet so stay tuned because that will be coming okay that's awesome i'd love to see that and going back to you john like do you want to feel
Starting point is 00:42:53 like to share a little bit more about like the product that you built not the secret sauce because if you make a mistake here and say anything about the secret sauce i'll have to be removed i'm telling you okay well we'll share our secret sauce we will share a cue for a little bit more about our success but yeah i think we started really when i went deeper into what are the inhibitors from making data mission reality right gardner makes all sorts of predictions that data mesh will be defeated and will be crossed out in the next five years. And I really took that to heart as in what is going to stop us from making this shift that has resonated with industry at wide and everybody raised their hands and said, we want data mesh, please.
Starting point is 00:43:43 The current model doesn't work. And what's going to stop that happening? And what is going to stop it is the lack of ability for domain practitioners, right? These are data practitioners or data hackers that are just like, you know, they're discovering bugs or they're calculating ROIs, but they need to package that as a product and share it with the rest of the organization and be incentivized to do that, that lack of empowerment of those folks
Starting point is 00:44:11 is going to make data fail because it's going to be limited to a centralized data engineering, data platform team. So the product we are building is we had to work a little bit ground up. First of all, we had to kind of codify this concept of a data product. We had to abstract away a lot of complexity that goes beyond what constitutes a data product.
Starting point is 00:44:34 Because for us, data product is a lot of things that are encapsulated as one. Data model, data contracts, data transformation, data storage, all of that. And once that's abstracted away in a build time and a runtime concept, then we define a developer experience for that peer-to-peer data product sharing that is designed for data workers, not necessarily data engineers. I think data engineers have a lot of tools that are serving them today. So that's what we're working on. Hopefully, then we can nudge the needle toward that distribution of ownership to people that know the data, work with data, but they're not necessarily data engineers. And we can then confidently remove,
Starting point is 00:45:22 going back to the beginning of this conversation, confidently remove the gatekeepers that they have the best intention in their heart that I don't want people mess up, you know, data availability or data integrity. So you have this guided kind of developer experience. So, you know, they can safely share data products or safely discover and use data products. So that's kind of what we're working on. This is a big, hairy problem. And as a product company, I'm sure Paola knows this, like be able to get your arms around a product that is feasible to build and can achieve this. It's a difficult problem in itself, right? And it's a new category that doesn't exist.
Starting point is 00:46:06 It's not that Paula and I can look at other, you know, products and say, we're just building that a little bit better. We're creating almost a new category for this. So we have, yeah, we have quite a lot of challenges to get these products off the ground and make an impact, but there's plenty of opportunity for innovation. I know you posed the question as other makers that might be inspired to do something.
Starting point is 00:46:30 There's plenty of opportunity here to build enablers. Yeah, yeah, 100%. Okay, and let's talk a little bit more about the concept of a data product now. You mentioned that as part of describing what you are working on. But what is a data product?
Starting point is 00:46:53 And the reason I'm asking is because it's one of the first pillars out there. Melissa also talked about that. And I think it's probably, let's say, one of the things that we can... When we're talking about data, it's the first thing that like people will focus on right naturally so what is a data product and why it's different than compared to let's say
Starting point is 00:47:15 depending on where you come from like from how it is different from a dashboard or from a mail model or like a table on a database or a file on my hard drive, right? What makes the data product, like in the context of data? Yeah, that's a great question. I think it started and I'd love to hear Paolo's kind of reflection on that. So when I wrote the definition in the book, again, I started first conceptually as, what is, if you were going to share data as a product,
Starting point is 00:47:52 what would it look like? Like a successful product. Start from first principle of a definition of a successful product, which is something that is usable by the users. They love it. It's feasible to build. It's valuable to the user.
Starting point is 00:48:08 So when you start from those first principles and then work backwards, I arrive and you look at the users that you have. These are analysts. These are scientists, right? Then you work backward. I define data as a product, essentially as the units of exchange of value
Starting point is 00:48:24 between the producer and consumer with a set of, I think, eight characteristics that can be acronymed as DAF units. It's a unit of exchange of value in terms of data that is discoverable on its own autonomously. It's addressable. It is valuable. It is secure. It's natively accessible. The N in it is important as data scientists, data analysts, no matter how you want to access it, you can access the same thing and you use
Starting point is 00:48:50 it in the automated way. So it's trustworthy. It's like eight characteristics. And then when you just peel the stack, peel the onion a bit further and say, okay, if you want to build something that has all of these characteristics, it's just like a product, can be shipped to the users, which are a wide spectrum of users, what is in it? Like, what are the bits and bytes that are in it? Is it a file? Is it a file with metadata? And I think that's where the diversion of opinion and execution exists within the industry. There is no standard.
Starting point is 00:49:23 So our definition of a data product is kind of similar to the implementation, I think, that I put the architecture around and in the book, which is the smallest unit of your architecture that structurally has everything that is needed to make the data accessible, natively accessible, usable, and so on,
Starting point is 00:49:41 which means basically the pipeline of code that is generating the data, as well as the data, as well as this metadata, as well as APIs that control the data, the policies that control it, APIs that get access to it, APIs that discover it. So for us, it's a lot more than just the bits and bytes. It's the bits and bytes, as well as ways of getting to it. It's the metadata that lets people discover and understand it. It's a compute that's going to keep making this data possible
Starting point is 00:50:12 and then APIs to get data in and out of it. It's more than views with metadata. It's more than catalog entries of metadata. And unless we have a reference model of this, I think that will at some point maybe get mass adoption. We're at this point that the actual implementation look very different from people to people. And I'm curious kind of what's Paolo's position
Starting point is 00:50:39 on data product and its technical implementation at this point in time. Yeah, your definition is perfect, obviously. So I will tell you what a data product is not. So I always start from this because it's removing interpretation. So for me, a data product is not a table, is not a dashboard, is not a monolithic system, is not just a set of API, is not an operational system, is not a logical or physical portion of a data warehouse or a data lake, because we need to change the practice behind that.
Starting point is 00:51:29 It's not a logical view on top of some pre-existing data and it's not a bronze layer or something like that. So all the things we are used to think about as the term data product was already present, unfortunately, in the data management space. And data product, data asset was used to identify whatever kind of data at rest out there. So data as a product instead is a totally different thing. And also, I would say we can also see the data product as a composition of layers. So infrastructure, data and metadata, and code in terms of deliverable that must be included in a data product. Okay.
Starting point is 00:52:29 I mean, we probably need a couple of hours to go through the data product concept on its own. So hopefully we'll be able to do that in another episode in the future because we are close to the end here on this one. So Eric, the microphone is yours again. Yeah. Jamak, I think the question is for you. Mesh is an interesting sort of analogy for how to describe this. Did you consider any other terms or terminology when you were thinking about sort of this concept? I'd just love to know, like, where did the term mesh come from? I mean, it sort of makes logical sense when you look at it now, but, you know, the names of things,
Starting point is 00:53:20 the etymology and sort of where it comes from is often a different story. So just genuinely curious, what other terminology did you explore as part of this sort of naming process? Not many. Fabric was already taken. Fabric is a great word as well. It was already taken. So I could use that. I did a Google search. Yeah, of course. Deep networking. So I did a lot of work on distributed system networking and protocol design. So mesh was in my vocabulary. And in fact, data mesh has multiple meshes in one, right? It's a mesh of flow of the data between input data products, the mesh of relationship between data products, data types, referral, link data.
Starting point is 00:54:05 So there's multiple actually mesh layers in that. But yeah, I think maybe my networking background influenced it and the fact that my Google search didn't show many examples of it. Perhaps if they were there, but maybe I didn't find it. And I'm glad I used it. It seems to be a catchy one. It stuck. Yeah. Yeah, for sure. Naming things is actually, if we think about even the fundamental principles of software engineering, naming things is the hardest part.
Starting point is 00:54:46 That's part of why I asked him. I have a little story. In fact, the very first public talk that I gave about data mesh was at O'Reilly Conference in New York. I think it was early 2019. And the topic of the subject of the talk was beyond data lake. And I put this call out at the i didn't have a name for this thing and i put a call out at the end of the conference if people have a name for this please come and talk to me yeah yeah nobody showed up as you said this right we get shy a
Starting point is 00:55:18 little bit about being judged you know with their choice of names. So yeah. With data, like it could have been data canal, right? Like you have a way to access, you know, all these things, but yeah, that's a great, well, thank you to all three of you for joining the show. This has been very helpful. Thank you for helping us put a definition to data mesh, you know, after over 150 episodes, I can't believe we didn't cover it. So thank you for, you know, bringing all of this to light. And we'd love to have you back to dig into some of the things that we didn't have time to cover. Thank you for hosting us. Yeah, thank you. We appreciate it. Thank you. We hope you
Starting point is 00:56:03 enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, ericdodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rutterstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
