The Data Stack Show - 77: Standardizing Unstructured Data with Verl Allen of Claravine

Episode Date: March 2, 2022

Highlights from this week's conversation include:

Verl's career journey (2:46)
M&A data evaluation criteria (7:12)
What Claravine does (10:48)
The breadth of data (15:03)
Adding to content and advertising data (18:22)
How Claravine standardizes data (23:53)
Designing a data model (25:40)
The underlying technologies of building a product (33:43)
The main consumer (35:02)
Maintaining quality (39:06)
Helping solidify definitions (41:37)
Implementing Claravine's model across various companies (44:54)
Internal changes' effect on the model (46:47)
Connection brought about by structure (49:19)
Applying unstructured context to structured stamping (52:36)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Welcome to the Data Stack Show. Each week, we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, one platform for all your customer data pipelines. Learn more at rudderstack.com.
And don't forget, we're hiring for all sorts of roles. Exciting news. We are going to do another Data Stack Show livestream. That's where we record the episode live, and you can join us and ask your questions. Kostas and I have been talking a ton about reverse ETL and getting data out of the warehouse. So we had Brooks go round up the brightest minds building reverse ETL products in the industry. We have people from Census, Hightouch, Workato, and we're super excited to talk to them. You should go to datastackshow.com slash live. That's datastackshow.com slash live and register. We'll send you a link.
You can join us and make sure to bring your questions. Welcome to the Data Stack Show. Today, we're going to talk with Verl. And Kostas, they are doing some really interesting things for some really large companies, some of the largest companies in the world, actually, which is fascinating. And just to give a little preview here, Verl is part of Claravine. And what they do is basically take unstructured context around internal data at a company, as it exists in the form of things like marketing assets, and essentially apply a schema to it so that there's standardization across this massive, you know, multinational organization, which is really interesting. I want to ask him, we talked about the concept of a schema as you and I were covering this show, what a schema across a large multinational organization for creative content even looks like. I mean, what are those data points, and what kind of data are they populating that schema with? How about you? Yeah. I want to ask him how you can build one schema to rule them all. You know, it sounds very powerful. It does. So yeah, I really want to see how you can build something like this, what the approaches are, the stakeholders involved, and how different it looks from organization to organization. At the end, how much can we standardize things? So it's going to be a very interesting conversation, I think. It's one of these problems that you don't hear about too often, but in the future, we will have to deal with it in smaller organizations too. So it's interesting to have this conversation with him and see what tomorrow will look like. Yeah. All right. Well, let's dig in with Verl. Verl, welcome to the Data Stack Show. Great to be here, Eric. Thanks for having me.
Okay. So much to talk about with data standardization, especially in the context of marketing data, which is going to be a treat for me. But before we get there, just give us a brief background on yourself and how you ended up doing what you're doing today at Claravine. Yeah. I joined Claravine in 2018. Prior to Claravine, I spent about 12 years first at a company called Omniture, which was kind of a leader in the web analytics space. The company was acquired by Adobe, I think it was 2010, and then I spent about another eight years at Adobe. And in my role there, I was leading up strategy around what now is the experience cloud
Starting point is 00:03:46 and also corporate M&A. So corp dev M&A, if you think about what the experience cloud at Adobe really is, is a compilation of about 11 or 12 acquisitions done over a 10, 12 year period that ultimately has kind of resulted in what is now the experience cloud they have there. And so I spent a long period of time helping kind of build that business, if you want to think about it that way. In 2018, I kind of ran into a friend who had started a small company, which is now Clarivine. And as he and I were talking, he was kind of at a point where he's like, I'm not sure what I would do with this. We've got some great customers. We're kind of stalled out as far as growth. We have product issues and we need to raise capital. And so as we were talking, I kind of said, listen, I can introduce you to some people.
Starting point is 00:04:31 I'm happy. I love what I'm doing. And I've kind of got a four-year plan to retire. And as we kept talking, and I saw what he was doing in relation to where I was seeing challenges from my time at Adobe, where we had spent all this time acquiring all these technologies and solutions and had done a lot of work around integration at the workflow layer and in other ways and integrating those solutions.
Starting point is 00:04:59 What hadn't happened, though, is there was no kind of standardized data model underneath that. Adobe's really good at this now with their kind of the adobe data platform other things and there's emergence of cdps but there wasn't at that point any kind of focus around the data side of the integrations and the data side of kind of standardizing that and so as i started thinking about what he was i started looking at what he was what they were doing here at clarabine again it was like four or five people at that time it really struck me that there's a need in the marketplace, especially as we think about this,
Starting point is 00:05:29 what I think of as the 2010s were the kind of the decade of SaaS applications and explosion of that marketplace to where now you have, you know, 50 to 100 point solutions in any enterprise in the marketing organization. The problem that they're running into is they were never really architected to work well together and even at the data layer. And so as you, have you seen the emergence of kind of the enterprise kind of it's going to be a cloud-based kind of data infrastructure that's exploded in the last
Starting point is 00:06:01 couple of years and it's becoming more readily available, even in the functional areas, it became clear to us that, to me at least, that there's going to be a need to standardize and kind of create common language or taxonomy or if we call it dictionary, we're going to call it across these applications, especially as that, and it has to have context, especially as you collapse that data into these single instances in the cloud. And so as I was there thinking about the problems they're solving, the problems I was seeing even in our own business inside of the applications that we had acquired, it became clear to me that if we're struggling with this at Adobe for our customers,
Starting point is 00:06:45 the brands themselves have got a bigger problem because the number of applications they're trying to deal with is multiple times, you know, what we were dealing with just from a solution perspective to take the experience cloud to market. Fascinating. Okay. I want to take a quick detour here. So what a fascinating experience sort of being involved in building out a product suite from the M&A side. I mean, that's just fascinating, right? That's so interesting to me on so many levels. But I was going to ask you, and you mentioned it a little bit, looking at when we think of, especially about like marketing tooling
Starting point is 00:07:17 or customer engagement tooling and the suite of infrastructure that surrounds that from analytics to actually the tools that are sending messages. Did you have an evaluation criteria on the data side? I know you said there was struggle around that, but as you're thinking about building a product suite, I think that when you think about customer experience, evaluating the ability to layer those products in from a data standpoint was part of the rubric, even from an M&A standpoint. How did you think about that? Yeah, I think, you know, it's interesting. And I think this has evolved dramatically. I think the thought,
Starting point is 00:07:53 thinking about this is I think much more mature. I think it's much more mature today than it was like five, six, seven years ago. And largely it's been driven by some of the changes in the data ecosystem and just kind of the ways that companies are looking at their business, not so much where it was. I think back then there was more about the silent approach. And you think about applications about how do we get efficiency and scale in a channel? And I think what the world's turned to more holistically, I saw some a data point the other day saying before the pandemic about 30 percent of the digital the interactions with brands was was
Starting point is 00:08:29 digital now it's like 55 60 so it's a huge push forward yeah what's happened is is that when when when I was at Adobe we were thinking about it as this application and this application there's data here that we need to get here. And for specific pieces. So it's more about how do we push specific pieces of data between the applications? It wasn't kind of stepping back and looking at it and saying, holistically, what should that operating data model look like for the marketing organization holistically because i think the industry even back in the 2016 2017 time frame was just starting to kind of everyone's talking about a single view of the customer and unified profile and all this stuff but the reality is is that you
Starting point is 00:09:21 had on one side ad tech solutions you you had MarTech solutions, you had CX solutions, and they were sort of in different groups. I think what you're seeing now is the convergence of this stuff around the experience and around the customer more so, and it's driving this really different way of thinking about the data necessary to operate the business, not the data to run the application and do my job in a channel, if that makes sense. Yeah. Super interesting. I've referred to that before as kind of the daisy chain paradigm where it's like, okay, well, I have data here and then I need to get it here and then I have it here, but then I also need to get it here. And so you end
Starting point is 00:09:59 up with kind of this daisy chain architecture that degrades over time, almost like a game of telephone because every, every system has its own flavor of database and data definitions and all that sort of stuff. Yeah. And I saw it even in a sense of like, even within there, within the cloud that Adobe had built, forget about integrating other applications in that are not owned and owned by under the Adobe brand. That was even a conflict that, under the Adobe brand, that was even complex.
Starting point is 00:10:29 That was, again, it was a daisy chain even within those applications. And then when you think about it from the, you put the lens on the, from the brand's perspective, it's even, it's much more complicated than it looks like from Adobe's perspective or from Salesforce's perspective, or the kind of the big, you know, the large enterprise software companies out there.
Starting point is 00:10:47 Yeah, for sure. Okay. So thank you for humoring me with that little detour because it's, it's fascinating. Let's talk about the, the let's, maybe could you use a specific example of the type of brand and the user who's like, I am facing this problem every day in my job and it's really painful. And Clarivine comes in and like, this is so much better. Like describe that for us. I'd love for you to get specific if you can of like, I was doing things this way. There are
Starting point is 00:11:20 data problems because of X, Y, and Z. And this is the new way that we're doing it. Yeah. It might be helpful for me to even kind of back up a little bit and explain. When I came to Clarivine, it really was we were helping analytics teams are publishing or creating reports and, and doing analysis. What we, what we see with a lot of our customers is they do the analysis and then they come up with this, you know, you've got the report and at the bottom, there's this other bucket that 25, 35, 40, 50, 60, 70% of the data drops into. And it's very degrees, depending on how,
Starting point is 00:12:04 how, you know how complicated or how integrated they're trying to you know to report on but what we what we really kind of were initially initially helping solve was taking data out of the other and actually specifying it and actually putting context around it so you can actually attribute it in some way. And so just to put a sharper point on that, so like, and I'm just thinking through like our data engineers and analysts who are listening to my own experience,
Starting point is 00:12:34 it's like, okay, and reports that I've seen or that I've like helped build data for whatever. It's like, okay, we have, you know, paid search campaigns as a bucket. We have like event, you know, sort of like where, I guess, is that what you're getting at? No, that's one type of report. It's just event, you know, sort of like where are, I guess, is that what you're getting at? That's one type of report is just like, you know, by channel or whatever. I mean, we've all sat in those meetings. You know, I have a finance background, but I spent
Starting point is 00:12:54 in 99, I switched over to digital marketing because I thought this is more of an analytical problem than it's a creative problem. It's actually an interesting problem to solve. And so I sat in these meetings before with teams where everybody comes to the meeting, you're reading out channel by channel by channel, and you start rolling it up and the numbers do not roll up. They do not roll down. And so, you know, you aggregate the individual numbers from the channel, people sitting in the rooms and seats, and it's X. You look at the aggregated reports and you're like, no, it's like 0.4X or 0.6X or 0.7X. Like where, who's, who's being successful. And it's still a problem today because a lot of that reporting is, even though it's done in, you know, in a centralized,
Starting point is 00:13:37 through centralized applications, the foundation is broken. The premise that we have is the foundation of that data. If there's not, again, we think about as data standards, but so that's one of the things we're helping solve is really kind of taking and creating more specificity and more detail and more context to that data that improves that reporting. That's one thing. But it starts, that's just on the reporting side.
Starting point is 00:14:01 But if you take that out even further and say, hey, well, the same data that in a lot of cases you're using for reporting, you're using data to do other things like optimize spend across channels. And I've had situations, I was talking to one of the largest consumer electronic companies out there that, you know, the brand is associated with a fruit and they were, we sat with one of their larger teams and they're like, listen, we got a problem. One out of every seven days,
Starting point is 00:14:35 we cannot optimize ad spend because we're having to rebuild all the models. We're having to clean all the data up. And so there's literally one day out of every seven where we're still spending, we spend, you know, 50, a $100 million a day. We have no, we're flying blind and then everything's delayed. And so with that, with that organization, what we really helped them do was to reduce the time to insight dramatically by reducing the amount of data that had to be cleaned up in the operations
Starting point is 00:15:02 side of things. So think about where marketing ops, data ops, and ad ops, they spend a lot of time between execution and kind of optimization, cleaning data. And that's what we're trying to help them eliminate by adding context and creating standards in the way that data is captured. Okay. So when we say standards and context, I want to dig into those terms, but could you just give us a sense what is the breadth of data? Because certainly that's a huge contributor to the problem is that you have a huge variety of different types of data coming from a bunch of different places.
Starting point is 00:15:52 Absolutely. And to be clear, we are not a data collection, like an analytics or data collection application. The way I describe it is, and there's a company that I think in the 80s and 90s called, it was BASF.
Starting point is 00:16:03 And they're like, we don't make the products you use every day. We make them better. And so our opportunity is to improve the ROI and the value that our customers are getting from other applications or analytics, whether it's analytics, whether it's CDPs, whether it's ad serving or whatever else they're doing, whether we're spending dollars. And so what we really are, when we think about that types of data, it is clickstream data,
Starting point is 00:16:28 but it's not saying we're collecting the clickstream data and replacing it. It's appending onto that clickstream data context about a campaign or an experience. It's appending to content, standardized data about the content itself in relation to all the other content in the organization and across different dams, across different CMSs. And so it's really trying to take
Starting point is 00:16:52 this complexity that exists in marketing organizations. Today it's marketing. I think there's other applications outside of marketing that we're talking to companies about, but it's really trying to take some of that complexity and create a layer underneath of it or alongside it that has standards around it, that is attached to or can be attached to that data to enrich it, extend it, and also create meaning between some of the data that right now doesn't have necessarily, you know, really, really great ways to associate it. This may not be the right way to think about it. So tell me if I'm off here, but it almost sounds, I mean, this sounds amazing. It almost sounds like you're a schema designed for full visibility and stamping it on the data across every data repository? I actually think that's a great way to describe it. I think that's a simple way
Starting point is 00:17:58 to describe it because it is almost like an imprint against that. You know, it's not, it's not that we're collecting the, the, the behavioral, the streaming data. It's, it's a, it's a set of data that gets appended to that or stamped to that as well. Yeah. Yeah. Because like, if you think about like stamping a schema on a certain set of data, like there's a Delta, if it doesn't, you know, contain all pieces of the schema, maybe I'm extending that metaphor a little too far. Kostas, let me know. But okay, I have two more questions. I know Kostas has some as well. The first one is, could you give some examples of the context that you add on to a specific type of data or a couple types of data? Just like you mentioned, advertising performance or even content, which is interesting, like in the context of digital asset management or other things like
Starting point is 00:18:50 that is pretty fascinating. What does that look like? You have a piece of content, what are you adding to it? You have advertising data, what are you adding to it? Yeah. So if you think about content, there's a lot of situations where you have people creating content which is more kind of the content creation you i'll call it the creative side of things you've got the content side of things where all of a sudden it gets you know loaded up into a cms what you have happening is you have typically people creating it that have a creative brief and all this context around this piece of content was created for this purpose, for this business unit, for this stage of funnel, for this geography, for this demographic,
Starting point is 00:19:30 for, you know, it's all that information and insight that sits in the creator's head. The problem is, is that once that stuff goes in the dam, there's A, there's not a great way to, you've got creators all over the world. Like you think of a large multinational organization, you you have creators and agencies you have creators internally you have people all over the globe they're speaking different languages like how do you create a standardized language across all those teams peoples and geographies and business units and
Starting point is 00:19:58 that's really what we kind of help them provide and so instead of having once it gets loaded into the cms instead of trying to have the you you know, the content, the people that are loading the content into the CMS, add that context, that piece of creative, that idea with that creative is actually associated with a bunch of context in our application that allows them to really kind of create a different way of solving this problem. That's really interesting because the, anyone who has worked with data knows that, like, I mean, I don't want to be too incriminating here, but relying on human input, you know, for critical data is never a good idea because people are always going to get it wrong, you know, fat finger it like it's, you know, it's just, it's the least reliable way to capture data in many ways. Yes. And it's interesting though, when you, so if you think about it, though, those individuals have all the context and a lot of the context around what is actually happening. And so in some ways, it's funny. I was talking to one of our customers about this, and they said, this woman said, you create the illusion of choice inside of our application for the end user in that situation where there's an end user in the application that really kind of forces them down a path that limits the errors that can
Starting point is 00:21:30 that can kind of get created so you're almost kind of forcing a set of decisions on a much smaller and depending on who they are what channel they manage what you know there's all sorts of controls that you can build and logic you can build around what level of choice you create. And as data is, you know, through integrations, there's a lot of context you can get from the integrations of where other data is coming from that help kind of inform what options we should or shouldn't give them. Yeah, that's interesting. It's kind of taking a consumer app optimization mindset where you define very clear pathways and success for the user and applying that to internal creators inside of a business, which is super interesting. Could you give one example maybe on the paid side, just so we have another example of the context that you layer on to a particular type of data, like advertising data,
Starting point is 00:22:29 is that performance data or? Yeah. So we work really closely with a lot of our customers, both internally and with their agencies around that performance data. And so what we're helping them add into that is in some cases, there may be data fields that are collected in one application that are not available in other applications or the way they name fields in one application are not consistent with other applications. So it's trying to help solve naming kind of differences in the way that fields are named and we can do some mapping for them. The other way to think about it is, again, similar to a creative brief, think about a campaign brief. There's all sorts of context and insight in that campaign brief, which are typically managed inside of spreadsheets that we help to onboard into kind of the enterprise data, how to say this, the data model, if you want to call it that sure it's it's things
Starting point is 00:23:25 around what stage of funnel is the campaign on who was that what was the segment what was the creative and it's it's mapping that creative then back to standard you know mapping it's creating almost like a way to map even across elements of an experience that that just not are not specifically and standardized data across those elements of an experience that are not necessarily just specific to the campaign itself. Super interesting. Okay. I have one more question and then I'm going to hand the mic to Costas because I've been monopolizing this conversation. So how do you do that is my next question, because that sounds very complex, especially when you're thinking about,
Starting point is 00:24:05 you know, organizations. And I mean, it sounds like, you know, sort of fortune 500 level, 100 level companies that are just massive, complex organizations that, you know, are producing content across who knows how many vectors and business units and product lines and all that sort of stuff. So how do you do it? Yeah, it's interesting. So where we thrive, where I think the opportunity exists for this is in those organizations that where there's more complexity and you're hitting on it. So we have a customer, I think one of our customers, they have about 700 users across the globe, both internal and agency users, specifically around
Starting point is 00:24:46 standardizing taxonomy around content. And in that situation, we are baked into the workflow. When you're going through that creative process and submitting creative briefs and things, we are integrated into the workflow and capture data, in that example, out of Workfront. And so it's through integrations that we get access to data. And it's through integrations as users are adding data into fields either in other applications, we derive that information into our application, into our solution. What we have at that point is the ability to compare what was input to what the available standards are and identify where there are differences between what is maybe input manually in another application and what the organization identifies as the standard around
Starting point is 00:25:38 that field or that attribute in the data. And we're able to identify where there's breakage in that, either through our solution automate the correction of that or allow the individuals and organizations to surface those areas where there are problems, fix them in our application, and then identify ways to enforce it upstream. This is very interesting. I mean, I don't know if I'm going to be a little bit too technical, but I cannot stop thinking all this time on how do you design a data model like this? Where do you start? In my mind, when we are doing data modeling, because it's like modeling in general, right?
Starting point is 00:26:20 What we are trying to do is create, let's say, an abstraction of real worlds. So there are like two ways that i can think that you can do that go like high level and be like okay what i want to do is like i want to model the marketing domain what are the main concepts that we have there what are like the main processes that we have and try like to create a data model around that and then all the data all the instances that they come come from different applications go there and try to connect them on this, align them with this data model. The other ways that go from the application level, which is the other extreme. I have these applications, they have 10 different data models. Let's align these 10 different data models and see what happens at the end. But my feeling at least is that none of them is at the end. But my feeling, at least, is that none of them is, at the end,
Starting point is 00:27:08 like so successful. Like you need something else, something in between, probably. So how did you do that? Yeah, so I, by the way, I agree with what you're saying, those options. And the way we think about it is it's not necessarily an either, it's an either or. It should be much more of an and. So it's interesting where we come into organizations
Starting point is 00:27:29 and it's becoming more and more this way, I think as companies are becoming more mature about this and we've seen it specifically in the last 18 months, is we are now sitting in situations where it's not just the marketing organization sitting in the room talking about the data model and the data taxonomy. What's happening now is they're bringing in the enterprise. And I didn't know these people existed, but they're all over out there.
Starting point is 00:27:56 There's enterprise data taxonomists. It's the enterprise kind of architects and data architects that are coming in and working with the marketing organization to kind of do exactly what you're saying, which is some of this is going to come from the application side and how do we, how do we connect data across the different applications? And that's some of what we, we help them do is to really kind of string, create relationships in the data that don't natively exist in, in the applications themselves. And then secondly, it's coming from the top down saying, hey, there are other attributes and elements that we want to capture that are really specific to the enterprise that have nothing to do
Starting point is 00:28:32 with the applications that we want to be able to incorporate into that model. And so some of that becomes, you know, the way we think about it is, I'm thinking of just a simple example is, there's associations between like, for example, if you think of a car manufacturer, like a Toyota, for example, there's associations between make model and then other sorts of, you know,
Starting point is 00:28:53 trim packages and things like that. Just think about, think about from, from that perspective, like a data model for a car manufacturer, like there's also, and those are, those are all related and they are not and they're and they're exclusive in some situations like if i have this mod if i have this manufacturer like lexus versus toyota i only have certain models available so you can exclude things and so some of that becomes work that we do with them in helping and some of what we do internally we also work with some of the large si's out there that help work. They help them work through a lot of this coming up with the right data model and identifying what are the relationships that we're missing
Starting point is 00:29:32 that we want to try and enforce in this data model that don't exist through the applications themselves. And so it's a little bit of kind of meeting both ways if you want to think about that. Can you give us an example of such a relationship with, I don't know, like... Yeah, I mean, I think the example I gave really with creative. What you have in creative, for example, is if you think about an ad server, an ad service, you know, one of the elements of an ad campaign is the actual creative. And in a lot of cases,
Starting point is 00:30:03 the way that creative gets named inside of the ad serving solution, because in some cases it may be a dynamic ad that gets created, has nothing to do with the way that the asset or the creative is named or identified in the digital asset management solution or other applications you have. And so one way I think about it is, and then if you have assets that you're managing inside of your dam and it's got a bunch of metadata with their attributes about that piece of creative, how do you associate that creative with that campaign when the way that the IDs are not, there's no linkage between them. And so that's where we, and then how do you take and associate that campaign, that creative with all these other attributes?
Starting point is 00:30:46 That's kind of one simple example of how we help them create those relationships and stitch these things together that don't naturally exist within the applications. Because they weren't necessarily architected and they weren't meant to work that way. Yeah, yeah, of course. And if I understand correctly, this model that we are talking about is a combination of, let's say, what traditional alcoholists schema, which is more about like relationships and taxonomies. Yes. So, or is there also something else or it's just this too? Yeah. I mean, the other way to think about it is, you know, we work with a lot of, a lot of our customers work with agencies. Like I'm thinking about one, one of our customers right now that has, I think almost a hundred agencies around
Starting point is 00:31:28 the globe they work with on the media side. And each one of those agencies is executing and trafficking media. Well, they have trafficking sheets and in those trafficking sheets, there's naming conventions around how they're naming certain things. Well, agency A and agency B and agency C, if there's not, if there is not a way to enforce and take them out of the spreadsheets and enforce the way that they're naming that, you know, creating naming conventions and naming data fields similarly, then everything you can't actually extend your data model out into that. It's very difficult to extend the data model on the enterprise side out into the agency when you have that much complexity across the agency team.
Starting point is 00:32:12 So if you think about it, we're helping the brand extend the enforcement and the use of that data model, not just within their own teams, but even outside of the organization as they think about their business, because it extends, you know, execution happens within those agencies as well. So you may have sheets that are naming convention seats that are around trafficking or trafficking sheets. And if you've ever seen one of these things, these are spreadsheets that have, you know, they're on version 137 and they're they're you know hundreds of rows why columns wide and they've got multiple layers to them and they're trying to pull in the creative they're trying to call the data about the audience and it's they're fraught with errors and the and when
Starting point is 00:32:59 when and our point is like there's actually a lot of valuable data in that, in that, that should be part of the enterprise kind of data model and data and data ecosystem and data store. And so that's, that's one way to think about it is, is that we're helping those situations where information workers are working in spreadsheets and pulling them into an application. Yeah. So from what I understand, like, okay, there are, let's say, relationships and also like constraints probably. So you can, let's say, constrain the way that things can get associated with or like what values they can take and like all that stuff. How do you represent that? I mean, that's a bit of like more technical question, but like, what the technology looks like to represent on something that on a way that like the machine can understand it and force that, right?
Starting point is 00:33:52 Like what technologies are used? Like how do you build your product to do that? So the underlying technologies? Yeah. I mean, a lot of this comes down to, I don't want to get into a technology, like what, what programming like that stuff. But what I would say is this is that when we think about this problem, we think about it's very similar to what you're talking about,
Starting point is 00:34:15 which is there are known relationships between data and the data model between data and the data model. And there are relationships that need to exist or there are fields of data that need to be part of the data model that aren't kind of naturally and natively in there. So we can help them bring those in. We also have all sorts of logic in the application that says, hey, if this user in, again, it's coming back to knowing who the user is, what functions they're responsible for, what campaigns they have,
Starting point is 00:34:53 like for example, on a campaign side, what geography, what brands, there's all sorts of way to limit down the number of available fields and the number of available options that you, that you expose to a user based on what you know about the business, what you know about the application or the area of execution, and what you know about the user themselves. Okay. So let's say we get all the architects and the taxonomists and you all together and we
Starting point is 00:35:23 generate this unified, let's say, model for a big corporation. Who is consuming that after it is done inside the organization? Who is the main consumer? Yeah, so what's interesting is, if you'd asked me that question a year and a half ago, what I would say is, two years ago,
Starting point is 00:35:44 the primary consumer of this data are the analytics teams within the within the marketing organizations what's happened now though the last year and a half again some of this is the result of not necessarily what we're doing but it's i think it's a bigger trend that's happening within the enterprise is as more of the functional teams have access to data, you know, large kind of scale cloud-based data infrastructure, they're moving more of that work and payload into those, you know, whether Snowflake or other areas. So we're pushing data now downstream into some of our customers into their BI teams. They are using it to go into their, they're pushing it into their BI teams. They are using it to go into their, they're pushing it into their CDPs. They're pushing it in some cases into
Starting point is 00:36:29 whether they're using Snowflake or other applications, they're pushing it into their machine learning and AI infrastructure. Because what's happened, what they're realizing is, again, the size of data that meets the quality standards to point machines at to make decisions on, you know, better decisions on behalf of humans is really valuable if they're scale. If they're not scale to the data and scale the data is really a byproduct of in some ways of how much the quality of the data and the relationships that exist in the data so what as i talk about the creation of relationships and i and the improving the quality
Starting point is 00:37:12 of the data through standards it really is all those kind of different applications are areas where our customers are pushing the data we have data transformation capabilities in our application that allow them to, you know, to either we are directly integrated or to push it out to, you know, an AWS bucket in a certain format and then be able to capture it and pull it in. But more and more customers are wanting the native integrations so that changes happen in real time. And it's not just about us informing downstream, but the other way to think about it is, you know, I'm thinking of one of our customers who has a very quickly changing set of inventory. It's a, it's a large athletic shoe manufacturer.
Starting point is 00:37:56 And as they are constantly, constantly releasing products, it's how do you keep an up-to-date product catalog available to your marketing organization and other users that are creating campaigns, creating content, all those other things. How do you expose that to them in a way that it's up-to-date and it's limited to their geography or the channels that they're selling into versus, you know, because they have channel conflict and other things. So it's managing things like that as well. And being able to expose, if you call it upstream, product kind of like that and other data into
Starting point is 00:38:36 the marketing organization and other parts of the business that actually have logic associated with it and allow them to, again, limit the number of selections and the variance or the variety and the choices that they need to select from to actually get data into this model, if you will, like all that. Yeah, makes total sense. So if I understand correctly, like correct me if I'm not, I see like two main, I mean, at least two ways that like value is delivered like in the organization. One is that it's easier, let's say you make data easier to be interpreted by people because of the standardization that goes there with the data model. And the other thing is like data quality, right? Like when you have a reference schema and the taxonomies, which also like add a lot into the quality aspect, you can increase and monitor like quality a lot.
Starting point is 00:39:39 But when it comes to quality, there's always something can go wrong. Someone will mess with the data, let's say, right? So what happens then? Like how the tools that you have in place, like the product that you have, can help or not help? I don't know. I mean, with addressing the issues that are created. Yeah. I mean, so one way to think about it is we think about solving data quality.
Starting point is 00:40:08 Traditionally, you think about data quality has sort of been solved reactively in the data pipeline or with ETL downstream. There's lots of ways it's been solved downstream because when you get ready to actually utilize the data in runtime, you realize that we got, you know, we got problems. So it's trying, there's a lot of that gets fixed downstream. We see it differently. We look at it and say, listen, if you put in place data standards on the front end, there's a proactive way to solve a lot of the data quality issues. We're not going to solve, we're not, it's not about solving world hunger, but it's about solving a set of problems that are kind of type two problems where you think about context and things that really enable the organization to create a way to bridge between the creative side or the creatives, if you want to call that, and the quants
Starting point is 00:40:57 and actually create a more holistic way of thinking about data quality. Because typically data quality is that the problem gets kind of shoved downstream and it's data engineers, data, you know, data analysts and data scientists that are dealing with data quality. We think that there's a better way to solve it, which is if you put tools in place that enable the information workers on the front end to solve some of this, then it benefits downstream.
Starting point is 00:41:28 You're not going to solve everything. We also have situations where customers are revalidating data. So they're running data back through our application to, again, always banging it up against the standard to make sure to see where there's, where there were their problems. And that's kind of how we think about it is it's a very different way of thinking about solving data quality rather than fixing data quality. Makes sense. Makes sense. No, that's very interesting. And okay, we talked about like how we can make like the data easier to be,
Starting point is 00:41:56 let's say, managed by machines with the validation and like all that stuff. So what about interpretability? Like how this model that has been created at the end can be communicated to a huge organization right because from what i understand we are talking about like organizations that are really really big you might have like i don't know hundreds of stakeholders that they have to agree on these definitions that this schema has so how does this work how how how technology can help and how organization can help? Because I assume that probably like the solution is somewhere in between. It's not like just a technical problem. Yeah, it's interesting because I think what we've seen, and we're seeing it less
Starting point is 00:42:35 so now, I think there's more thought being put into this kind of proactively within the organizations. But really, organizations have to come to, in some ways, an agreement on what is that data model that we're going to use to operate. Not that it can't change, but we typically get involved with them in a situation where they sort of have that solved or they're close to having that solved.
Starting point is 00:43:02 And we may help them, because we have so much context from all of our other customers, we may come in and say, Hey, you may want to think about these other, these other fields of data that may be important to your business. There's, you know, there's, there's company X over there that has, that looks it's in a similar field or whatever. There are other attributes that you guys should be collecting that you're not, you're not identifying and standardizing.
Starting point is 00:43:23 And so some of that we come in and help them with, but a lot of times that's either being done internally with, like we talked about earlier, the architects and the business users and, you know, like for marketing or there's situations where we get brought in where they've already engaged with a Deloitte digital or an Accenture. And it's a, it's the, how do we deploy and activate this model that we built and make it into the organization? Because what we've seen is
Starting point is 00:43:58 organizations have data taxonomies. They sit in these taxonomy solutions, but they're not connected to anything. So it's kind of like, that's great, but it's sort of shelfware. And so it's about making it active and making it actually usable and functional by the information workers. And that's kind of what we connect that, we kind of connect the, if you want to talk about it, we connect the architects and the taxonomists with the business people. And that's, that's ultimately who needs to solve it. But there's, there's a lot of cases where it has to be facilitated by third parties that
Starting point is 00:44:32 have been through this many times before, but it's surprising how many, how many organizations themselves know this stuff better than anybody. It's just getting, making it a priority and getting the right people in the room to actually have those debates and have those discussions. Makes a lot of sense. One last question from me. I mean, you have implemented this process in many different companies.
Starting point is 00:44:57 How different the implementation is from company to company and how much you can end up and say, like, okay, there is at the end one distilled model for marketing that makes sense for pretty much everyone. Okay. There are like edge cases and some specializations on that, but this model, like if you take it, you pretty much understand, like let's say 80% of what marketing is like in an organization. Yeah.
Starting point is 00:45:26 The way I think about that is there are, I'll call it, elements of those models that are fairly standard. There are also big portions of it that are unique in the sense that each business, if you think about context, each business is organized and structured differently. And I saw this with one of our customers and we see more and more where our data is now ending up in the finance organization, because what they're realizing is, wait a second, we're trying to do profitability analysis and all sorts of analysis from a finance perspective, but we don't necessarily have a way of looking at the data as it relates to the way we think about the business
Starting point is 00:46:09 from a P&L perspective around business units, around contribution margins by business units, by product lines and things. And so when you think about product lines, business units, the structure of the organization, that's the part of it that becomes sort of unique. It's not like every single one of them is different, but the elements of that really have to be customized
Starting point is 00:46:32 and be aligned with the organization itself. And so there are pieces of it that I would say are standard and there are elements of it that are unique to that organization, which is really where you think about organizational context versus the channel or the application context. Okay. One more question before I give it to Eric. Last one.
Starting point is 00:46:59 I promise. So how much and how often do you see these models changing inside an organization? And is there like some indication, like some dimension of the organization that whenever it changes, let's say also these reflect back to the model? Like are there some kind of connections there? Yeah, I mean, again, these models change as they're adding. There's a difference between the model changing and the underlying data itself or the available attributes in those fields, if you want to call it that. For some customers, like I mentioned earlier, that stuff is constantly changing. And there's a bunch of logic around that. The models themselves do change, but that is a much more of an organizational discussion. And there's controls around that. The models themselves do change, but that is a much more of an organizational
Starting point is 00:47:46 discussion and there's controls around that. And there's even ways that particular functional areas or users or geographies can actually, you know, there's no reason they can't modify their model, but it's being able to say, hey, this is the standard model that we've agreed on as an organization. And there's these attributes that are added by this geography. And it's how we treat them differently when we do analysis, understanding that this was built in there for a specific purpose that has nothing to do with the way we think about looking at it holistically from an organizational perspective. And so there's some of that nuance that we can, and we help them manage that and understand and put a lot of controls around who, where, when these models can be changed, because you can't have people, you know,
Starting point is 00:48:35 you have to be very specific about who has that ability and who has the rights to do that. So some of us around governance and access, but yes, they do change, but it's not, it's a, for the most part, the core kind of capital D data model versus the lowercase d. There's a difference in how and when those change and how frequently. Yep, yep. Super interesting.
Starting point is 00:48:58 Eric, all yours. I'm loving this. I mean, just the concept of, you know, I think, Varel, what's so interesting to me is that this kind of data is so rarely talked about within an organization. Yet, it's the kind of data that makes existing data so much more valuable, which is super interesting. I may be wrong with this parallel, but one thing that really strikes me as I think about what you're doing is that in many ways, like you talk about, let's say the creator who has a ton of context, right? That feels very similar to taking unstructured data and applying structure to it. And when I think about that, one of the, which this is going to sound very buzzwordy, but I start to think about things like machine learning or graphs,
Starting point is 00:49:57 you know, networks where you can discover connections that previously were undiscoverable because everything was unstructured. Is that something you're exploring, something you already do? I'd just love to hear about that. Yeah. I mean, I think what you're hitting at is, and this goes back to what I was kind of mentioning earlier, this kind of connection between quads and creatives and things, is that again, where we're laser focused. And I think, you know, as you step back, every organization out there, whether they know it or not,
Starting point is 00:50:31 actually is dealing with this problem. And most of them recognize that they're not sure how, and I think to a large degree, we come in in situations where we'll be brought in by one group and they'll bring other people to the meeting, you know, other groups to the meeting and people sit down and within five minutes are like, oh my gosh, yes, this
Starting point is 00:50:46 is a huge, we didn't, we've been, we know this is a problem. We've not talked about it. And we've sort of pretended like it didn't exist because there's not a lot of people up from us that really understand it. So we just kind of like shove it under, keep shoving it under the carpet, but it's becoming a bigger and bigger problem as the scale gets bigger. Right. And the gaps are getting wider. But as you're talking about almost like a
Starting point is 00:51:08 neural network, like in some ways, I don't, I'm not sure that's our problem to solve. I really think that what we're trying to do is empower, like enable our customers, you know, their machine learning teams, their kind of data science teams to use the technology they're using, but augment it with this data or this information or this context. And that's really kind of how we see the world. And we don't care. We're very, you know, we look at this as we place Switzerland in this and then we want to make this available wherever it's needed to be available
Starting point is 00:51:43 or where it adds value. And we will integrate upstream wherever we know there are challenges in getting context into either campaign data or performance data or whatever. That's super interesting. And it totally makes sense. I'm just thinking about use cases where, let's say you have a, you know, a pretty wide set of product lines, you might discover something with the added context about the relationship between product lines and a particular subset of consumers who meet certain demographics that would be hard to discover, you know, just with your basic, you know, clickstream and purchase data, for example. That's super interesting.
Starting point is 00:52:31 So you're essentially providing a data set that a machine learning or data science team could use to draw that conclusion. Super interesting. Yeah. And again, that's that data. It's like you said earlier, it's kind of stamped on the behavioral or that other data. And so it gets stamped onto it in a sense. Yeah.
Starting point is 00:52:48 Okay. One last question, because we're getting close to the buzzer here. Could you just give us one example? So the marketing and sort of content asset use case makes total sense. You mentioned earlier in the show that there are some other contexts where it makes sense. And some of those popped to mind for me, but I just love to hear from your perspective, what are other areas of the organization where this kind of sort of, you know, unstructured context to structured stamping, if you will, makes a lot of sense. Yeah, it's interesting. We've really been pretty laser
Starting point is 00:53:19 focused on marketing, but what's, what's interesting is we had a, we had a situation recently where we were brought into, it was interesting, it was back to kind of thinking about the Adobe situation where you're talking about integrating multiple solutions that have been acquired. And it was a company that acquired a number of demand side platforms, DSP solutions, and it's solutions for buying, you know, programmatic buying and selling of media. The problem they ran, it's interesting, they were trying to use, I think it was Informatic or some other solution to try and map data together that allowed their finance teams to appropriately build clients across platforms.
Starting point is 00:54:00 Oh, interesting. You know, to expose inventory across the different platforms, having the one team sell it and having one buyer and the problem they're running into, it came up, I guess, and it was, it came, the reason they came to us is the problem got exposed by auditors and was something that we're going to have to disclose their massive public company. It was going to, they were gonna have to disclose in their financials. It was a couple hundred million dollar problem. Like ultimately they were like, there's a billion dollar opportunity here that we can't actually get access to because we got this underlying data problem. Well, think about
Starting point is 00:54:33 not just like campaign creation or content creation, but if you've got a sales team that is creating sales orders and around inventory that is being bought from different, different, through different DSPs or different platforms. How do you, how do you standardize that down so that you know, and are able to associate the stuff, you know, that, that inventory, that the, the fulfillment and that stuff together. And that's really what they brought us in for is to really try and help map that. And I think that is probably, to me, that kind of, it was one of those situations
Starting point is 00:55:08 where you're like, okay, I would never have thought of us as a solution for that. And so it kind of opens my mind. And again, we've been so focused on where we're going that there are other applications. Anywhere you have people interacting with applications, I think it's outside marketing. You've got people on supply chain.
Starting point is 00:55:28 You've got people on sales and other places. It is the same. There's a similar opportunity in all those situations. And we've chosen to start here. It's mainly because that's our DNA. And we see the problem as being something that our customers are seeing as a big challenge. And we, and I think we can, we, we just feel like that's a natural place for us to start.
Starting point is 00:55:50 And, and we get pulled into other, you know, and the content thing came up through another customer that pulled us in saying, Hey, we've got this problem. I think you guys can help us solve it. So that's kind of how it came through a customer. Yeah. Fascinating. It makes total sense when you think about going to your example with finance, reconciling transactions that have happened that relate to inventory across distinct siloed platforms is essentially a mass reconciliation problem. You know, super interesting. Well, Verl, this has been such a fun episode. I love talking about data and uses of data and context for data, you know,
Starting point is 00:56:33 sort of outside the standard stuff that we talk about. And it sounds like you're doing some really fascinating things at Clarivine, sort of bringing that to light. So thank you for your time. And thanks for sharing with us. Thank you. It's my pleasure to be here. Thanks a lot, Eric. Kostas, thank you.
Starting point is 00:56:48 I think my big takeaway from the episode is that I really started thinking about context more, Verl mentioned context, and he talked a lot about people who are doing certain types of work, right? They're producing work. And in that context, it was marketing assets, right? A piece of content or a campaign. But I just thought about all the touch points across an organization where people are producing work and the amount of context that's in their head is unbelievable. And in many ways, that sort of is what brings value to the work. And so that whole concept is just fascinating to me about how you sort of mine, like, how can you mine that context and actually turn it into actual physical data, you know, in a defined schema, I think I'll be thinking about that all week because just, you know,
Starting point is 00:57:47 from a philosophical standpoint, it's pretty, it's pretty interesting paradigm. Yeah. Yeah. One of the things that I'll keep from this conversation is that first of all, spreadsheets are still the king. They're never going away. Yeah. Like cockroaches, man man like you cannot get rid of that is that why they named cockroach db cockroach db i know i mean that's uh yeah we had an episode
Starting point is 00:58:14 about that but we didn't discuss the name because a little bit controversial but oh that's right that's right yeah i think i i mean for me like what was like super interesting is that there are there are roles inside organizations, like really big ones that we didn't even think about, like having people that they have to build and maintain data taxonomies, for example, which is pretty amazing. And together with data architects, you have the people who are creating at the end, let's say, data representation of the whole organization that needs to be communicated to everyone. What I'll keep from the conversation that we had
Starting point is 00:58:49 is that these problems at the end, and I think this is like not just, it has to do with data in general. At the end, success is like figuring out the right balance between how much technology can do and how much humans have to agree and how we can do both and do both well. So that's what I keep from the conversation.
Starting point is 00:59:11 And I'm really looking forward to see when similar products will also hit the market for smaller companies and medium-sized organizations yeah i mean certainly a multinational corporation the pain point is severe simply due to size and complexity and so the problem is exacerbated but it's been we've had similar problems at every company if i've ever been really small yeah Yeah, 100%. I mean, yeah, I agree. I don't think that this is a problem only for the organization,
Starting point is 00:59:50 like very large corporations, just that they cannot survive without solving these problems. That's the difference. That's why I'm saying that we are going to see at some point products that they try to address these problems also,
Starting point is 01:00:02 like startups or like for smaller companies or like for medium-sized organizations. I agree. All right. Well, thanks for joining us on the Data Stack Show. Fun topic for you today, and we'll catch you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, ericdodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
