The Data Stack Show - 77: Standardizing Unstructured Data with Verl Allen of Claravine
Episode Date: March 2, 2022

Highlights from this week's conversation include:
Verl's career journey (2:46)
M&A data evaluation criteria (7:12)
What Claravine does (10:48)
The breadth of data (15:03)
Adding to content and advertising data (18:22)
How Claravine standardizes data (23:53)
Designing a data model (25:40)
The underlying technologies of building a product (33:43)
The main consumer (35:02)
Maintaining quality (39:06)
Helping solidify definitions (41:37)
Implementing Claravine's model across various companies (44:54)
Internal changes' effect on the model (46:47)
Connection brought about by structure (49:19)
Applying unstructured context to structured stamping (52:36)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
Welcome to the Data Stack Show.
Each week, we explore the world of data
by talking to the people shaping its future.
You'll learn about new data technology and trends
and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack,
one platform for all your customer data pipelines.
Learn more at rudderstack.com.
And don't forget, we're hiring for all sorts of roles.
Exciting news. We are going to do another
Data Stack Show livestream. That's where we record the episode live, and you can join us and ask
your questions. Kostas and I have been talking a ton about reverse ETL and getting data out of
the warehouse. So we had Brooks go round up the brightest minds
building reverse ETL products in the industry. We have people from Census, High Touch, Workato,
and we're super excited to talk to them. You should go to datastackshow.com
slash live. That's datastackshow.com slash live and register. We'll send you a link.
You can join us and make sure to bring your questions. Welcome to the Data Stack Show. Today, we're going to talk with Verl Allen. Kostas, they are
doing some really interesting things for some really large companies, some of the largest
companies in the world, actually, which is fascinating. And just to give a little preview here, Verl is part of Claravine.
And what they do is basically take unstructured context
around internal data at a company
as it exists in the form of things like marketing assets
and essentially applies a schema to them
so that there's standardization across this massive,
you know, multinational organization, which is really interesting. I want to ask him about the concept of a schema, which we talked about as you and I were covering this show. If you think about a schema across a large
multinational organization for creative content, what does that even look like? I mean, what are
those data points, and what kind of data are they populating that schema with? How about you?
Yeah. I want to ask him how you can build one schema to rule them all. You know, it sounds very powerful. It does.
So yeah, I really want to see like how you can build something like this,
what are the approaches, the stakeholders involved,
and how different it looks from organization to organization.
Like at the end, how much we can standardize things.
So it's going to be a very interesting conversation, I think. It's one of these problems
that you don't hear about too often, but in the future, we will have to deal with it in
smaller organizations. So it's interesting to have this conversation with him and see
what tomorrow will look like.
Yeah. All right. Well, let's dig in with Verl. Verl, welcome to the Data Stack Show. Great to be here, Eric. Thanks for having me.
Okay. So much to talk about with data standardization, especially in the context
of marketing data, which is going to be a treat for me. But before we get there,
just give us a brief background on yourself and how you ended up doing what you're doing today
at Claravine.
Yeah. So I joined Claravine in 2018. Prior to Claravine, I spent about 12 years
first at a company called Omniture, which was kind of a leader in the web analytics space.
The company was acquired by Adobe and I think it was 2010 and then spent about another eight years
at Adobe. And in my role there, I was leading up strategy around what now is the experience cloud
and also corporate M&A. So corp dev M&A, if you think about what the experience cloud at Adobe
really is, is a compilation of about 11 or 12 acquisitions done over a 10, 12 year period
that ultimately has kind of resulted in what is now the experience cloud they have there. And so I spent a long period of time helping kind of build that business, if you want to think about it that way.
In 2018, I kind of ran into a friend who had started a small company, which is now Claravine.
And as he and I were talking, he was kind of at a point where he's like, I'm not sure what I would do with this.
We've got some great customers. We're kind of stalled out as far as growth.
We have product issues and we need to raise capital.
And so as we were talking, I kind of said, listen, I can introduce you to some people.
I'm happy.
I love what I'm doing.
And I've kind of got a four-year plan to retire.
And as we kept talking, and I saw what he was doing in relation to where I was seeing
challenges from my time at Adobe, where we
had spent all this time acquiring all these technologies and solutions and had done a
lot of work around integration at the workflow layer and in other ways and integrating those
solutions.
What hadn't happened, though, is there was no kind of standardized data model underneath
that.
Adobe's really good at this now with the Adobe data platform and other things, and there's the emergence
of CDPs, but there wasn't at that point any kind of focus around the data side of the integrations
and the data side of standardizing that. And so as I started looking at what they were doing
here at Claravine, again, it was like four
or five people at that time, it really struck me that there's a need in the marketplace,
especially as we think about this,
what I think of as the 2010s were the kind of the decade of SaaS applications
and explosion of that marketplace to where now you have, you know,
50 to 100 point solutions in any enterprise in the marketing organization.
The problem that they're running into is they were never really architected
to work well together and even at the data layer.
And as you've seen the emergence of the enterprise,
cloud-based data infrastructure that's exploded in the last
couple of years and is becoming more readily available,
even in the functional areas, it became clear to me, at least, that there's going to be
a need to standardize and create a common language, or taxonomy, or dictionary, whatever we want to call it,
across these applications, and it has to have context,
especially as you collapse that data into these single instances in the cloud.
And so as I was there thinking about the problems they're solving,
the problems I was seeing even in our own business inside of the applications that we had acquired,
it became clear to me that if we're struggling with this at Adobe for our customers,
the brands themselves have got a bigger problem because the number of applications they're trying to deal with is multiple times,
you know, what we were dealing with just from a solution perspective to take the experience cloud to market. Fascinating. Okay. I want to take a quick detour here. So
what a fascinating experience sort of being involved in building out a product suite from
the M&A side.
I mean, that's just fascinating, right?
That's so interesting to me on so many levels.
But I was going to ask you, and you mentioned it a little bit,
looking at when we think of, especially about like marketing tooling
or customer engagement tooling and the suite of infrastructure
that surrounds that from analytics to actually the tools that
are sending messages. Did you have an evaluation criteria on the data side? I know you said there
was struggle around that, but as you're thinking about building a product suite, I think that when
you think about customer experience, evaluating the ability to layer those products in from a
data standpoint was part of the rubric, even from an M&A standpoint. How did you think about that?
Yeah, I think, you know, it's interesting.
And I think this has evolved dramatically. I think the thought,
thinking about this is I think much more mature.
I think it's much more mature today than it was like five, six,
seven years ago.
And largely it's been driven by some of the changes in the data ecosystem and
just kind of the ways that companies are looking at
their business, not so much where it was. I think back then there was more of a
siloed approach. You think about applications in terms of how do we get efficiency and scale in a
channel? And I think the world's turned to something more holistic. I saw a data point the other day saying that before the pandemic about 30 percent of the interactions with brands were
digital; now it's like 55 or 60 percent, so it's a huge push forward. What's happened is that
when I was at Adobe, we were thinking about it as this application and this application, there's data
here that we need to get here.
And for specific pieces.
So it's more about how do we push specific pieces of data between the applications? It wasn't kind of stepping back and looking at it and saying, holistically,
what should that operating data model look like for the marketing organization holistically? I think the industry, even
back in the 2016, 2017 time frame, was just starting on that. Everyone was talking about
a single view of the customer and a unified profile and all this stuff, but the reality is that you
had on one side ad tech solutions, you had MarTech solutions, you had
CX solutions, and they were sort of in different groups. I think what you're seeing now is the
convergence of this stuff around the experience and around the customer more so, and it's driving
this really different way of thinking about the data necessary to operate the business,
not the data to run the application and do my job in a
channel, if that makes sense. Yeah. Super interesting. I've referred to that before
as kind of the daisy chain paradigm where it's like, okay, well, I have data here and then I
need to get it here and then I have it here, but then I also need to get it here. And so you end
up with kind of this daisy chain architecture that degrades over time, almost like a game of telephone because every,
every system has its own flavor of database and data definitions and all that
sort of stuff.
Yeah. And I saw it even within the cloud that Adobe had built,
forget about integrating other applications that are not owned
under the Adobe brand. Even that was complex.
Again, it was a daisy chain even within those applications.
And then when you put the lens on it from the brand's perspective,
it's much more complicated
than it looks like from Adobe's perspective
or from Salesforce's perspective,
or the kind of the big, you know,
the large enterprise software companies out there.
Yeah, for sure. Okay.
So thank you for humoring me with that little detour because it's fascinating.
Let's talk about, maybe could you use, a specific example of the type of brand and the user
who's like,
I am facing this problem every day in my job and it's really
painful. And Clarivine comes in and like, this is so much better. Like describe that for us.
I'd love for you to get specific if you can of like, I was doing things this way. There are
data problems because of X, Y, and Z. And this is the new way that we're doing it.
Yeah. It might be helpful for me to even kind of back up a little bit and explain.
When I came to Claravine, it really was that we were helping analytics teams who were publishing or creating reports
and doing analysis. What we see with a lot of our customers is they do the analysis and then they
come up with this, you know, you've got the report and at the bottom,
there's this "other" bucket that 25, 35, 40, 50, 60,
70% of the data drops into. And it varies in degree, depending on how
complicated or how integrated what they're trying to report on is. But what we really
were initially helping solve was taking data out of the "other" bucket and actually
specifying it and putting context around it so you can actually attribute it in some way.
And so just to put a sharper point on that,
so like, and I'm just thinking through
like our data engineers and analysts
who are listening, and my own experience,
it's like, okay, and reports that I've seen
or that I've like helped build data for whatever.
It's like, okay, we have, you know,
paid search campaigns as a bucket.
We have like event, you know, sort of like... I guess, is that what you're getting at?
That's one type of report, just like, you know, by channel or whatever.
I mean, we've all sat in those meetings. You know, I have a finance background, but I spent
in 99, I switched over to digital marketing because I thought this is more of an analytical
problem than it's a creative problem. It's actually an interesting problem to solve. And so
I sat in these meetings before with teams where everybody comes to the meeting, you're reading out channel by channel
by channel, and you start rolling it up and the numbers do not roll up. They do not roll down.
And so, you know, you aggregate the individual numbers from the channel, people sitting in the
rooms and seats, and it's X. You look at the aggregated reports and you're like, no, it's like 0.4X or
0.6X or 0.7X. Like, who's being successful? And it's still a problem today
because a lot of that reporting is, even though it's done in, you know, in a centralized,
through centralized applications, the foundation is broken. The premise that we have is that the
foundation of that data breaks if there aren't, again, what we think of as data standards in place.
So that's one of the things we're helping solve,
is really kind of taking and creating more specificity
and more detail and more context to that data
that improves that reporting.
That's one thing.
But it starts, that's just on the reporting side.
But if you take that out even further and say,
hey, well, the same data that in a lot of cases you're using for reporting,
you're using data to do other things like optimize spend across channels.
And I've had situations, I was talking to one of the largest consumer
electronic companies out there that, you know,
the brand is associated with a fruit and they were,
we sat with one of their larger teams and they're like, listen,
we got a problem. One out of every seven days,
we cannot optimize ad spend because we're having to rebuild all the models.
We're having to clean all the data up.
And so there's literally one day out of every seven where we're still spending,
we spend, you know, $50 to $100 million a day, and we're flying blind, and then everything's delayed.
And so with that organization,
what we really helped them do was to reduce the time to insight dramatically
by reducing the amount of data that had to be cleaned up in the operations
side of things. So think about where marketing ops,
data ops, and ad ops, they spend a lot of time between execution and kind of optimization,
cleaning data. And that's what we're trying to help them eliminate by adding context and
creating standards in the way that data is captured. Okay. So when we say standards and context, I want to dig into those terms, but could you just give us a sense of what the breadth of data is? Because certainly that's a huge contributor
to the problem is that
you have a huge variety
of different types of data
coming from a bunch of different places.
Absolutely.
And to be clear,
we are not a data collection,
like an analytics or data collection application.
The way I describe it is,
and there's a company that I think
in the 80s and 90s called,
it was BASF.
And they're like,
we don't make the products you use every day.
We make them better.
And so our opportunity is to improve the ROI and the value that our customers are getting
from other applications, whether it's analytics, whether it's CDPs, whether
it's ad serving or whatever else they're doing, wherever they're spending dollars.
And so, when we think about the types of data,
it is clickstream data,
but it's not saying we're collecting
the clickstream data and replacing it.
It's appending onto that clickstream data
context about a campaign or an experience.
It's appending to content,
standardized data about the content itself
in relation to all the other content in the
organization and across different dams, across different CMSs. And so it's really trying to take
this complexity that exists in marketing organizations. Today it's marketing. I think
there's other applications outside of marketing that we're talking to companies about, but
it's really trying to take some of that complexity and create a layer underneath of it or alongside it that has standards around it,
that is attached to or can be attached to that data to enrich it, extend it, and also create
meaning between some of the data that right now doesn't have necessarily, you know, really,
really great ways to associate it. This may not be the right way to think about it. So tell me
if I'm off here, but it almost sounds, I mean, this sounds amazing. It almost sounds like you're taking a schema designed for full visibility and stamping it on the data across every
data repository? I actually think that's a great way to describe it. I think that's a simple way
to describe it because it is almost like an imprint against that. You know, it's not that we're collecting the
behavioral, the streaming data. It's a set of data that gets appended to that
or stamped onto that as well.
Yeah. Because if you think about stamping a schema
on a certain set of data, there's a delta if it doesn't contain all pieces of
the schema. Maybe I'm extending that metaphor a little too far. Kostas, let me know.
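To make the stamping metaphor concrete, here is a minimal sketch in Python of appending a standardized context record to a raw clickstream event and checking it against an agreed standard. The field names, allowed values, and helper function are hypothetical illustrations, not Claravine's actual schema or API.

```python
# Hypothetical standard: each campaign touchpoint must carry these context fields,
# and each field only accepts values from an agreed controlled vocabulary.
CONTEXT_STANDARD = {
    "business_unit": {"footwear", "apparel", "accessories"},
    "funnel_stage": {"awareness", "consideration", "conversion"},
    "geography": {"na", "emea", "apac"},
    "channel": {"paid_search", "paid_social", "display", "email"},
}

def stamp_event(raw_event: dict, context: dict) -> dict:
    """Append standardized context to a raw clickstream event.

    Returns the enriched event plus a list of violations where the supplied
    context is missing a required field or uses a non-standard value.
    """
    violations = []
    for field, allowed in CONTEXT_STANDARD.items():
        value = context.get(field)
        if value is None:
            violations.append(f"missing required field: {field}")
        elif value not in allowed:
            violations.append(f"non-standard value for {field}: {value!r}")
    # The raw behavioral data is untouched; the standard context is stamped alongside it.
    return {**raw_event, "context": context, "context_violations": violations}

# Example usage with a made-up event from an analytics tool.
event = {"event_id": "abc123", "type": "page_view", "url": "https://example.com/landing"}
stamped = stamp_event(event, {"business_unit": "footwear", "funnel_stage": "aware",
                              "geography": "na", "channel": "paid_search"})
print(stamped["context_violations"])  # ["non-standard value for funnel_stage: 'aware'"]
```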
But okay, I have two more questions. I know Kostas has some as well. The first one is, could you give some examples of the context that you add on to a specific type of data or a couple types of data?
Just like you mentioned, advertising performance or even content,
which is interesting, like in the context of digital asset management or other things like
that is pretty fascinating. What does that look like? You have a piece of content, what are you
adding to it? You have advertising data, what are you adding to it? Yeah. So if you think about
content, there's a lot of situations where you have people creating content, which is more the content
creation, I'll call it the creative side of things. Then you've got the content side of things,
where all of a sudden it gets, you know, loaded up into a CMS. What you have happening is you have,
typically, people creating it that have a creative brief and all this context around it:
this piece of content was created for this purpose, for this business unit, for this stage of funnel,
for this geography, for this demographic.
It's all that information and insight
that sits in the creator's head.
The problem is that once that stuff goes in the DAM,
there's not a great way to keep that context.
You've got creators all over the world.
Think of a large multinational organization: you have creators and agencies, you have creators internally, you have
people all over the globe speaking different languages. How do you create a
standardized language across all those teams, people, geographies, and business units?
That's really what we help them provide. So instead of trying to have the people that are loading the content into the CMS add that context, that piece of creative, that idea, is actually associated with a bunch of context in our application, which allows them to really kind of create a different way of solving this problem. That's really interesting because
anyone who has worked with data knows that, like, I mean, I don't want to be too incriminating here,
but relying on human input, you know, for critical data is never a good idea because people are always going to get it wrong, you know,
fat finger it like it's, you know, it's just, it's the least reliable way to capture data in many
ways. Yes. And it's interesting though, when you, so if you think about it, though, those individuals have all the context and a lot of the context around what is actually happening.
And so in some ways, it's funny.
I was talking to one of our customers about this, and she said, you create the illusion of choice inside of our application. For the end user in that situation, the application really forces them down a path that limits the errors that
can get created. So you're almost forcing a set of decisions onto a much smaller set of options,
and depending on who they are, what channel they manage, there's all sorts of controls
that you can build and logic you can build around what level of choice you create.
And through integrations, there's a lot of context you can get about
where other data is coming from that helps inform what options we should
or shouldn't give them. Yeah, that's interesting. It's kind of taking a consumer app optimization mindset where you define very clear pathways and success for the user and
applying that to internal creators inside of a business, which is super interesting.
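As a rough sketch of that "illusion of choice" pattern, the snippet below narrows the options a user can even see based on what is known about them. The roles, channels, and option lists are invented for the example; they are not how Claravine actually models users.

```python
# Hypothetical master lists of standardized values.
ALL_CHANNELS = ["paid_search", "paid_social", "display", "email"]
ALL_GEOS = ["na", "emea", "apac"]

# Hypothetical user profile captured from integrations (SSO, workflow tools, etc.).
user = {"name": "A. Marketer", "managed_channels": ["paid_search"], "geography": "emea"}

def available_options(user: dict) -> dict:
    """Return only the standardized values this user is allowed to pick from."""
    return {
        "channel": [c for c in ALL_CHANNELS if c in user["managed_channels"]],
        "geography": [g for g in ALL_GEOS if g == user["geography"]],
    }

print(available_options(user))
# {'channel': ['paid_search'], 'geography': ['emea']}
# The user never sees options that would produce non-standard or out-of-scope values.
```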
Could you give one example maybe on the paid side, just so we have another example of the context that you layer on to a particular type of data, like advertising data,
is that performance data or? Yeah. So we work really closely with a lot of our customers,
both internally and with their agencies around that performance data. And so what we're helping
them add into that is in some cases, there may be data fields that are collected in one application that are not available in other applications or the way they name fields in one application are not consistent with other applications.
So it's trying to help solve naming kind of differences in the way that fields are named and we can do some mapping for them.
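As a rough illustration of that kind of field-name mapping, here is a sketch that normalizes differently named fields from two tools into one standard name. The tool names and mappings are made up for the example.

```python
# Hypothetical mapping from each application's field names to the enterprise standard.
FIELD_MAP = {
    "ad_platform_a": {"cmpgn": "campaign_name", "aud_seg": "audience_segment"},
    "ad_platform_b": {"campaignTitle": "campaign_name", "segment": "audience_segment"},
}

def normalize(record: dict, source: str) -> dict:
    """Rename a record's fields to the standard names; keep unmapped fields as-is."""
    mapping = FIELD_MAP.get(source, {})
    return {mapping.get(key, key): value for key, value in record.items()}

print(normalize({"cmpgn": "spring_sale", "aud_seg": "loyal"}, "ad_platform_a"))
print(normalize({"campaignTitle": "spring_sale", "segment": "loyal"}, "ad_platform_b"))
# Both produce: {'campaign_name': 'spring_sale', 'audience_segment': 'loyal'}
```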
The other way to think about it is, again, similar to a creative brief, think about a campaign brief. There's all sorts of
context and insight in that campaign brief, which are typically managed inside of spreadsheets
that we help to onboard into kind of the enterprise data, how to say this, the data model,
if you want to call it that. Sure. It's things
around what stage of funnel the campaign is on, what the segment was, what the
creative was. And it's mapping that creative back to a standard; it's creating
almost a way to map across elements of an experience,
standardized data across those elements of an experience that
are not necessarily just specific to the campaign itself. Super interesting. Okay. I have one more
question and then I'm going to hand the mic to Kostas because I've been monopolizing this
conversation. So how do you do that is my next question, because that sounds very complex,
especially when you're thinking about,
you know, organizations. And I mean, it sounds like, you know, sort of Fortune 500, Fortune
100 level companies that are just massive, complex organizations that, you know, are producing
content across who knows how many vectors and business units and product lines and all that sort of stuff. So how do you do it?
Yeah, it's interesting.
So where we thrive, where I think the opportunity exists for this, is in those organizations
where there's more complexity, and you're hitting on it.
So we have a customer, I think one of our customers, they have about 700 users across
the globe, both internal and agency users, specifically around
standardizing taxonomy around content. And in that situation, we are baked into the workflow.
When you're going through that creative process and submitting creative briefs and things,
we are integrated into the workflow and capture data, in that example, out of Workfront.
And so it's through integrations that we get access to data.
And it's through integrations as users are adding data into fields either in other applications, we derive that information into our application, into our solution.
What we have at that point is the ability to compare what was input to what the
available standards are and identify where there are differences between what is maybe input
manually in another application and what the organization identifies as the standard around
that field or that attribute in the data. And we're able to identify where there's breakage in
that, and either, through our
solution, automate the correction of that, or allow the individuals and organizations to surface those
areas where there are problems, fix them in our application, and then identify ways to enforce
it upstream.
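Here is a small sketch of that compare-and-correct step: checking a manually entered value against the organization's standard, auto-fixing known variants, and surfacing the rest for review. The standard values and variant map are invented, not Claravine's.

```python
# Hypothetical standard values and known variants seen in upstream tools.
STANDARD_REGIONS = {"north_america", "emea", "apac"}
KNOWN_VARIANTS = {"NA": "north_america", "N. America": "north_america", "EMEA ": "emea"}

def validate_field(value: str):
    """Classify an input as compliant, auto-correctable, or needing manual review."""
    if value in STANDARD_REGIONS:
        return ("ok", value)
    if value in KNOWN_VARIANTS:
        return ("auto_corrected", KNOWN_VARIANTS[value])
    return ("needs_review", value)

for raw in ["emea", "NA", "Northern Amer."]:
    print(raw, "->", validate_field(raw))
# emea -> ('ok', 'emea')
# NA -> ('auto_corrected', 'north_america')
# Northern Amer. -> ('needs_review', 'Northern Amer.')
```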
This is very interesting. I mean, I don't know if I'm going to be a little bit too technical, but I cannot stop thinking all this time about how you design a data model like this.
Where do you start?
In my mind, when we are doing data modeling, because it's like modeling in general, right?
What we are trying to do is create, let's say, an abstraction of the real world.
So there are
two ways that I can think of to do that. One is to go high level and say, okay, what I want to do is model the marketing domain: what are the main concepts that we have there, what
are the main processes that we have, and try to create a data model around that. And then
all the data, all the instances that come from different applications, go there, and you try to connect them to this, align them with this data model. The other way is to go
from the application level, which is the other extreme: I have these applications, they have
10 different data models, let's align these 10 different data models and see what happens at the
end. But my feeling, at least, is that none of them is, at the end,
that successful.
Like you need something else, something in between, probably.
So how did you do that?
Yeah, so I, by the way, I agree with what you're saying, those options.
And the way we think about it is it's not necessarily an either/or.
It should be much more of an and.
So it's interesting where we come into organizations
and it's becoming more and more this way,
I think as companies are becoming more mature about this
and we've seen it specifically in the last 18 months,
is we are now sitting in situations
where it's not just the marketing organization
sitting in the room talking about the data model and the data taxonomy.
What's happening now is they're bringing in the enterprise.
And I didn't know these people existed, but they're all over out there.
There's enterprise data taxonomists.
It's the enterprise kind of architects and data architects that are coming in and working with the marketing organization to kind of do exactly what you're saying, which is some of this is going to come from the
application side and how do we, how do we connect data across the different applications? And that's
some of what we, we help them do is to really kind of string, create relationships in the data
that don't natively exist in, in the applications themselves. And then secondly, it's coming from the top down saying,
hey, there are other attributes and elements
that we want to capture that are really specific
to the enterprise that have nothing to do
with the applications that we want to be able
to incorporate into that model.
And so some of that becomes, you know,
the way we think about it is,
I'm thinking of just a simple example:
if you think of a car manufacturer, like a Toyota, for example,
there's associations between make, model, and then other sorts of, you know,
trim packages and things like that. Think about,
from that perspective, a data model for a car manufacturer.
Those are all related, and they're
exclusive in some situations. If I have this manufacturer, like Lexus versus
Toyota, I only have certain models available, so you can exclude things. And so some of that becomes
work that we do with them in helping, and some of what we do internally. We also work with some of
the large SIs out there that help them work through a lot of this, coming up with the right data model
and identifying what are the relationships that we're missing
that we want to try and enforce in this data model
that don't exist through the applications themselves.
And so it's a little bit of kind of meeting both ways
if you want to think about that.
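As a toy illustration of those kinds of relationship constraints, here is a sketch where the allowed values for one field depend on the value of another (make restricts model, model restricts trim). The brands and values are placeholders only.

```python
# Hypothetical dependent taxonomy: each make allows only certain models,
# and each model allows only certain trims.
TAXONOMY = {
    "toyota": {"corolla": ["le", "se"], "camry": ["le", "xse"]},
    "lexus": {"es": ["base", "f_sport"], "rx": ["base", "luxury"]},
}

def is_valid(make: str, model: str, trim: str) -> bool:
    """Check that a (make, model, trim) combination respects the dependent taxonomy."""
    models = TAXONOMY.get(make, {})
    return trim in models.get(model, [])

print(is_valid("toyota", "camry", "xse"))   # True
print(is_valid("lexus", "corolla", "le"))   # False: Corolla isn't a Lexus model
```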
Can you give us an example of such a relationship
with, I don't know, like... Yeah, I mean, I think the example I gave really with creative.
What you have in creative, for example, is if you think about an ad server, an ad service,
you know, one of the elements of an ad campaign is the actual creative. And in a lot of cases,
the way that creative gets named inside of the ad serving
solution, because in some cases it may be a dynamic ad that gets created, has nothing to do
with the way that the asset or the creative is named or identified in the digital asset management
solution or other applications you have. And so one way I think about it is, and then if you have
assets that you're managing inside of your dam and it's got a bunch of metadata with their attributes about
that piece of creative, how do you associate that creative with that campaign when the way that
the IDs are not, there's no linkage between them. And so that's where we, and then how do you take
and associate that campaign, that creative with all these other attributes?
That's kind of one simple example of how we help them create those relationships and stitch these things together that don't naturally exist within the applications.
Because they weren't necessarily architected and they weren't meant to work that way.
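One way to picture the linkage described here is a simple crosswalk that maps an ad server's creative IDs to DAM asset IDs so the two datasets can be joined. The IDs, names, and fields below are invented for the example.

```python
# Hypothetical crosswalk maintained as standardized data, linking two systems
# that have no native relationship.
CROSSWALK = {
    "dyn_cr_98431": "DAM-ASSET-0042",   # ad server creative id -> DAM asset id
    "dyn_cr_98432": "DAM-ASSET-0051",
}

ad_server_rows = [{"creative_id": "dyn_cr_98431", "impressions": 120000}]
dam_assets = {"DAM-ASSET-0042": {"title": "Spring Sale Hero", "funnel_stage": "awareness"}}

# Join campaign performance to asset metadata through the crosswalk.
for row in ad_server_rows:
    asset_id = CROSSWALK.get(row["creative_id"])
    asset = dam_assets.get(asset_id, {})
    print({**row, "asset_id": asset_id, **asset})
# {'creative_id': 'dyn_cr_98431', 'impressions': 120000, 'asset_id': 'DAM-ASSET-0042',
#  'title': 'Spring Sale Hero', 'funnel_stage': 'awareness'}
```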
Yeah, yeah, of course. And if I understand correctly, this model that we are
talking about is a combination of, let's say, what traditionally a schema is, which is more
about relationships and taxonomies. Yes. So, is there also something else, or is it just these
two? Yeah. I mean, the other way to think about it is, you know, a lot of
our customers work with agencies. Like I'm thinking
about one, one of our customers right now that has, I think almost a hundred agencies around
the globe they work with on the media side. And each one of those agencies is executing and
trafficking media. Well, they have trafficking sheets and in those trafficking sheets, there's
naming conventions around how they're naming certain things. Well, with agency A and agency B and agency C,
if there is not a way to take them out of the spreadsheets and
enforce the way that they're naming things, you know,
creating naming conventions and naming data fields similarly,
then you can't actually extend your data model out into that.
It's very difficult to extend the data model on the enterprise side out into the agency when you have that much complexity across the agency team.
So if you think about it, we're helping the brand extend the enforcement and the use of that data model,
not just within their own teams, but even outside of
the organization as they think about their business, because it extends, you know, execution
happens within those agencies as well. So you may have naming convention sheets
that are around trafficking, or trafficking sheets. And if you've ever seen one of these things,
these are spreadsheets that are on version 137 and are, you know, hundreds of rows and columns
wide, and they've got multiple layers to them, and they're trying to pull in the creative, they're
trying to pull in the data about the audience, and they're fraught with errors. And
our point is, there's actually a lot of valuable data in there that should be part of the enterprise data model and data ecosystem and data store.
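As an illustration of how a naming convention like that can become enforceable data rather than free text in a spreadsheet, here is a sketch that parses a delimited campaign name into fields and checks each part against a standard. The convention and values are hypothetical.

```python
# Hypothetical convention: geo_channel_campaign, each part from a controlled list.
NAMING_STANDARD = {
    "geo": {"us", "uk", "de"},
    "channel": {"search", "social", "display"},
}

def parse_campaign_name(name: str) -> dict:
    """Split a delimited campaign name into fields and flag non-standard parts."""
    parts = name.lower().split("_")
    if len(parts) != 3:
        return {"valid": False, "reason": f"expected 3 parts, got {len(parts)}"}
    geo, channel, campaign = parts
    errors = []
    if geo not in NAMING_STANDARD["geo"]:
        errors.append(f"unknown geo: {geo}")
    if channel not in NAMING_STANDARD["channel"]:
        errors.append(f"unknown channel: {channel}")
    return {"valid": not errors, "geo": geo, "channel": channel,
            "campaign": campaign, "errors": errors}

print(parse_campaign_name("US_Search_spring-sale"))    # valid
print(parse_campaign_name("usa_searchx_spring-sale"))  # flags unknown geo and channel
```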
And so that's, that's one way to think about it is, is that we're helping those situations where information workers are working in spreadsheets and pulling them into an application. Yeah. So from what I understand, like, okay, there are, let's say, relationships and also like
constraints probably.
So you can, let's say, constrain the way that things can get associated with or like what
values they can take and like all that stuff.
How do you represent that?
I mean, that's a bit of a more technical question, but what does the technology look like to represent that in a way that
the machine can understand it and enforce it, right?
Like what technologies are used?
Like how do you build your product to do that?
So the underlying technologies?
Yeah.
I mean, a lot of this comes down to,
I don't want to get into the technology,
like what programming languages and that kind of stuff. But what I would say is this:
when we think about this problem, we think about it's very similar to what you're talking about,
which is there are known relationships between data in the data model. And there are relationships that need to exist
or there are fields of data that need to be part of the data model
that aren't kind of naturally and natively in there.
So we can help them bring those in.
We also have all sorts of logic in the application that says,
hey, if this user in, again,
it's coming back to knowing who the user is,
what functions they're responsible for, what campaigns they have,
like for example, on a campaign side, what geography, what brands,
there's all sorts of ways to limit down the number of available fields and the
number of available options that you expose to a user based on
what you know about the business, what you know about the application or the area of
execution, and what you know about the user themselves.
Okay.
So let's say we get all the architects and the taxonomists and you all together and we
generate this unified, let's say,
model for a big corporation.
Who is consuming that after it is done
inside the organization?
Who is the main consumer?
Yeah, so what's interesting is,
if you'd asked me that question a year and a half or two years ago,
what I would say is that
the primary consumers of this data are the analytics teams within the marketing organizations.
What's happened now, though, over the last year and a half, and again, some of this is the result of not
necessarily what we're doing, but I think it's a bigger trend that's happening within the
enterprise, is that as more of the functional teams have access to data, you know, large-scale cloud-based data infrastructure, they're moving more of that work and payload into those, you know, whether Snowflake or other areas.
So we're pushing data now downstream into some of our customers into their BI teams.
They're pushing it into their CDPs.
They're pushing it in some cases into
whether they're using Snowflake or other applications,
they're pushing it into their machine learning
and AI infrastructure.
Because what's happened, what they're realizing is,
again, data at a size that meets the quality standards
to point machines at, to make decisions on, you know, better decisions on behalf of humans, is really valuable if there's scale.
And the scale of the data is really a byproduct, in some ways, of the quality of the data and the relationships that exist in
the data. So as I talk about the creation of relationships and the improving of the quality
of the data through standards, it really is all those kind of different applications that are areas
where our customers are pushing the data. We have data transformation capabilities in our application that allow them
to, you know, either be directly integrated or push it out to, you know, an AWS bucket in
a certain format and then be able to capture it and pull it in. But more and more customers are
wanting the native integrations so that changes happen in real time. And it's not just about us
informing downstream, but the other way to think about it is, you know,
I'm thinking of one of our customers who has a very quickly changing set of
inventory. It's a, it's a large athletic shoe manufacturer.
And as they are constantly, constantly releasing products,
it's how do you keep an up-to-date product catalog available to your
marketing organization and other users that are creating campaigns, creating content,
all those other things. How do you expose that to them in a way that it's up-to-date and it's
limited to their geography or the channels that they're selling into versus, you know,
because they have channel conflict and other things.
So it's managing things like that as well.
And being able to expose, if you call it upstream, product catalog data like that and other data into
the marketing organization and other parts of the business that actually have logic associated with it and allow them to, again,
limit the number of selections and the variance or the variety and the choices that they need
to select from to actually get data into this model, if you will, like all that.
Yeah, makes total sense. So if I understand correctly, and correct me if I'm wrong, I see at least two main ways that value is delivered in the organization.
One is that it's easier, let's say you make data easier to be interpreted by people because of the standardization that goes there with the data model. And the other thing is like data quality, right?
Like when you have a reference schema and the taxonomies,
which also like add a lot into the quality aspect,
you can increase and monitor like quality a lot.
But when it comes to quality, there's always something that can go wrong.
Someone will mess with the data, let's say, right?
So what happens then?
Like how the tools that you have in place, like the product that you have, can help or not help?
I don't know.
I mean, with addressing the issues that are created.
Yeah.
I mean, so one way to think about it is we think about solving data quality.
Traditionally, you think about data quality has sort of been solved reactively in the data pipeline or with ETL downstream.
There's lots of ways it's been solved downstream, because when you get ready to actually utilize the data at runtime, you realize that you've got problems. So
a lot of that gets fixed downstream. We see it differently. We look at it and say,
listen, if you put in place data standards on the front end, there's a proactive way to solve a lot
of the data quality issues. It's not about solving world
hunger, but it's about solving a set of problems that are kind of type two problems
where you think about context and things that really enable the organization to create a way
to bridge between the creative side or the creatives, if you want to call that, and the quants
and actually create a more holistic way of thinking about data quality. Because typically
data quality is a problem that gets kind of shoved
downstream, and it's data engineers, you know,
data analysts and data scientists that are dealing with data quality.
We think that there's a better way to solve it,
which is if you put tools in place that enable the information workers on the
front end to solve some of this,
then it benefits downstream.
You're not going to solve everything.
We also have situations where customers are revalidating data.
So they're running data back through our application to, again,
always banging it up against the standard to see
where there are problems.
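As a sketch of what that kind of revalidation sweep could look like, here is a small example that re-checks stored records against the current standard and reports a compliance rate. The records and standard are invented.

```python
# Hypothetical stored records and the current standard for one field.
STANDARD_CHANNELS = {"paid_search", "paid_social", "display", "email"}
records = [
    {"id": 1, "channel": "paid_search"},
    {"id": 2, "channel": "PaidSocial"},   # drifted from the standard
    {"id": 3, "channel": "display"},
]

def revalidate(records: list[dict]) -> dict:
    """Re-check records against the standard and summarize compliance."""
    failures = [r["id"] for r in records if r["channel"] not in STANDARD_CHANNELS]
    rate = 1 - len(failures) / len(records)
    return {"compliance_rate": rate, "failing_ids": failures}

print(revalidate(records))
# {'compliance_rate': 0.666..., 'failing_ids': [2]}
```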
And that's kind of how we think about it is it's a very different way of
thinking about solving data quality rather than fixing data quality. Makes sense. Makes sense. No,
that's very interesting. And okay, we talked about like how we can make like the data easier to be,
let's say, managed by machines with the validation and like all that stuff. So what about
interpretability? How can this model that has been created at the end
be communicated to a huge organization? Because from what I understand, we are talking
about organizations that are really, really big. You might have, I don't know, hundreds of
stakeholders that have to agree on the definitions that this schema has. So how does
this work? How can technology help, and how can the organization help? Because I
assume that probably like the solution is somewhere in between. It's not like just a
technical problem. Yeah, it's interesting because I think what we've seen, and we're seeing it less
so now, I think there's more thought being put into this kind of proactively within the
organizations. But really, organizations have to come to,
in some ways, an agreement on what is that data model
that we're going to use to operate.
Not that it can't change,
but we typically get involved with them
in a situation where they sort of have that solved
or they're close to having that solved.
And we may help them,
because we have so much context from all of our other customers,
we may come in and say, Hey, you may want to think about these other,
these other fields of data that may be important to your business.
There's, you know, company X over there that
looks like it's in a similar field or whatever.
There are other attributes that you guys should be collecting that you're not,
you're not identifying and standardizing.
And so some of that we come in and help them with,
but a lot of times that's either being done internally with,
like we talked about earlier,
the architects and the business users and, you know,
like for marketing or there's situations where we get brought in where
they've already engaged with a Deloitte digital or an Accenture.
And it's, how do we deploy and activate
this model that we built and embed it into the organization? Because what we've seen is
organizations have data taxonomies. They sit in these taxonomy solutions, but they're not
connected to anything. So it's kind of like, that's great, but it's sort of shelfware. And so it's about making it active
and making it actually usable and functional by the information workers. And that's kind of what
we connect that, we kind of connect the, if you want to talk about it, we connect the architects
and the taxonomists with the business people.
And that's, that's ultimately who needs to solve it.
But there's,
there's a lot of cases where it has to be facilitated by third parties that
have been through this many times before, but it's surprising how many,
how many organizations themselves know this stuff better than anybody.
It's just getting,
making it a priority and getting the right people in the room to actually
have those debates and have those discussions.
Makes a lot of sense.
One last question from me.
I mean, you have implemented this process in many different companies.
How different is the implementation from company to company,
and how much can you end up saying, like, okay,
there is at the end one distilled model for marketing that makes sense for
pretty much everyone? Okay.
There are like edge cases and some specializations on that,
but this model, like if you take it, you pretty much understand,
like let's say 80% of what marketing is like in an organization.
Yeah.
The way I think about that is there are, I'll call it,
elements of those models that are fairly standard.
There are also big portions of it that are unique in the sense
that each business, if you think about context,
each business is organized and structured differently.
And I saw this with one of our customers and we see more and more where our data is now ending up in the finance organization,
because what they're realizing is, wait a second, we're trying to do profitability analysis and all sorts of analysis from a finance perspective, but we don't necessarily have a way of looking at the data
as it relates to the way we think about the business
from a P&L perspective around business units,
around contribution margins by business units,
by product lines and things.
And so when you think about product lines, business units,
the structure of the organization,
that's the part of it that becomes sort of unique.
It's not like every single one of them is different,
but the elements of that really have to be customized
and be aligned with the organization itself.
And so there are pieces of it that I would say are standard
and there are elements of it that are unique to that organization,
which is really where you think about organizational
context versus the channel or the application context.
Okay.
One more question before I give it to Eric.
Last one.
I promise.
So how much and how often do you see these models changing inside an organization?
And is there like some indication, like some dimension of the organization that whenever it changes, let's say also these reflect back to the model?
Like are there some kind of connections there?
Yeah, I mean, again, these models change as they're adding. There's a difference between the model changing and the underlying data itself or the available attributes in those fields, if you want to call it that.
For some customers, like I mentioned earlier, that stuff is constantly changing.
And there's a bunch of logic around that.
The models themselves do change, but that is much more of an organizational
discussion, and there's controls around that. And there's even ways that particular functional areas
or users or geographies can actually, you know, there's no reason they can't modify their model,
but it's being able to say, hey, this is the standard model that we've agreed on as an
organization. And there's these attributes that are added by this geography. And it's how we
treat them differently when we do analysis, understanding that this was built in there for
a specific purpose that has nothing to do with the way we think about looking at it holistically
from an organizational perspective. And so there's some of that nuance that we can, and we help them manage that and understand and put a lot of controls around
who, where, when these models can be changed, because you can't have people, you know,
you have to be very specific about who has that ability and who has the rights to do that. So
some of it is around governance and access. But yes, they do change, though for the most part,
the core kind of capital D data model
versus the lowercase d.
There's a difference in how and when those change
and how frequently.
Yep, yep.
Super interesting.
Eric, all yours.
I'm loving this.
I mean, just the concept of, you know,
I think, Varel, what's so interesting to me is that this kind of data is so rarely talked about within an organization. Yet, it's the kind of data that makes existing data so much more valuable, which is super interesting. I may be wrong with this parallel, but one thing that really strikes me as I think
about what you're doing is that in many ways, like you talk about, let's say the creator
who has a ton of context, right? That feels very similar to taking unstructured data
and applying structure to it. And when I think about that, one of the, which this is going
to sound very buzzwordy, but I start to think about things like machine learning or graphs,
you know, networks where you can discover connections that previously were undiscoverable because everything was unstructured.
Is that something you're exploring, something you already do? I'd just love to hear about that.
Yeah. I mean, I think what you're hitting at is, and this goes back to what I was kind of
mentioning earlier, this kind of connection between quants and creatives and things, is that
again, where we're laser focused.
And I think, you know, as you step back,
every organization out there,
whether they know it or not,
actually is dealing with this problem.
And most of them recognize that they're not sure how,
and I think to a large degree,
we come in in situations where
we'll be brought in by one group
and they'll bring other people to the meeting,
you know, other groups to the meeting
and people sit down and within five minutes are like, oh my gosh, yes, this
is a huge, we didn't, we've been, we know this is a problem.
We've not talked about it.
And we've sort of pretended like it didn't exist because there's not a lot of people
up from us that really understand it.
So we just kind of like shove it under, keep shoving it under the carpet, but it's becoming
a bigger and bigger problem as the scale gets bigger.
Right.
And the gaps are getting wider. But as you're talking about almost like a
neural network, like in some ways, I don't, I'm not sure that's our problem to solve. I really
think that what we're trying to do is empower, like enable our customers, you know, their machine
learning teams, their kind of data science teams to use the technology they're using, but augment it with this data
or this information or this context.
And that's really kind of how we see the world.
And we don't care.
We're very, you know, we look at this as we play Switzerland in this,
and then we want to make this available wherever it's needed to be available
or where it adds value.
And we will integrate upstream wherever we know there are challenges in getting context into either campaign data or performance data or whatever.
That's super interesting.
And it totally makes sense. I'm just thinking about use cases where,
let's say you have a, you know, a pretty wide set of product lines, you might discover something with the added context about the relationship between product lines and a particular subset
of consumers who meet certain demographics that would be hard to discover, you know,
just with your basic, you know, clickstream and purchase data, for example.
That's super interesting.
So you're essentially providing a data set that a machine learning or data science team
could use to draw that conclusion.
Super interesting.
Yeah.
And again, that's that data.
It's like you said earlier, it's kind of stamped on the behavioral or that other data.
And so it gets stamped onto it in a sense.
Yeah.
Okay.
One last question, because we're getting close to the buzzer here.
Could you just give us one example?
So the marketing and sort of content asset use case makes total sense.
You mentioned earlier in the show that there are some other contexts where it makes sense.
And some of those popped to mind for me, but I just love to hear from your perspective, what are other areas of the
organization where this kind of sort of, you know, unstructured context to structured stamping,
if you will, makes a lot of sense. Yeah, it's interesting. We've really been pretty laser
focused on marketing, but what's, what's interesting is we had a, we had a situation
recently where we were brought into,
it was interesting, it was back to kind of thinking about the Adobe situation where you're
talking about integrating multiple solutions that have been acquired. And it was a company that
acquired a number of demand side platforms, DSP solutions, and it's solutions for buying,
you know, programmatic buying and selling of media.
The problem they ran into, it's interesting, is they were trying to use, I think it was Informatica or some other solution, to try and map data together that allowed their finance teams
to appropriately bill clients across platforms.
Oh, interesting.
You know, to expose inventory across the different platforms,
having the one team sell it and having one buyer and the problem they're running into,
it came up, I guess, because the reason they came to us is the problem got exposed
by auditors and was something that they were going to have to disclose; they're a massive public company.
They were going to have to disclose it in their financials. It was a couple hundred
million dollar problem. Like ultimately they were like, there's a billion dollar opportunity here that
we can't actually get access to because we got this underlying data problem. Well, think about
not just like campaign creation or content creation, but if you've got a sales team that
is creating sales orders around inventory that is being bought
through different DSPs or different platforms,
how do you standardize that down so that you are able to associate
that inventory and the fulfillment and that stuff together?
And that's really what they brought us in for is to really try and help map that.
And I think that is probably, to me,
that kind of, it was one of those situations
where you're like, okay,
I would never have thought of us as a solution for that.
And so it kind of opens my mind.
And again, we've been so focused on where we're going
that there are other applications.
Anywhere you have people interacting with applications,
I think it's outside marketing.
You've got people on supply chain.
You've got people on sales and other places.
It is the same.
There's a similar opportunity in all those situations.
And we've chosen to start here.
It's mainly because that's our DNA.
And we see the problem as being something that our customers are seeing
as a big
challenge. And we, and I think we can, we, we just feel like that's a natural place for us to start.
And, and we get pulled into other, you know, and the content thing came up through another customer
that pulled us in saying, Hey, we've got this problem. I think you guys can help us solve it.
So that's kind of how it came through a customer. Yeah. Fascinating. It makes total sense when you think about going to your
example with finance, reconciling transactions that have happened that relate to inventory
across distinct siloed platforms is essentially a mass reconciliation problem. You know, super interesting.
Well, Verl, this has been such a fun episode.
I love talking about data and uses of data
and context for data, you know,
sort of outside the standard stuff that we talk about.
And it sounds like you're doing
some really fascinating things at Clarivine,
sort of bringing that to light.
So thank you for your time.
And thanks for sharing with us.
Thank you.
It's my pleasure to be here. Thanks a lot, Eric. Kostas, thank you.
I think my big takeaway from the episode is that I really started thinking about context more,
Verl mentioned context, and he talked a lot about people who are doing certain types of work, right? They're producing work. And
in that context, it was marketing assets, right? A piece of content or a campaign.
But I just thought about all the touch points across an organization where people are producing
work and the amount of context that's in their head is unbelievable. And in many ways,
that sort of is what brings value to the work. And so that whole concept is just fascinating to me
about how you sort of mine, like, how can you mine that context and actually turn it into
actual physical data, you know, in a defined schema. I think I'll be thinking about that all week because, just, you know,
from a philosophical standpoint, it's a pretty interesting paradigm.
Yeah. Yeah.
One of the things that I'll keep from this conversation is that first of all,
spreadsheets are still the king.
They're never going away.
Yeah. Like cockroaches, man, you cannot get rid of them.
Is that why they named CockroachDB CockroachDB? I know, I mean, that's, uh, yeah, we had an episode
about that, but we didn't discuss the name because it's a little bit controversial. Oh, that's right.
That's right. Yeah. I think, I mean, for me, what was super interesting is that
there are roles inside organizations,
like really big ones, that we didn't even think about, like having people that have to
build and maintain data taxonomies, for example, which is pretty amazing.
And together with data architects, you have the people who are creating at the end, let's
say, data representation of the whole organization that needs to be communicated to everyone.
What I'll keep from the conversation that we had
is that these problems at the end,
and I think this is like not just,
it has to do with data in general.
At the end, success is like figuring out
the right balance between how much technology can do
and how much humans have to agree and
how we can do both and do both well.
So that's what I keep from the conversation.
And I'm really looking forward to seeing when similar products will also hit the market
for smaller companies and medium-sized organizations. Yeah, I mean, certainly at a multinational
corporation the pain point is severe simply due to size and complexity, and so the problem is
exacerbated, but we've had similar problems at every company I've ever been at, even
really small ones. Yeah, 100%.
I mean, yeah, I agree.
I don't think that this is a problem
only for very large corporations,
just that they cannot survive
without solving these problems.
That's the difference.
That's why I'm saying that
we are going to see at some point
products that try to address
these problems also, like from startups,
for smaller companies
or for medium-sized organizations. I agree. All right. Well, thanks for joining us on the
Data Stack Show. Fun topic for you today, and we'll catch you on the next one.
We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite
podcast app to get notified about new episodes every week. We'd also love your feedback.
You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com.
The show is brought to you by RudderStack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.