The Data Stack Show - The PRQL: If Everything Is Data, How Can We Make Sense of It All?
Episode Date: February 25, 2022
Eric and Kostas preview their upcoming conversation with Verl Allen of Claravine. ...
Transcript
Welcome to the Data Stack Show prequel.
We just recorded a show with Verl from Claravine, and I loved the show, Kostas, because we talked
about a type of data inside of an organization that I probably never would have thought about
unless we talked with Verl. And I'll give just
a really brief synopsis to sort of whet people's appetites. We talked about taking context for
data that exists inside of an organization. One example that we used was maybe a marketing asset
or a marketing campaign that someone creates. Essentially mining the context that lives in someone's head or in a
team around why the work was created and for whom it was created and standardizing that into a
schema that spans the organization. And they work specifically with really large multinational
organizations, some of the largest companies in the world.
And I'm going to ask for your reaction to this. One of the thoughts I had after we recorded the show was that taking unstructured data, even data that lives in someone's head about an actual item that, you know, lives inside of an organization,
and turning that into physical data, as it were, in a schema,
that to me is very evocative of like, we're turning literally everything into data. I mean,
in an extreme sense, it's like they're taking thought and turning it into data in some ways.
Now, that's extending the metaphor a little bit.
But what do you think?
Yeah.
I have to share something with you about myself, Eric.
It's quite personal.
I can't wait.
I hope it's very interesting. Before I started working with more down-to-earth technologies like data pipelines,
ETL, IoT, and all that stuff, I spent quite some time working with the real Web3.
Web3, back in the late 2000s, was what was called back then the semantic web.
The semantic web was all about how we can create
context around data and how we can reason automatically around data. So that was like,
there were like two main things, two main components around that. Okay. One was like
metadata, which is what can represent, let's say the context. And then, because you also have constraints and rules and stuff like that,
you need an ontology that also encapsulates and codifies, let's say,
in a machine-understandable way, these relationships and these constraints.
Anyway, Web3 never happened, back then at least.
I don't know, probably it will come back as a crypto token or whatever. But it's very interesting
to see that it's not the first time that humans are trying to create and standardize the way that
information is represented. Back then, it was mainly ontologies that were used a lot. They're
still being used in things like medical research, for example,
because all the practitioners have to agree on the semantics of the
data they are working with, right? And that's something similar to what you also see here.
Now, the thing is that to reach the point where you really need that, you have to be a huge corporation with, let's say, so many different departments, so many different tools, and so many different people who have to consume that data
that you need to build a consensus around how to think about and represent things. Context becomes super important, because otherwise you cannot govern this thing, right?
Like, how can you govern Coca-Cola, for example,
or something like that, when it comes to data, right?
And you know my opinion when it comes to data technologies
that it's the opposite of SaaS, right?
What we were saying, that innovation actually starts
from the really big corporations and then goes down to
the smaller companies. So I think that as we
commoditize all the tools that we need for the infrastructure and we have
access to the data and all these things, we will start figuring
out that now, okay, that's good. We have all this data.
Now, how are we going to make sense out of this, and how are we going to make sense out of
this in a structured way and in a way that we can keep track of?
I think we are already starting to see some interesting products around that, stuff like headless
BI, metric repositories.
All these are ways, or technologies, to standardize
the way we represent and communicate the results of data. But we probably need more than that.
And if you think about it, think about a company, even one like RudderStack, right?
Pulling all the data from all the different applications and putting it into one data
warehouse, we probably have hundreds of tables there.
How do you make sense out of this?
How do you know what connects with what in a meaningful way?
And how can you constrain how this data is going to be interacted with?
That's exactly what we were discussing today.
It might sound a little bit more high-level
than what data engineers are working with every day.
But I don't think we are that far away from having to deal with this problem like in every
organization.
I agree.
And if you thought that was a fun conversation, dig deep in the next episode with Verl about
Claravine because that's exactly the conversation we had.
It's a really fun thought exercise talking about a very different type of data than we normally discuss. So be sure to catch us on the next one.