The Data Stack Show - The PRQL: If Everything Is Data, How Can We Make Sense of It All?

Starting point is 00:00:00 Welcome to the Data Stack Show prequel. We just recorded a show with Verl from Claravine, and I loved the show, Kostas, because we talked about a type of data inside of an organization that I probably never would have thought about unless we talked with Verl. And I'll give just a really brief synopsis to sort of whet people's appetites. We talked about taking context for data that exists inside of an organization. One example that we used was maybe a marketing asset or a marketing campaign that someone creates. Essentially mining the context that lives in someone's head or in a team around why the work was created and for whom it was created and standardizing that into a

Starting point is 00:00:55 schema that spans the organization. And they work specifically with really large multinational organizations, some of the largest companies in the world. And I'm going to ask for your reaction to this. One of the thoughts I had after we recorded the show was that sort of taking unstructured data, even data that lives in someone's head about an actual item that, you know, lives inside of an organization, and turning that into physical data, as it were, in a schema, that to me is very evocative of, like, we're turning literally everything into data. I mean, in an extreme sense, it's like they're taking thought and turning it into data in some ways. Now, that's extending the metaphor a little bit. But what do you think?

Starting point is 00:01:51 Yeah. I have to share something with you about myself, Eric. It's quite personal. I can't wait. I hope it's very interesting. Before I started working with more down-to-earth technologies like data pipelines, ETL, IoT, and all that stuff, I spent quite some time working with the real Web3. Web3, back in the late 2000s, was what was called back then the semantic web. The semantic web was all about how we can create

Starting point is 00:02:27 context around data and how we can reason automatically around data. So there were, like, two main things, two main components around that. Okay. One was metadata, which is what can represent, let's say, the context. And then, because you also have constraints and rules and stuff like that, you need an ontology that also encapsulates and codifies, let's say, in a machine-understandable way, these relationships and these constraints. Anyway, Web3 never happened, back then at least. I don't know, probably it will come back as a crypto token or whatever. But it's very interesting to see that it's not the first time that humans are trying to create and standardize the way that

Starting point is 00:03:14 information is represented. Back then, it was mainly ontologies that were used a lot. They're still being used in things like medical research, for example, because all the practitioners that you have need to agree on the semantics of the data that you are working with, right? And that's something similar to what you also see here. Now, the thing is that to reach the point where you really need that, you have to be a huge corporation where you have, let's say, so many different departments and so many different tools and so many different people working that have to consume that, that you need to build, like, a consensus around how to think and how to represent things, where context is becoming super important, because otherwise you cannot govern this thing, right? Like, how can you govern Coca-Cola, for example, or something like that when it comes to data, right?

Starting point is 00:04:12 And you know my opinion when it comes to data technologies that it's the opposite of SaaS, right? What we were saying, that innovation actually starts from the really big corporations and then goes down to the smaller companies. So I think that as we commoditize all the tools that we need for the infrastructure and we have access to the data and all these things, we will start figuring out that now, okay, that's good. We have all this data.

Starting point is 00:04:44 Now, how are we going to make sense out of this, and how are we going to make sense out of this in a structured way and in a way that we can keep track of? I think we already start seeing some interesting products around that, stuff like headless BI, metric repositories. All these are, like, ways, technologies, to standardize the way we represent and communicate the results of data. But we probably need more than that. And if you think about it, think about a company, even like RudderStack, right? Pulling all the data from all the different applications and putting them into one data

Starting point is 00:05:22 warehouse, we probably have hundreds of tables there. How do you make sense out of this? How do you know what connects with what in a meaningful way? And how can you constrain how this data is going to be interacted with? That's exactly what we were discussing today. It might sound a little bit more high-level than what data engineers are working with, like, every day. But I don't think we are that far away from having to deal with this problem, like, in every

Starting point is 00:05:50 organization. I agree. And if you thought that was a fun conversation, dig deep in the next episode with Verl about Claravine because that's exactly the conversation we had. It's a really fun thought exercise talking about a very different type of data than we normally discuss. So be sure to catch us on the next one.

Eric and Kostas preview their upcoming conversation with Verl Allen of Claravine. ...
