The Data Stack Show - 243: The Data Economy: Turning Information into a Tradable Commodity with Viktor Kessler of Vakamo

Episode Date: May 16, 2025

Highlights from this week’s conversation include:
Viktor's Background and Journey in Data (1:20)
Evolution of Data Architecture (4:41)
The Lakehouse Concept (7:12)
Open Source Innovation (11:05)
Data Production and Decentralization (15:06)
Governance in Decentralized Systems (18:53)
Data Economy and Monetization (21:15)
Security Concerns in Data Processing (24:21)
Impact on Data Consumers (27:37)
Compaction Issues in Data Tables (29:39)
Open Source Lakekeeper Tool and Parting Thoughts (33:02)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 For the next two weeks, as a thank you for listening to the Data Stack Show, RudderStack is giving away some awesome prizes. The grand prize is a LEGO Star Wars Razor Crest 1,023-piece set. They're also giving away Yeti mugs, Anker power banks, and everyone who enters will get a RudderStack swag pack. To sign up, visit rudderstack.com slash TDSS-giveaway. Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show.
Starting point is 00:00:36 The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Before we dig into today's episode, we want to give a huge thanks to our presenting sponsor, RudderStack. They give us the equipment and time to do this show
Starting point is 00:01:04 week in, week out, and provide you the valuable content. RudderStack provides customer data infrastructure and is used by the world's most innovative companies to collect, transform, and deliver their event data wherever it's needed, all in real time. You can learn more at rudderstack.com. Welcome back to the Data Stack Show. Thank you guys for inviting me. All right, well, we want to talk about all things lakehouses and talk about Lakekeeper. But first, tell us about your background. Absolutely. Well, first of all, I'm based out of Switzerland, so from Europe,
Starting point is 00:01:53 and I'm happy to be here at Data Council. I am one of the co-founders of a company named Vakamo, and Vakamo is the company which develops Lakekeeper, an open source Apache Iceberg catalog. And from my background, I'm ex-MongoDB. Yes, and ex-risk management. Absolutely — you know, I was previously based out of Germany. In Germany we have a lot of insurance companies, and among the companies I used to work with were Munich Re and Ergo, different companies. Great. So Viktor, one of the topics that I think we need you to clear up for us is catalogs — maybe a little bit of a misnomer. So we want to get some clarity from you on that. And then, what do you want to talk about? Yeah, well, catalog, from what I hear, is a
Starting point is 00:02:35 misused word and, unfortunately, everything is a catalog. And if we're going to talk about catalogs around the lakehouse, there is a specific technical catalog that you need for Apache Iceberg. And what we would love to talk about from Vakamo is all about the metadata — the technical metadata, which needs to become actionable. Yeah, love it. Awesome. Well, let's dig in. All right, Viktor, so excited to connect here in person at Data Council. When you were talking a bit about your background, one of the words you used is that you've worked on ancient systems, and here we are today in 2025
Starting point is 00:03:11 all the way up to lakehouse architecture, Iceberg. What's maybe one takeaway, or kind of lesson, that you've kept with you all the way up to today, that you learned working on those ancient systems? Yeah, so maybe just to mention what an ancient system is — some people who are going to listen probably don't even know that type of system — and from my experience I can go into some kind of IT archaeology:
Starting point is 00:03:42 just imagine you with a brush and with the whole... Yes, that's something you can do. And what I actually mean — it's a super robust system, but I used to work with mainframe DB2 and with something like COBOL copybooks, which used to be very useful in the 60s and 70s, I would say. But right now it's super hard to go there, even looking at the assembler or something like this. But I'm grateful to have that experience, because you understand what the beginning was, like the punch card.
Starting point is 00:04:12 What is the punch card, exactly. And now, moving up the stack, you see that there's one thing which is: the change is continuous. It's not like, oh, now we have an AI revolution. No — all the way from that punch card to today, there has been some kind of change. And that's what I learned. Even in the next couple of years, we're going to talk about some new topics, I guess. Yeah, totally. Okay, so I want to dig into that a little bit. And I love that perspective. I mean, clearly AI is going to have an impact on the way that a lot of things are done. But it is
Starting point is 00:04:52 just one of many, if we think about the IT archaeology, right? It's one of many, one of many ages, right? How do you think lake house architecture, or I guess maybe a better way to put it would be, is it a fundamental shift? Do you believe we're at the front of a fundamental shift or is it just another component sort of in a landscape? Yeah. So maybe let's just go on a journey of a data platform to understand why we end up with a lake house and what is a lake house and what type of challenges we're trying to solve here.
Starting point is 00:05:26 So in our back days, we had some, a nice system called data warehouse and we used all the different databases like Postgres or maybe like SQL Server, DB2, Oracle, Teradata. And they've been like a monolithic type of a system, one box. And it was good to serve the amount of data at that time, pre-internet era, let's put it that way. And we could actually store the amount of data we had and the amount of reports we can serve for that type of a system.
Starting point is 00:05:57 But the system itself was monolithic and very bureaucratic. So it's like you had some someone sitting in the ivory tower who was making the decision twice a year, we're gonna make an adjustment or data model. Yeah. And that was like, holy grail for that person. The DB admin. Yes.
Starting point is 00:06:15 Kiss my hand. Kiss the ring and I will make the change. Yeah, you need to go to outer and make some. Yeah. An actual animal sacrifice. That was a quite funny time. At the data temple, right? At the data temple, yeah. Yeah, we're in the data temple.
Starting point is 00:06:32 Exactly. But, you know, things have changed, and especially with the internet, we got a vast amount of data which didn't fit inside of a data warehouse, part of the, you know, like you have your schema, you have your star schema store Flick schema we've got like all that clicks IOT all that different data which you need to capture somehow and analyze It's why like after data warehouse We've got data lakes and data lakes was based primarily on the Hadoop system And then we got like object storage with S3 where we could store
Starting point is 00:07:03 Some formats like Avro, Parquet or C and we got like up to the petabytes and that was like in a parallel world to data warehouse and we had the data lakes with all that file based systems. The thing on a data lake was that you could not go and make transaction. It was super hard on a schema evolution. So all that stuff was very kind of hard from a modeling perspective, but it was very flexible. So I could not just go like store a bit about the data and then take something like Trino, Presto, whatever different tools and just analyze it. And it was so awesome. But again, maybe like here, hello from Germany, the fault GDPI Europeans type of regulation that someone will tell you or you got to delete that data and you don't know how to do that. Yeah right it was super hard so
Starting point is 00:07:51 now we have like two paradigms we have a data warehouse they're bureaucratic rigid and they have a very flexible chaotic data lake with height metastore and you know like you can do that parallel but at the end, we understood like, okay, we need to do something around that part. So we need just to take that best from both worlds, combine that. And that's why we got the lake house, which can serve exactly that pattern. So from one side, we have all the transaction guarantees, and we have like schema evolution, time travel capabilities. And in parallel, we have like now infinite storage travel capabilities and in parallel we have like now infinite storage on s3 where we're gonna store like parquet file the the question what you have is it
Starting point is 00:08:31 kind of a evolution or revolution and from my perspective we have revolution because what we actually got with a lake house we've got a open table format named iceProp which made the data free. So the thing is right now you can just go and store your IceProp in whatever place you want. You can store that in a cloud and now with a cloud repatriation you can go and store that on-prem and you can write that with Spark by IceProp. You can use like 5trend with all the different tools and then by reading you can again use all the different tools as well. And the good thing is that you're not stacked with one type of technology which dictates you how you're gonna analyze your data it's more like okay my use case is ABC and that is a dictated by business where I need to drive value
Starting point is 00:09:18 and in that situation I can just go and pick a technology which will drive the best value for me so I can be more competitive. And that is a revolutionary thing because the data is now in an iceberg format free. But their caveats, their challenges, you need to manage icebergs. So one of the things that I think we talked about this last night, that's really practical but easy to gloss over is data people have been doing this long enough. One of the reasons that's so attractive is because they've been through these system migrations. Like they get acquired and then they're like,
Starting point is 00:09:50 all of your technologies and this technology, you have to move it to this technology by this date. You spend 18 months doing it. Oh, that's a topic for Capgemini essentially, going migration from EV2 to Oracle and then to Antwerp and then to Snowflake. Or you just get a new leader and they decide that you're going to use the new technology. And you can make a whole career of these companies essentially.
Starting point is 00:10:10 What did you do? Like I think I just migrated things between us. Technology is my entire career. So that's one of the reasons I think it's so attractive to people. Well, okay. One interesting question actually, John, for both you and Victor, especially as we think about this, I love the concept of archaeology. I'm trying to figure out how to fit a paleontology. I think you're going to Indiana Jones, but all
Starting point is 00:10:30 right. That would have been way better. One interesting, the forces that are pulling these advances have pulled these advances, I think, about two main main forces and maybe I'm thinking about this, you know, maybe my view is too narrow, but you have this pull of cost, right, where I know I want to do this thing but it's just way too expensive with the current technology. And then you have use cases, right? I need to do something that I can't do because of limitations here, right? And there's obviously a relationship there. Are those the two primary forces or what are the other forces?
Starting point is 00:11:08 What I would love to add to, absolutely agree on that two forces. And there's one additional, which is the open source community. That sometimes follows the two forces, but sometimes they have like a different understanding of world and that is a very essential part. And nowadays you can see that the open source community drives a lot of innovation and that
Starting point is 00:11:31 open source even goes, well, probably they have a crystal ball image trying to see like in the future and develop some stuff that can be used later on by a bank, by insurance. And that is kind of a cool thing what I would like just to add to the forces. That's an interesting dynamic because the first two are almost completely commercial. I mean, you have cost, we're trying to manage the balance sheet, we have a use case, we're trying to essentially add revenue through executing some use case of data. But the open source community is driven by innovation, the joy of building things, right? Curiosity.
Starting point is 00:12:11 Yeah, exactly. It's like Indiana Jones, probably, but from a data chart. Yeah, totally. There we go. We have the Indiana Jones. And I think there's also the driver of light developer experience, right? Like a lot of people are solving their own like painful, like pain points and they're like,
Starting point is 00:12:27 I've been using this tool, it's awful. Like this is my life. I have to create something better, which is the innovation part, but also it's just like, is it the frustration perspective? Yeah. Can we look at those, let's look at this revolution in those three lenses, right?
Starting point is 00:12:43 So we know that cost, I think, is probably the easiest one in terms of that pattern was established by S3 basically. We can essentially have unlimited storage at a very low cost practically, but there were all these limitations. That aspect is very clear when we think about the lake house where there is a cost driver. What about the use case side of it? Yeah. Maybe, you know, just to talk about that costs and use case,
Starting point is 00:13:14 let's just look at the lake house architecture, how it's structured. So you have like main three components on a lake house. The first one is storage, which might drive costs or might even like lower the cost aspect and that is kind of a solved issue with Amazon Street, Google and Azure. You can go and use some Dell storages on plan. It's a commodity. Absolutely. Then the second component which is commodity as well, it's a compute and we have classically like two types of a compute, writes and reads and then on write you have like Spark, Pi, Iceberg and all the different Ekl tools. And
Starting point is 00:13:50 on read you have like Presto, Trino, DuckDB, DataFusing. And that is again like a large list and it's kind of a challenge for our companies to pick and write computes and that can drive the cost up and down depending like on your use case. And then the last third component in order to get your lake house alive is well now we are with a word catalog. Because in order to create a table iceberg table like you have a DDL statement create table, alter table, you communicate eventually with a catalog, which will execute that to create the metadata layer of Icebook table. And then your compute will communicate with catalog to understand that metadata and then
Starting point is 00:14:33 write the parquet file to a tree storage. So it's all distributed right now and it helps you actually to scale every component on your demand and your use case. And that's quite interesting because on a use case perspective is right now you have like a classical way, we have like a centralized data engineers who are just trying to collect all the data in one space, but what happens in parallel to the organization we decentralized the whole stuff. Right now like every company want to be a startup and now we have a inside of a company,
Starting point is 00:15:05 marketing is a startup and sales is startup and everyone is like independent, which is actually kind of a not aligned with the way how we treat data. And what we actually need to do here is to think like every department, aka startup now needs to treat data as a product and think about like okay I'm the one who understands the data. I'm the one who can prepare that as a product and give it to someone So I'm a data producer. It's my data manufacturing a machine and everyone can consume that via through API SQL whatever different like protocols and then now we have like MCP for AI agents and so forth. And that is something what you will look at the use case side.
Starting point is 00:15:49 So you have like all that different use case and they can be solved by teams or data domains itself, but not century. And that is a different kind of a trend what we have here. Yep. We're gonna take a quick break from the episode to talk about our sponsor, Rutter Stack. Now I could say a bunch of nice things as if I found a fancy new tool, take a quick break from the episode to talk about our sponsor, RutterStack.
Starting point is 00:16:05 Now, I could say a bunch of nice things as if I found a fancy new tool, but John has been implementing RutterStack for over half a decade. John, you work with customer event data every day and you know how hard it can be to make sure that data is clean and then to stream it everywhere it needs to go. Yeah, Eric. As you know, customer data can get messy. ago. have implemented the longest running production instance of RutterStack at six years and going. Yes, I can confirm that. And one of the reasons we picked RutterStack was that it does not store the data and we can live stream data to our downstream tools. One of the things about the implementation that has been so common over all the years and with so many RutterStack customers is that it wasn't a wholesale replacement of your stack. It fit right into your existing tool set. Yeah, and even with technical tools, Eric, things like Kafka or PubSub, but you don't
Starting point is 00:17:13 have to have all that complicated customer data infrastructure. Well, if you need to stream clean customer data to your entire stack, including your data infrastructure tools, head over to rudderstack.com to learn more. John, you asked about the term catalog. Yeah. Let's dig into that because it is I Victor you, when you were talking before we hit record, the term came up and you got a nice sly, you know, grin on your face and you're chuckling now that John, you had some questions about that term. I think it'd be helpful to kind of overlay. So most of our listeners will be very familiar with,
Starting point is 00:17:45 let's say Postgres, right? To overlay what Postgres kind of bundles for you. And then let's look at that in this new architecture and talk about the different layers and what's happening. And then talk about the names, like the misnomer on catalogs. Yeah, yeah. So, you know, like if you look at the,
Starting point is 00:18:01 like let's take Postgres. So it's a box which has everything. It's storage, compute, it's a list of all the things that you can do. name like the misnomer on catalogs yeah yeah so you know like if you look at the like let's take posters so it's a box which has everything it's storage compute it takes care of your table life cycle it takes care of access management but what happened eventually that someone took the tall hammer as my co-founder would say and and just 800 Postgres and it's full apart. And now if you look at storage from a Postgres, you have S3. And then if you look at compute, you have something like Spark. And if you look at exactly that part which managed the Postgres, so you as a user or
Starting point is 00:18:37 whoever can communicate, that is exactly the catalog part with what you can actually call information schema, where you have like your tables views you have some objects inside of your information schema and that's exactly what we call a catalog in Lakehouse and there are some like benefits of that type of architecture but we need to think about like how we're going to manage the governance in that case, because at the end, it's not a single system which controls who's writing and then who's reading. Now you have like again, getting back to that startup type of organization, marketing uses spy iceberg, sales uses snowflake, and then how you're going to give access to your table, who is going to read, who's going to run.
Starting point is 00:19:21 And then there's, I think there's also the use case of data sharing between business units or between partners or between vendors. Like I think that's gonna grow as well. That's a topic for itself, but that's, you know, like you touched something. So like with 99% of my discussions, like, okay, we have big company and we would like just to build a lighthouse.
Starting point is 00:19:40 And then there are sometimes discussions, okay, let's zoom out on the supply chain. And like I'm from Germany, we have like manufacturing cars, automotive, pharmaceutical, and then in that supply chain, you have like thousands of different suppliers. And now is the question, let's assume I'm continental, I'm producing tires. And then you're Mercedes, you're building a Mercedes and then you buy a Mercedes and you drive a Mercedes. So me as a continental, I have an R&D department who made an assumption about like how you're going to drive in San Francisco.
Starting point is 00:20:11 And then someone drives in, I don't know, in a different part of that country. And me as continental, I would love to get that data back cycle to understand that somewhere where it's like, okay, he's foreign guides plus 70 and someone who is by plus 30 is a different type of tires what I need or like a rubber on my gum and that is kind of a question what right now is kind of unsolved because my R&D tries to predict but getting back exactly to that zoom out on a supply chain we need to build a sharing and not just sharing of the data, we need to govern that sharing. And there are two aspects to that. And especially Lakehouse can solve that because Lakehouse offers us some
Starting point is 00:20:53 sort of a no-copy architecture. So I store that in S3 and then I can give access to S3 to all the different partners. But I need somehow to manage what the purpose of reading of that data, who is gonna read that data, I need to audit all that different reads. And therefore I need to date a contract, but not like a PDF in a Wikipedia page. I need to have part of a computational... Not an acuSign, right? Well, you can try and try, but that's going to be hard, especially like if you want to automate the whole stuff. And if you look at in the future, right now we are in that process of sharing humans.
Starting point is 00:21:28 And I can call John and ask you, can I get your data? But in the future we will have AI agents and they need a way to automate the whole process. And that process cannot be done just on the phone. Ray might call each other. Maybe. Yeah. But I think they expect to have MCP type is protocol just to negotiate on the way how we're going to use it. And the funny part is if you have like that supply chain you might ask yourself okay so now I'm like producing a data product and then I have someone who just want to consume it
Starting point is 00:22:02 inside of organization or outside of organization. So can I put a price tag on my data product? So can I just drive the value from that? So we can actually go and then say, well, now we can actually create a data economy because now we can sell data products and that's how data becomes oil, wheat, or whatever type of a commodity. Well, I think there's this interesting thing that we touched on that is part of the evolution of that separation between storage and compute, right?
Starting point is 00:22:30 Super important part of the evolution. And essentially all of the cost is in the compute. The storage is almost free. Like, not quite. There's a certain level where you can wrap up some cost. But for most companies, it's almost free. If you're in the majority of companies that don't have that much data, it's very cheap.
Starting point is 00:22:49 And then I think you touched on this too. You've got like, okay, so I've got all the storage and then what if the person that's asset, like you have to handle the governance, but then the person accessing the data brings their compute. So there's this interesting cost dynamic here too, where like, it's just, there's an easiness to like, yeah, like you can have access to data,
Starting point is 00:23:10 we handle governance, you bring your own compute. And then from a cost standpoint, like, you're paying for whatever you're using. I think that's an interesting thing. Yeah, maybe, maybe that's super awesome because, well, first thing about that storage doesn't cost much. Well, just try to count how much do we need to pay it like for a street to store a petabyte.
Starting point is 00:23:28 The new terabyte is a petabyte. That's kind of like the situation nowadays. And there is this estimation that we have 150 zettabytes of data stored and the estimation by 2030 is gonna be like 2000 zettabytes. So you can just say that. So someone who is building their business on storage, it's a good time. It's been a good time for a long time. Yes, yes. So it's not gonna be that cheap anymore I guess and therefore
Starting point is 00:23:52 we need to drive value. But the good point is how to use the compute on a different way. And there is two ways. So you can go in and use your NDP engine or you can think about like if majority of selects or reads is not that big it's like one gig yeah right and the question is why not to use your own laptop with something like a ddb data fusion and especially with the power of the the individual machines now absolutely incredible absolutely and that's something what we think as well in a Lakekeeper, why not just to take something like DuckDB or Data Fusion, use a WASM, embed that in a browser,
Starting point is 00:24:33 and you just go open the browser and then you can write your query, which will use your computer and your local machine, go through catalog to rest, read the parquet file, and you will get your results. So it might be not a millisecond response, it's a couple of seconds, but if it does the job, why not to do something about it? So what do you think about in that architecture, like obviously people are going to have security
Starting point is 00:24:55 concerns because I think they probably have a little bit of a false sense of security when everything's like processed on secure servers that you know and then now it's like processing local like what how do you think that is going to be approached the security challenge? Yeah, so the governance is a very hard topic and Is like kind of a question who is gonna? issue the key of access yeah, and If you look at the organizations and that they are very free in choosing the tool So you usually if you go to like the large enterprise, they have everything.
Starting point is 00:25:27 Yeah. And then there's a poor guy, Siso, who needs just to say like, it's all secured. Don't worry. Yeah. There's no data bridges. And then there's, you said we have everything and don't worry. There's no data bridges.
Starting point is 00:25:44 Yeah. It's not. Yeah. Yeah., that's kind of a tricky question. And now looking at Lakehouse, it might be my biased opinion, but I think that the only one place where I can just say that a person, a group, a tool can read, write is a catalogue. Because the catalogue is same like in a postgres. Postgres you will say okay that role can read and the same in a catalog and it's actually what we do. We connect you with IDP and that was one of the decision of Lakekeeper team so we're not going to be at IDP so we're not issuing any tokens whatever. We can just connect to mtroid, octa, kitlog and then we're going to use that token and inside of Lakekeeper we have authorization concept based on Google Zanzibar paper we use openFGA so we
Starting point is 00:26:29 re-bag a super set of A bag for our bag whatever bag and we can actually manage inside of catalog and say okay group A can read the table and person B can write to the table and the catalog is in place and then doesn't matter which tool all of them will go to us and say okay I am a person and we can actually solve that problem for a CISO. Yeah very cool. Yeah I was thinking about the CISO and the concept of IT archeology. Yeah. That's not the type of dig you want to. But I mean that is a pretty strong selling point around security because we just drastically simplified that very big problem. And maybe just what I would like to add because I have a lot of conversation for companies is like, okay, so from a governance perspective you have like a security,
Starting point is 00:27:20 but there is an additional concept which is well, companies try to avoid that. But let's assume the situation that I'm the owner of a table and I take a customer table and someone is using that table, but I have usually no idea about that person that they use a table customer for whatever purposes. But I hope that the purpose is just to build a report and goal and make a business decision, which will lower the cost or it's just get some revenue in a company. And if you go to enterprise, you will find like that situation that we have a hundred thousand of tables and the owner has usually no understanding who is using what type of table.
Starting point is 00:27:59 But what I can do as an owner, I can go and make an alterty. So from security, from airbag, it's all solved. But that will cause a problem on your side or consumer side that because your report is not working anymore, it will break. So you are impacted on a business decision. And from a governance perspective, that's a very eventual kind of, it's a very important part, because what we need to do is somehow to solve the problem that the consumption pipeline is Unbreakable that business is not interrupted and that exactly what we built inside the playkeeper the interface that communicates with a contract engine Which means let's assume I am running alter table
Starting point is 00:28:39 So the air bug will tell your the or the administrator go for it. On a second step there is a business constraint inside of a data contract and there is an SLO, stable schema, which actually prohibits me to do that type of operation. So the laykeeper will examine that SLO, tell me where is the conflict. On the next step the laykeeper can inform every consumer and need that there are two persons or one person who uses that product So what I can do now I can go and turn the contract and take a year or seven Sunday there is grace period change your report adapt to the new system And that's the way how we can make achieve that that the pipeline the consumption pipeline will become unbreakable
Starting point is 00:29:20 and going forward in couple of next years if become unbreakable and going forward in couple of next years, if there is no like a human in that process but AI agents and that will help them, you know, like just to repair stuff. So what do you think the biggest practical barrier is to adopting that, like today to adopting this architecture and then maybe what does that look like next year in a few years? Yeah, so the, I would say from a lake house perspective is that it's still brand new and we miss a couple of things. What do you think the biggest missing things like for people that are like I really want
Starting point is 00:29:55 this? Yes, from a technical perspective, the missing part is the optimization or compaction of a table. So it's very hard issue right now. Because you know, on a day one, you start inserting in your table, all is good. On the day two you're trying to run your report, it doesn't work anymore because the table has too many small files. And I think yesterday was like LinkedIn presenting some data about like compaction, how hard that issue actually is. And which means you
Starting point is 00:30:22 need to go on day two and run a compaction where you're going to take all that small files, let's say 10 small porke files and write one big file. Because it's not just the performance, it's costs, you know, like every get and list on S3 costs Japan. And if it's like hundreds of gets instead of one get, so you have a different cost bill at that. And that is a very hard issue right now. So to solve the compaction, I know like a lot of companies trying to do that. And so again, from a catalog perspective, I think catalog is the best place, a way to just tick the box and say like, okay, that table should be optimized today. That's it. I know we're getting closer, but I want to ask about something you mentioned at the beginning, which is fascinating.
Starting point is 00:31:07 So this architecture enables a world where you use supply chain, for example, so tires, you know, car manufacturer, and then the actual, you you're not in the car or you're not driving, just sitting in the car, you know, but it's still, you know, rubber on the road. What's interesting to think about if we, you know, there's all this technology underlying that the catalog enables, you know, all these interesting ways to execute contracts between multiple different parties. But what we're talking about is an economy where products are being exchanged, right? Like there's an exchange of goods, it's just that it's data, and the architecture actually enables that.
Starting point is 00:31:53 How do you think that economy will form in terms of the actual format of the transactions, right? Because there is this really fascinating set of commodities that are currently not monetized because the pathway to monetization is very inefficient. Like it is actually a ton of work, there are security considerations, right? But the future you are describing is that we now have an architecture that can create efficiencies there. And so what's the mechanism that's actually going to enable the exchange of goods? Well, I think we still have to develop some stuff, because when I talk to companies and they would like to share some data, and that is a misconception, you shouldn't share data, you should share the data product. Yes. And the data product is a bit
Starting point is 00:32:42 more than just a raw table. And so that is a piece which we don't have at the moment. So I know like a lot of startups trying to build something around the data product, because as in a physical world, you don't wanna buy plastic, you would like to buy a product, right? That they can use. And if we have that piece,
Starting point is 00:33:01 then we can think about like what type of platforms we can use to exchange the goods. Is it going to be like the Amazon for data products? Is it going to be a NASDAQ for some sort of like a commodity exchange and so on? So there are a lot of new stuff coming up in the next five to ten years around the data now. Yep, man, that's going to be really fascinating. Okay, Victor, we're at the end, but tell our listeners where they can find out about Lakekeeper and it's an open source tool so they can go try it out. Yeah, well, everyone is invited just lakekeeper.io and then you will find actually the whole
Starting point is 00:33:36 information or just go to the GitHub, try it out, give us a feedback. We're building that not in a bubble, so everyone needs just to try it out, give us a feedback and if you like it, give us a star. Would awesome just to get a star. And we open for contribution, so we're not paused. So if you want to develop a feature, you're welcome. Great. Awesome.
Starting point is 00:33:56 All righty. Well, thank you so much for joining us here in Oakland, Victor. Thank you, guys. All right. That's a wrap for episode four here in person at Data Council. Stay tuned, we've got more coming your way. The Data Stack Show is brought to you by Rudder Stack, the warehouse native customer data platform.
Starting point is 00:34:14 Learn more at rudderstack.com.
