The Data Stack Show - 243: The Data Economy: Turning Information into a Tradable Commodity with Viktor Kessler of Vakamo
Episode Date: May 16, 2025
Highlights from this week's conversation include: Viktor's Background and Journey in Data (1:20), Evolution of Data Architecture (4:41), The Lakehouse Concept (7:12), Open Source Innovation (11:05), Data Production and Decentralization (15:06), Governance in Decentralized Systems (18:53), Data Economy and Monetization (21:15), Security Concerns in Data Processing (24:21), Impact on Data Consumers (27:37), Compaction Issues in Data Tables (29:39), Open Source Lakekeeper Tool and Parting Thoughts (33:02)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
For the next two weeks as a thank you for listening to the Data Stack show,
RudderStack is giving away some awesome prizes.
The grand prize is a LEGO Star Wars Razor Crest 1,023-piece set.
They're also giving away Yeti mugs, Anker power banks, and everyone who enters will get a RudderStack swag pack.
To sign up, visit rudderstack.com slash TDSS-giveaway.
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators
and data professionals to learn about new data technologies
and how data teams are run at top companies.
["Data Work"]
Before we dig into today's episode,
we want to give a huge thanks
to our presenting sponsor, RudderStack.
They give us the equipment and time to do this show
week in, week out, and provide you with valuable content. RudderStack provides customer data
infrastructure and is used by the world's most innovative companies to collect, transform,
and deliver their event data wherever it's needed, all in real time. You can learn more at rudderstack.com.
Welcome back to the Data Stack Show. Viktor, welcome to the show.
Thank you guys for inviting me.
All right, well, we want to talk about all things lake houses and talk about Lakekeeper.
But first, tell us about your background.
Absolutely. Well, first of all, I'm based out of Switzerland, so from Europe,
and I'm happy to be here at Data Council. I am one of the co-founders of a company named
Vakamo, and Vakamo is the company which develops Lakekeeper, an open source Apache Iceberg catalog.
And from a background, I'm ex-MongoDB. Yes, and ex-risk management. Absolutely, you know,
like I was previously based out of Germany. In Germany we have a lot of insurance
companies, and one of the companies I used to work with was Munich Re, ERGO, and
different companies. Great, so Viktor, one of the topics that I think we need you to clear up for us is
catalogs, and maybe a little bit of a misnomer there. So we want to get some clarity from you on that.
And then what do you want to talk about? Yeah, well, catalog, from what I hear, is a
misused word, and unfortunately, like, everything is a catalog. And if we're going to talk about
catalogs around the lakehouse, there is a specific technical catalog that you need for
Apache Iceberg. And what we would love to talk about from Vakamo is all about the metadata, the
technical metadata, which needs to become actionable. Yeah, love it. Awesome. Well, let's dig in.
All right, Viktor, so excited to connect here in person at Data Council. When you were talking
a bit about your background,
one of the words you used is that you've worked
on ancient systems and here we are today in 2025
all the way up to Lake House architecture, Iceberg.
What is, what's maybe one takeaway or kind of lesson
that you've kept with you all the way up to today,
maybe that you kind of learned working
on those ancient systems?
Yeah, so maybe just, you know, to mention what is the ancient system.
And you know, some people who are going to listen probably don't even know that
type of a system. And from my experience, I can go into some kind of IT archaeology.
Just imagine you with a brush and with the whole...
Yes, that's something you can do.
And what I actually mean, it's a super robust system, but I used to work with mainframe
DB2 and with something like COBOL copybooks, which used to be very useful in the 60s, 70s,
I would say.
But right now it's super hard to go there.
Even looking at the assembler or something like this
But I'm grateful to have that experience, because, you know, you understand what was the beginning, like the punch card.
What is the punch card exactly? And now, moving up the stack, you see that there's one thing, which is:
the change is continuous. It's not like, oh, now we have an AI revolution. No, we have, all the way from that punch card to today, some kind of a change.
And that's what I learned. That's even like in the next couple of years, we're going to talk about some new topics, I guess.
Yeah, totally.
Okay, so let's, I want to dig into that a little bit.
And I love that perspective. I mean, clearly AI is going to
have an impact on the way that a lot of things are done. But it is
just one of many, if we think about the IT archaeology, right? It's one of many, one of many
ages, right? How do you think lake house architecture, or I guess maybe a better way to put it would be,
is it a fundamental shift?
Do you believe we're at the front of a fundamental shift or is it just another component sort
of in a landscape?
Yeah.
So maybe let's just go on a journey of a data platform to understand why we end up with
a lake house and what is a lake house and what type of challenges we're trying to solve here.
So back in the day, we had a nice system called the data warehouse, and we used all the
different databases like Postgres or maybe like SQL Server, DB2, Oracle, Teradata.
And they've been like a monolithic type of a system, one box.
And it was good to serve the amount of data at that time,
pre-internet era, let's put it that way.
And we could actually store the amount of data we had
and the amount of reports we can serve
for that type of a system.
But the system itself was monolithic and very bureaucratic.
So it's like you had someone sitting in the ivory tower
who was making the decision twice a year:
we're gonna make an adjustment to the data model.
Yeah.
And that was like, holy grail for that person.
The DB admin.
Yes.
Kiss my hand.
Kiss the ring and I will make the change.
Yeah, you need to go to the altar and make some...
Yeah.
An actual animal sacrifice. That was quite a funny time.
At the data temple, right?
At the data temple, yeah.
Yeah, we're in the data temple.
Exactly.
But, you know, things have changed, and especially with the internet, we got a vast amount of
data which didn't fit inside of a data warehouse. Part of it, you know, like you have your
schema, you have your star schema, snowflake schema.
We've got like all that clicks, IoT, all that different data which you need to capture somehow and analyze.
It's why like after data warehouse
We've got data lakes and data lakes was based primarily on the Hadoop system
And then we got like object storage with S3 where we could store
some formats like Avro, Parquet, ORC, and we got like up to the petabytes, and that was like in a
parallel world to data warehouse and we had the data lakes with all that
file based systems. The thing on a data lake was that you could not go and make a
transaction. It was super hard on schema evolution. So all that stuff was very kind of hard from a modeling perspective, but it was very flexible. So I could now just
go and, like, store a bit of data and then take something like Trino, Presto, whatever
different tools, and just analyze it. And it was so awesome. But again, maybe like here,
hello from Germany, the famous GDPR, European type of regulation, that someone will tell you,
well, you've got to delete that data and you don't know how to do that. Yeah, right, it was super hard. So
now we have like two paradigms: we have a data warehouse that's bureaucratic, rigid, and we
have a very flexible, chaotic data lake with Hive metastore, and you know, like, you can do that
parallel but at the end, we understood like,
okay, we need to do something around that part. So we need just to take that best from both worlds,
combine that. And that's why we got the lake house, which can serve exactly that pattern. So from one
side, we have all the transaction guarantees, and we have like schema evolution, time travel
capabilities. And in parallel, we have now
infinite storage on S3 where we're gonna store, like, Parquet files. The question that you have is, is it
kind of an evolution or a revolution? And from my perspective, we have a revolution, because what we
actually got with a lakehouse, we've got an open table format named Iceberg, which made the data free. So the thing is,
right now you can just go and store your Iceberg table in whatever place you want. You can store that in a
cloud and now with a cloud repatriation you can go and store that on-prem and you can write that with
Spark, PyIceberg. You can use, like, Fivetran, with all the different tools, and then for reading you can
again use all the different tools as well. And the good thing is that you're not stuck with one type of technology
which dictates how you're gonna analyze your data. It's more like, okay, my
use case is ABC, and that is dictated by business, where I need to drive value,
and in that situation I can just go and pick a technology which will drive the
best value for me so I can be more competitive. And that is a revolutionary thing because the data is now in an iceberg format free.
But there are caveats, there are challenges: you need to manage Iceberg.
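To make that "data is free" point concrete, here is a minimal sketch, assuming an Iceberg REST catalog such as Lakekeeper at a made-up local endpoint; the URI, warehouse, namespace, and table names are placeholders and auth is omitted. One engine writes the table through the catalog, and any other engine pointed at the same catalog can read it.

```python
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Connect to an Iceberg REST catalog (for example a local Lakekeeper instance).
catalog = load_catalog(
    "lake",
    uri="http://localhost:8181/catalog",  # placeholder endpoint
    warehouse="demo",                     # placeholder warehouse name
)

# Create a namespace and a table, then append a small Arrow batch;
# the data lands as Parquet files on the configured object storage.
catalog.create_namespace("analytics")
events = pa.table({"event_id": [1, 2, 3], "country": ["CH", "DE", "US"]})
table = catalog.create_table("analytics.events", schema=events.schema)
table.append(events)

# Any other engine using the same catalog (Spark, Trino, DuckDB, ...)
# can now read analytics.events without copying the data.
```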
So one of the things, and I think we talked about this last night,
that's really practical but easy to gloss over, is that data people have been doing this long enough.
One of the reasons this is so attractive
is because they've been through these system migrations.
Like they get acquired and then they're like,
all of your technologies and this technology,
you have to move it to this technology by this date.
You spend 18 months doing it.
Oh, that's a topic for Capgemini essentially,
doing a migration from DB2 to Oracle
and then to Antwerp and then to Snowflake.
Or you just get a new leader and they decide that you're going to use the new technology.
And you can make a whole career at these companies essentially.
What did you do?
Like, I think I just migrated things between technologies my entire career.
So that's one of the reasons I think it's so attractive to people.
Well, okay.
One interesting question actually, John, for both you and Victor, especially as we think
about this, I love the concept of archaeology.
I'm trying to figure out how to fit in paleontology. I think you're going more Indiana Jones, but all
right. That would have been way better. One interesting thing: the forces that are pulling
these advances, or have pulled these advances, I think there are about two main forces, and maybe my view is too narrow, but you have this pull of cost, right, where I
know I want to do this thing but it's just way too expensive with the current
technology. And then you have use cases, right? I need to do something that I
can't do because of limitations here, right? And there's obviously a
relationship there.
Are those the two primary forces
or what are the other forces?
What I would love to add to that,
absolutely agree on those two forces.
And there's one additional,
which is the open source community.
That sometimes follows the two forces,
but sometimes they have like a different understanding
of world and that is a very essential part.
And nowadays you can see that the open source community drives a lot of innovation and that
open source even goes, well, probably they have a crystal ball, imagining, trying to see
into the future, and develop some stuff that can be used later on by a bank, by an insurance company.
And that is kind of a cool thing what I would like just to add to the forces.
That's an interesting dynamic because the first two are almost completely commercial.
I mean, you have cost, we're trying to manage the balance sheet, we have a use case, we're
trying to essentially add revenue through executing some use case of data.
But the open source community is driven by innovation, the joy of building things, right?
Curiosity.
Yeah, exactly.
It's like Indiana Jones, probably, but for data.
Yeah, totally.
There we go.
We have the Indiana Jones.
And I think there's also the driver of light developer experience, right?
Like a lot of people are solving their own like painful,
like pain points and they're like,
I've been using this tool, it's awful.
Like this is my life.
I have to create something better,
which is the innovation part, but also it's just like,
is it the frustration perspective?
Yeah.
Can we look at those,
let's look at this revolution in those three lenses, right?
So we know that cost, I think, is probably the easiest one in terms of that pattern was established
by S3 basically.
We can essentially have unlimited storage at a very low cost practically, but there
were all these limitations.
That aspect is very clear when we think about the lake house where there is a cost driver.
What about the use case side of it?
Yeah.
Maybe, you know, just to talk about that costs and use case,
let's just look at the lake house architecture, how it's structured.
So you have like main three components on a lake house.
The first one is storage, which might drive costs
or might even like lower the cost aspect and that is kind of a solved issue with
Amazon S3, Google, and Azure. You can go and use some Dell
storage on-prem. It's a commodity. Absolutely. Then the second component,
which is a commodity as well, is compute, and we have classically like two
types of compute, writes and reads. And then on write you have like Spark, PyIceberg, and all the different ETL tools. And
on read you have like Presto, Trino, DuckDB, DataFusion. And that is again like a large list, and
it's kind of a challenge for companies to pick read and write computes, and that can drive the cost up
and down depending like on your use case. And then the last
third component, in order to get your lakehouse alive, is, well, now we are at the word catalog.
Because in order to create a table, an Iceberg table, like, you have a DDL statement, CREATE TABLE,
ALTER TABLE, and you communicate eventually with a catalog, which will execute that to
create the metadata layer of the Iceberg table.
And then your compute will communicate with the catalog to understand that metadata and then
write the Parquet files to S3 storage.
So it's all distributed right now and it helps you actually to scale every component on your
demand and your use case.
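As a rough sketch of that split, not a definitive setup: a Spark session configured against an Iceberg REST catalog sends DDL to the catalog, which owns the table metadata, while the write itself lands as Parquet on object storage. The catalog name, URI, and warehouse below are placeholders, and the Iceberg Spark runtime is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

# Assumes the iceberg-spark-runtime package is available to Spark;
# catalog name, URI, and warehouse are placeholders (e.g. a local Lakekeeper).
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "rest")
    .config("spark.sql.catalog.lake.uri", "http://localhost:8181/catalog")
    .config("spark.sql.catalog.lake.warehouse", "demo")
    .getOrCreate()
)

# The DDL goes to the catalog, which creates the Iceberg metadata for the table ...
spark.sql("CREATE TABLE IF NOT EXISTS lake.analytics.orders (id BIGINT, amount DOUBLE)")

# ... and the write produces Parquet data files on object storage plus a new snapshot.
spark.sql("INSERT INTO lake.analytics.orders VALUES (1, 9.99), (2, 19.50)")
```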
And that's quite interesting because on a use case perspective
is right now you have like a classical way, we have like a centralized data engineers who are
just trying to collect all the data in one space, but what happens in parallel to the organization
we decentralized the whole stuff. Right now, like, every company wants to be a startup, and now we have
a inside of a company,
marketing is a startup and sales is startup and everyone is like independent, which is actually
kind of not aligned with the way we treat data. And what we actually need to do here is to
think like every department, aka startup now needs to treat data as a product and think about like okay
I'm the one who understands the data. I'm the one who can prepare that as a product and give it to someone
So I'm a data producer. It's my data manufacturing machine, and everyone can consume that through
APIs, SQL, whatever different protocols, and now we have like MCP for AI agents and so forth.
And that is something where you look at
the use case side.
So you have, like, all those different use cases,
and they can be solved by the teams or data domains themselves,
but not centrally.
And that is a different kind of trend that we have here.
Yep.
We're gonna take a quick break from the episode
to talk about our sponsor, RudderStack.
Now, I could say a bunch of nice things as if I found a fancy new tool, but John has been implementing RudderStack for over half a decade.
John, you work with customer event data every day and you know how hard it can be to make sure that data is clean and then to stream it everywhere it needs to go.
Yeah, Eric. As you know, customer data can get messy. We have implemented the longest running production instance of RudderStack, at six years and going. Yes, I can confirm that. And one of the reasons we picked RudderStack was that it does not store
the data and we can live stream data to our downstream tools. One of the things about the
implementation that has been so common over all the years and with so many RudderStack customers
is that it wasn't a wholesale replacement of your stack. It fit right into your existing tool set.
Yeah, and even with technical tools, Eric, things like Kafka or Pub/Sub, you don't
have to have all that complicated customer data infrastructure.
Well, if you need to stream clean customer data to your entire stack, including your
data infrastructure tools, head over to rudderstack.com to learn more. John, you asked about the term catalog.
Yeah.
Let's dig into that, because, Viktor, when you were talking before
we hit record, the term came up and you got a nice sly, you know, grin on your
face, and you're chuckling now. John, you had some questions about that term.
I think it'd be helpful to kind of overlay. So most of our listeners will be very familiar with,
let's say Postgres, right?
To overlay what Postgres kind of bundles for you.
And then let's look at that in this new architecture
and talk about the different layers and what's happening.
And then talk about the names,
like the misnomer on catalogs.
Yeah, yeah.
So, you know, like, if you look at,
like, let's take Postgres.
So it's a box which has everything.
It's storage, compute, it takes care of your
table lifecycle, it takes care of access management. But what happened eventually is that someone
took the Thor hammer, as my co-founder would say, and just hammered Postgres, and it fell apart. And now if you look at storage from Postgres, you have S3.
And then if you look at compute, you have something like Spark.
And if you look at exactly that part which managed the Postgres, so you as a user or
whoever can communicate, that is exactly the catalog part, which is what you can actually call
information schema, where you have like your tables
views you have some objects inside of your information schema and that's exactly what we
call a catalog in Lakehouse and there are some like benefits of that type of architecture but we
need to think about like how we're going to manage the governance in that case, because at the end, it's not a single system which controls who's writing and then who's
reading. Now you have like again, getting back to that startup type of organization,
marketing uses PyIceberg, sales uses Snowflake, and then how are you going to
give access to your table, who is going to read, who is going to write.
And then there's, I think there's also the use case of data sharing between business units
or between partners or between vendors.
Like I think that's gonna grow as well.
That's a topic for itself, but that's, you know,
like you touched something.
So like with 99% of my discussions, like, okay,
we have big company and we would like
just to build a lighthouse.
And then there are sometimes discussions,
okay, let's zoom out on the supply chain.
And like I'm from Germany, we have like manufacturing cars, automotive,
pharmaceutical, and then in that supply chain, you have like thousands of different suppliers.
And now is the question, let's assume I'm Continental, I'm producing tires.
And then you're Mercedes, you're building a Mercedes and then you buy a Mercedes and you drive a Mercedes.
So me as Continental, I have an R&D department who made an assumption about
like how you're going to drive in San Francisco.
And then someone drives in, I don't know, in a different part of that country.
And me as Continental, I would love to get that data back, a cycle, to understand
that somewhere it's like, okay, this driver is at plus 70 and someone who is at plus 30, that is a different
type of tire that I need, or a different rubber compound, and that is kind of a question that
right now is kind of unsolved because my R&D tries to predict but getting back exactly
to that zoom out on a supply chain we need to build a sharing and not just sharing of
the data, we need to govern that sharing. And there are two aspects to that. And
especially Lakehouse can solve that because Lakehouse offers us some
sort of a no-copy architecture. So I store that in S3 and then I can give
access to S3 to all the different partners. But I need somehow to manage
what the purpose of reading that data is, who is gonna read that data; I need to audit all those different reads.
And therefore I need a data contract, but not like a PDF on a wiki page.
I need to have it as part of a computational...
Not a DocuSign, right?
Well, you can try and try, but that's going to be hard, especially like if you want to automate the whole stuff.
And if you look at in the future, right now we are in that process of sharing humans.
And I can call John and ask you, can I get your data?
But in the future we will have AI agents and they need a way to automate the whole process.
And that process cannot be done just on the phone.
They might call each other. Maybe. Yeah. But I think they'd expect to have
an MCP-type protocol just to negotiate the way how we're going to use it. And
the funny part is if you have like that supply chain
you might ask yourself okay so now I'm like producing a data product
and then I have someone who just wants to consume it
inside of organization or outside of organization. So can I put a price tag on my data product?
So can I just drive the value from that? So we can actually go and then say,
well,
now we can actually create a data economy because now we can sell data products
and that's how data becomes oil, wheat, or whatever type of a commodity.
Well,
I think there's this interesting thing that we touched on that is part of the evolution
of that separation between storage and compute, right?
Super important part of the evolution.
And essentially all of the cost is in the compute.
The storage is almost free.
Like, not quite.
There's a certain level where you can rack up some cost.
But for most companies, it's almost free.
If you're in the majority of companies
that don't have that much data, it's very cheap.
And then I think you touched on this too.
You've got like, okay, so I've got all the storage
and then, what if the person that's accessing it,
like, you have to handle the governance,
but then the person accessing the data brings their own compute.
So there's this interesting cost dynamic here too,
where like, it's just, there's an easiness to like,
yeah, like you can have access to data,
we handle governance, you bring your own compute.
And then from a cost standpoint, like,
you're paying for whatever you're using.
I think that's an interesting thing.
Yeah, maybe, maybe that's super awesome because,
well, first, thinking about that storage doesn't cost much.
Well, just try to count how much we need to pay,
like, for S3 to store a petabyte.
The new terabyte is a petabyte.
That's kind of like the situation nowadays.
And there is this estimation that we have 150 zettabytes
of data stored and the estimation by 2030
is gonna be like 2000 zettabytes.
So you can just say that.
So someone who is building their business on storage, it's a good time. It's been a good time for a
long time. Yes, yes. So it's not gonna be that cheap anymore I guess and therefore
we need to drive value. But the good point is how to use the compute on a
different way. And there are two ways. So you can go and use your MPP engine, or
you can think about, like, if the majority of selects or reads
is not that big, it's like one gig, yeah, right, and the question is, why not use your own laptop
with something like DuckDB or DataFusion, and especially with the power of the individual
machines now, absolutely incredible. Absolutely. And that's something that we think about as well in Lakekeeper,
why not just to take something like DuckDB or Data Fusion,
use a WASM, embed that in a browser,
and you just go open the browser
and then you can write your query,
which will use your computer and your local machine,
go through catalog to rest, read the parquet file,
and you will get your results.
So it might be not a millisecond
response, it's a couple of seconds, but if it does the job, why not do something like that?
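A minimal sketch of that bring-your-own-compute idea, reusing the hypothetical catalog and table from the earlier snippets: PyIceberg fetches the table metadata from the catalog, and an in-process DuckDB connection does the scan on your own machine.

```python
from pyiceberg.catalog import load_catalog

# Placeholder catalog endpoint and table name, as in the earlier sketch.
catalog = load_catalog("lake", uri="http://localhost:8181/catalog", warehouse="demo")
table = catalog.load_table("analytics.events")

# Materialize the scan into an in-process DuckDB connection and query it locally.
con = table.scan().to_duckdb(table_name="events")
print(con.execute("SELECT country, count(*) FROM events GROUP BY country").fetchall())
```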
So what do you think about in that architecture, like obviously people are going to have security
concerns because I think they probably have a little bit of a false sense of security when
everything's, like, processed on secure servers that you know, and now it's, like, processing
locally. How do you think that is going to be approached, the
security challenge?
Yeah, so the governance is a very hard topic, and
it's kind of a question of who is gonna
issue the key of access, yeah. And
If you look at the organizations, they are very free in choosing the tools.
So you usually if you go to like the large enterprise, they have everything.
Yeah.
And then there's a poor guy, the CISO, who just needs to say, like, it's all secured.
Don't worry.
Yeah.
There are no data breaches.
And then there's,
you said we have everything, and don't worry,
there are no data breaches.
Yeah.
Yeah, that's kind of a tricky question. And now looking at the lakehouse, it might be my biased
opinion, but I think that the only one place where I can just say that a person, a group, a tool
can read, write, is the catalog. Because the catalog is the same like in Postgres. In Postgres you will say, okay,
that role can read, and it's the same in a catalog, and that's actually what we do. We connect you with an
IdP, and that was one of the decisions of the Lakekeeper team, so we're not going to be an IdP, so we're not
issuing any tokens whatever. We can just connect to Entra ID, Okta, Keycloak, and then we're going to
use that token. And inside of Lakekeeper we have an authorization concept based on the Google Zanzibar paper; we use OpenFGA, so
ReBAC, a superset of ABAC, RBAC, whatever-BAC, and we can actually manage
inside of the catalog and say, okay, group A can read the table and person B can
write to the table. And the catalog is in place, and then it doesn't matter which tool,
all of them will go to us and say, okay, I am a person, and we can actually
solve that problem for a CISO. Yeah, very cool. Yeah, I was thinking about the CISO
and the concept of IT archaeology. Yeah. That's not the type of dig you want to do.
But I mean that is a pretty strong selling point around security because we just drastically simplified that very big problem.
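For illustration only, here is the shape of the Zanzibar-style relationship tuples an engine like OpenFGA evaluates; the users, groups, and table names are invented, and this is not Lakekeeper's actual model.

```python
# Relationship tuples: who (a user, or members of a group) has which relation to which object.
tuples = [
    {"user": "group:analysts#member", "relation": "reader", "object": "table:analytics.orders"},
    {"user": "user:etl-service",      "relation": "writer", "object": "table:analytics.orders"},
]

def allowed(user: str, relation: str, obj: str) -> bool:
    # Toy direct-match check; a real engine such as OpenFGA also resolves group
    # membership and relation rewrites before answering the question.
    return any(
        t["user"] == user and t["relation"] == relation and t["object"] == obj
        for t in tuples
    )

# allowed("user:etl-service", "writer", "table:analytics.orders") -> True
# allowed("user:etl-service", "reader", "table:analytics.orders") -> False (no direct tuple)
```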
And maybe just what I would like to add, because I have a lot of conversations with companies, is like, okay, so from a governance perspective you have, like, security,
but there is an additional concept which is well, companies try to avoid that. But let's
assume the situation that I'm the owner of a table and I take a customer table and someone
is using that table, but I have usually no idea about that person that they use a table customer
for whatever purposes. But I hope that the purpose is just to build a report and go and make a
business decision, which will lower the cost or just get some revenue into the company.
And if you go to an enterprise, you will find that situation where we have a hundred
thousand tables and the owner usually has no understanding of who is using what type of
table.
But what I can do as an owner, I can go and make an ALTER TABLE.
So from security, from RBAC, it's all solved.
But that will cause a problem on your side, the consumer side, because your report is not
working anymore, it will break. So you are impacted in a business decision. And from a
governance perspective, that's a very, it's a very important part, because what we
need to do is somehow solve the problem so that the consumption pipeline is
unbreakable, that the business is not interrupted. And that's exactly what we built inside Lakekeeper: the interface that communicates with a contract engine.
Which means, let's assume I am running ALTER TABLE.
So the RBAC will tell me, or the administrator, go for it. On a second step there is a business constraint inside of a data contract,
and there is an SLO, stable schema, which actually prohibits me from doing that type of operation.
So Lakekeeper will examine that SLO, tell me where the conflict is.
On the next step, Lakekeeper can inform every consumer, and it knows that
there are two persons or one person who uses that product.
So what I can do now, I can go and amend the contract and say, there is a
grace period: change your report, adapt to the new schema.
And that's the way how we can achieve that the consumption pipeline will become unbreakable
and, going forward in a couple of next years, if there is no human in that process but AI agents, that will help them, you know, like, just to repair stuff.
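A purely hypothetical sketch of that flow in plain Python; none of these names come from Lakekeeper's real API. A proposed schema change is checked against the contract's SLOs, and if it conflicts, the affected consumers are surfaced so they can adapt during a grace period.

```python
from dataclasses import dataclass, field

@dataclass
class DataContract:
    table: str
    slos: set = field(default_factory=set)         # e.g. {"stable-schema"}
    consumers: list = field(default_factory=list)  # reports, pipelines, agents to notify

def check_alter(contract: DataContract, breaks_schema: bool):
    """Return (allowed, consumers_to_notify) for a proposed ALTER TABLE."""
    if breaks_schema and "stable-schema" in contract.slos:
        # Conflict with the SLO: block the change and surface who is impacted.
        return False, contract.consumers
    return True, []

contract = DataContract("analytics.orders", {"stable-schema"}, ["marketing-report", "churn-model"])
ok, notify = check_alter(contract, breaks_schema=True)
# ok is False; notify lists the consumers who get a grace period to adapt.
```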
So what do you think the biggest practical barrier is to adopting that, like today to
adopting this architecture and then maybe what does that look like next year in a few
years?
Yeah, so, I would say, from a lakehouse perspective, it's still brand new and
we miss a couple of things.
What do you think the biggest missing things are, like, for people that are like, I really want
this?
Yes, from a technical perspective, the missing part is the optimization or compaction of
a table.
So it's a very hard issue right now.
Because, you know, on day one, you start inserting into your table, all is good. On day two you're
trying to run your report, it doesn't work anymore because the table has too
many small files. And I think yesterday was like LinkedIn presenting some data
about like compaction, how hard that issue actually is. And which means you
need to go on day two and run a compaction where you're going to take all
those small files, let's say 10 small Parquet files, and write one big file. Because it's not just the
performance, it's cost, you know, like every GET and LIST on S3 costs you money. And if it's like
hundreds of GETs instead of one GET, you have a different cost bill from that. And that is a very hard issue right
now. So to solve the compaction, I know like a lot of companies trying to do that. And
so again, from a catalog perspective, I think catalog is the best place, a way to just tick
the box and say like, okay, that table should be optimized today. That's it.
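For a sense of what ticking that box looks like with today's tooling, here is a rough sketch using Iceberg's Spark maintenance procedure; it assumes the spark session and lake catalog from the earlier snippet, and the table name and target file size are illustrative.

```python
# Rewrite many small Parquet files into fewer large ones for the given table.
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'analytics.events',
        options => map('target-file-size-bytes', '536870912')
    )
""")
```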
I know we're getting closer, but I want to ask about something you mentioned at the beginning, which is fascinating.
So this architecture enables a world where you have a supply chain, for example, so tires, you know, car manufacturer, and then the actual, you're not in the car, or you're not
driving, just sitting in the car, you know, but it's still, you know, rubber on the road.
What's interesting to think about if we, you know, there's all this technology underlying
that the catalog enables, you know, all these interesting ways to execute contracts between
multiple different parties.
But what we're talking about is an economy where products are being exchanged, right?
Like there's an exchange of goods, it's just that it's data, and the architecture actually
enables that.
How do you think that economy will form in terms of the actual format of the transactions,
right?
Because there is this really fascinating set of commodities that are currently not monetized
because the pathway to monetization is very inefficient. Like it is actually a ton of work,
there are security considerations, right? But the future you are describing is that we now have an
architecture that can create efficiencies there. And so what's the mechanism that's actually going to enable the exchange of goods? Well, I think we still have to develop some stuff,
because when I talk to companies and they would like to share some data, and that is a misconception,
you shouldn't share data, you should share the data product. Yes. And the data product is a bit
more than just a raw table. And so that is a piece which we don't have at the moment.
So I know like a lot of startups trying to build something
around the data product,
because as in a physical world,
you don't wanna buy plastic,
you would like to buy a product, right?
That they can use.
And if we have that piece,
then we can think about like what type of platforms
we can use to exchange the goods.
Is it going to be like the Amazon for data products? Is it going to be a NASDAQ for
some sort of like a commodity exchange and so on? So there is a lot of new stuff coming up in the
next five to ten years around the data now. Yep, man, that's going to be really fascinating.
Okay, Viktor, we're at the end, but tell our listeners where they can find out about Lakekeeper,
and it's an open source tool so they can go try it out.
Yeah, well, everyone is invited just lakekeeper.io and then you will find actually the whole
information or just go to the GitHub, try it out, give us a feedback.
We're building that not in a bubble, so everyone needs just to try it out, give us a feedback
and if you like it, give us a star.
Would be awesome just to get a star.
And we're open for contributions, so we're not paused.
So if you want to develop a feature, you're welcome.
Great.
Awesome.
All righty.
Well, thank you so much for joining us here in Oakland, Viktor.
Thank you, guys.
All right.
That's a wrap for episode four here in person at Data Council.
Stay tuned, we've got more coming your way.
The Data Stack Show is brought to you by RudderStack, the warehouse native customer data
platform.
Learn more at rudderstack.com.