The Infra Pod - Next-Gen Data Processing with Incremental Query Engines (Chat with Gilad from Epsio)
Episode Date: July 14, 2025

Join Tim from Essence VC and Ian from Keycard as they sit down with Gilad Kleinman (CEO of Epsio) to explore the next-generation incremental query engine they're building. Gilad shares his journey from a software developer focused on low-level Linux kernel development to identifying fundamental gaps in data processing. They delve into the philosophical questions regarding data complexity and redundancy, discuss the theoretical advancements propelling their project, and highlight the practical issues companies face in achieving efficient, scalable data queries.

00:35 Journey to Building Epsio
03:33 Tackling Incremental Processing
10:08 User Experience in Data Systems
17:24 Scaling and Integration Challenges
18:43 AI and the Future of Database Management
32:21 Spicy Hot Take!
Transcript
Welcome back to the InfraPod. This is Tim from Essence and Ian, let's go.
Hey, this is Ian Livingstone, builder of cool identity software. I couldn't be more excited
today to be joined by Gilad, the CEO of Epsio, building a next-generation incremental query
engine. Gilad, tell us a little bit about why in the world you started building this
company and how you got to take the plunge
on this specific piece of tech and say, you know what, I'm going to bet a good portion
of my working time in life to building this technology.
Yeah, sure.
So first of all, happy to be on this pod.
Of course, happy to share a little bit about the background.
So basically, I've been a software developer for most of my life, dealing with things from the Linux kernel,
more on the low-level side of development,
to backend development a little bit early on.
And when looking at the evolution of
most of the projects I worked on in my life,
me and my co-founders saw two trends
happen pretty much everywhere that are very obvious,
but cause some philosophical questions:
we have more data, and our queries become more complex.
And although these two things are super obvious, I think if you look at it in a philosophical
way, if you have more complex queries and more data, there becomes this fundamental
gap between the data you collect and the data you want to show your end users, or the
results of the processing
that you're doing.
So for example, if you're building a SaaS
that collects salaries,
you're probably collecting a list of salaries
and that's what you're saving in your database.
But as the product evolves,
you want to show some of the salaries per department,
graph of growth of salaries, et cetera, et cetera, et cetera.
And the more data that you have and
the more complex of queries that you want to do,
the more work your database is going to need to do to get from,
hey, this is a list of salaries, to,
hey, this is the sum of salaries per department,
or here's a nice pretty graph.
Looking at this evolution and this gap
getting bigger and bigger and bigger,
it was pretty odd to us how much redundant work goes
into most data stacks to recompute data that's already been computed.
Again, if you're collecting salaries and somebody opens your dashboard,
chances are most of the salaries did not change since the last time,
yet all the batch processors in the world, Postgres, MySQL,
or whatever other database you're using,
just scan the entire set each time,
output the result, save it,
and do everything from scratch.
We were really intrigued by, again,
philosophically bridging that gap and helping
companies remove that redundancy and
make products that are faster and more efficient.
That's the high level, obviously,
that goes into a lot of places,
but I think that's kind of like what brought us
to work on Epsio and kind of what we're excited about.
And was there some catalyst or some change you saw
that you're like, hey, this is the moment to go do this?
Because incremental query engines,
I mean, we've been talking about them for some time,
but what was it that said, hey, this is the thing I'm seeing,
this is the problem I gotta solve today,
versus, you know, a problem that needs to be solved five years from now?
That's a great question.
And obviously, we are not the ones
who invented the concept of incremental processing.
That's something that has been around for a very, very long
time.
And the concept of, let's save the results,
and if, for example, in the salary example,
a new record gets added, let's just add it to the previous sum.
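To make that concrete, here is a minimal sketch of hand-rolled incremental maintenance in plain Postgres, assuming a hypothetical salaries table with a numeric salary column (both names invented for illustration). This is the manual version of what an incremental engine automates for arbitrary queries:

```sql
-- Hand-maintained incremental sum (illustrative only; assumes a
-- salaries(salary numeric) table exists). Instead of re-running
-- SELECT SUM(salary) on every dashboard load, keep a running total
-- and apply only the delta of each change.
CREATE TABLE salary_total (total numeric NOT NULL);
INSERT INTO salary_total SELECT COALESCE(SUM(salary), 0) FROM salaries;

CREATE FUNCTION apply_salary_delta() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE salary_total SET total = total + NEW.salary;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE salary_total SET total = total - OLD.salary;
  ELSE  -- UPDATE: remove the old value, add the new one
    UPDATE salary_total SET total = total - OLD.salary + NEW.salary;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER salaries_delta
AFTER INSERT OR UPDATE OR DELETE ON salaries
FOR EACH ROW EXECUTE FUNCTION apply_salary_delta();
```

Maintaining one sum this way is easy; as Gilad explains next, the hard part is doing the same for thousands of lines of SQL with joins and nested aggregations.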
But I think kind of when looking historically on the evolution of this,
I think there's like kind of two interesting trends to look on.
First is just the theoretical trend,
where basically a sum of salaries is a really, really easy example,
but most slow queries are much more complex.
And up until the last couple of years, just from a theoretical perspective,
being able to incrementally maintain these thousands of lines of SQL queries
efficiently, correctly, and most importantly at scale was kind of like impossible.
When you look on like, for example, the implementation of MSSQL incremental views
or a thousand other attempts to kind of create these incremental technologies in the past, When you look on, for example, There have been a lot of really amazing advancements
in the theory world, articles like Differential Data Flow, with real scale. And I think that kind of like was what we saw and got excited about and was interested in implementing.
And alongside that, I think another thing was kind of
when looking on the advancements of integration
and things like that, I think like we saw a lot
of these technologies evolve
and these theories become more mature.
But when we looked at why companies are not using it,
and asked ourselves why we were not using it, we realized that the
combination of great replication tools, companies like PeerDB and Fivetran that do a lot of great work on making replication really easy, together with these theoretical
advancements, is a real opportunity now to make an incremental query engine that works
and is easy to use.
Yeah, I'm very curious, because you mentioned that Differential Dataflow, even DBSP, these are not brand new.
I think the concept has actually been there for some time, and we've seen
databases, I think back then it was Materialize and some other companies, building almost this incremental materialized view
support, basically. But I'm just curious what you saw
that the market really, really
needed, because like I said, I don't think this is new.
There are some other products that almost offer something in the middle, but what
is the biggest gap?
Like, okay, even though there are other databases that sort of do similar incremental view support,
what does Epsio bring that is actually not that easy for other databases to achieve right now?
Yeah, so I think first, from the theoretical perspective,
a lot of times people talk about when theoretical articles were
published, and the academia world is really slow. Look
at some of the LLM theories that pushed that
advancement: sometimes it takes a lot of time to trickle down to the market.
But other than that, I think that the way that we kind of look uniquely on the problem
is that we start from the batch processing world.
I think a lot of these technologies like Materialize, like DBSP, more come to tackle and replace
stream processors originally.
And we kind of focused on the problem
of slow database queries.
And I think that's kind of like where stream processing
originally started from.
You had a batch processor and things became slow,
so you slowly needed to use and build stream processors.
And we really try to give users the same UI and UX
as the existing database
that they have.
We integrate natively into the Postgres or MySQL database
that they already have.
And instead of doing create materialized view
in the existing database, they call epsio.create_view,
just for the slow queries.
And most of our customers aren't replacing stream processors
or microservices
or things like that with us.
They have slow queries that they probably would have thought about
building stream processors for in the future, but right now it's within that context, if that makes
sense.
It'd be great to maybe give an example of what you think your difference is from a batch
or slow-query world versus a stream processor.
Because I feel like even the example you mentioned earlier
for the salary thing, obviously there's a recomputation
of the whole query that makes it slow.
So you're able to only do the incremental changes.
But those incremental changes happening at the table
seem like stream processing as well,
because the data is coming in almost like a stream, right?
You know, you would actually process it as a stream.
But you're mentioning something a bit more.
Is it the data volume here?
Is it the number of queries, or the number of nodes
that are able to execute?
Like, what makes yours more slow-query, batch-processing oriented
versus other databases that are more stream-processing oriented?
I think it's all about kind of UI, UX, and kind of the wrapping
and kind of like what use cases you try to focus on.
And like, I think specifically, like the way we see it
is that the UI and UX and ease of use of batch processing
is much easier for a lot of companies.
And they don't want to think in stream processing ways.
They don't want to push data to a Kafka topic.
They don't want to care about Debezium
and then JDBC to sink it back, and weird things like that.
So like, I think that the thing is that we try to abstract
that away as much as possible.
And I 100% agree with you.
We're doing stream processing internally, and we're
using stream processing concepts. We just try to abstract that away from the user and
kind of like build a stream processor that looks like a batch processor. Because batch
processing is much more intuitive, much easier to understand, I think, in many scenarios,
if that makes sense. So we're 100% a stream processor,
but we kind of believe that the reason stream processors
perhaps are not as widely used as they should be
is because a lot of people don't want to think
in a stream processing world,
and we try to abstract that away.
And so what's the experience you're building
that's differentiated? You kind of abstract
these complexities away, but as a developer, someone trying to build a streaming system,
like how do you make it simple for me?
Like, help me understand like, why is this simple and better and easier and
accessible because one of the biggest challenges to date with stream processing
in general, like the most used system is Flink.
And it ain't easy, right?
It ain't easy to use Flink, scale it, deploy it, even reason about how
the system actually works.
It's like this weird inversion of the way you program.
It's not imperative.
So help us understand: what is it that Epsio is doing to make it approachable, consumable,
and scalable, both in terms of the technology and the data volume and speed component of that?
Because I think the scalability part is always one of the real issues with streaming stuff,
because it takes such a level of expertise to run a production streaming system with, say, a Flink or
existing stuff. Nobody builds streaming stuff unless it absolutely has to be
streaming for the use case.
So I'll give a couple of examples.
First of all, from like the integration part,
like the way you would deploy Epsio,
it's a single component you run within your environment.
You give it the connection details of your database
and a dedicated schema where it will create
its stored procedures.
From that point onward, you install Epsio,
one-liner install, you give it permissions to your database.
To create a streaming query or streaming calculation, all you have to do is call a stored procedure within the existing database,
give it the SQL of the query, and the result of that would be
materialized into a result table that sits within your database. To give the parallel in a Flink world, you would need to spin up
Debezium, then Kafka, then Flink, then Kafka again, and a JDBC sink, and also
probably Avro and a couple of other fun things to kind of make sure everything
plays together.
So first of all, just from the amount of components and moving parts that you
need to think about, it's much lower.
More fundamentally, I think, kind of like if you think about
doing like an end-to-end stream processor,
there's a lot of concepts that you can abstract away from the user
if you're building it end-to-end.
For example, the concept of a transaction.
When you're writing streaming transformations in Flink,
you're probably using Debezium to consume the changes,
you're pushing the changes to a Kafka topic,
each table to a different topic,
you have a Flink processor and then you output the results.
Somewhere along the way in the Debezium part,
you're losing the concept of a transaction.
So for example, if in a single transaction
in the source database, you're deleting a record
and inserting a record in a different category, that's something
that very easily can get lost within the process and you need to think about that a lot when
you're building a good stream processor. For example, that's one of the things that we're
abstracting away. We're taking care of transactions end to end, meaning we have a very strong
guarantee that each transaction that you perform in your source database will translate into a single transaction in the sink database, alongside ordering, of course,
and kind of abstracting that away: you have a database, you think in transactions.
So you don't need to think about the ordering of topics in your Kafka internals.
These are not things that you want to care about.
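For example, here is the kind of source transaction that can get torn apart in a hand-assembled pipeline (the tasks table and its columns are invented for illustration):

```sql
-- A single source transaction moving a row between categories. In a
-- hand-built Debezium/Kafka/Flink pipeline, the DELETE and INSERT can
-- land on different topics and be applied at different times; the
-- guarantee described above is that the sink applies them as one
-- atomic transaction, in order, so a downstream per-category aggregate
-- never observes the row in neither category or in both.
BEGIN;
DELETE FROM tasks WHERE id = 42;                         -- was category 'open'
INSERT INTO tasks (id, category) VALUES (42, 'closed');  -- now category 'closed'
COMMIT;
```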
What are some common use cases where people are saying, hey, you know what, you
really unlocked something I couldn't have done before?
Tim and I, and many people who have done anything in data, have talked to a lot of streaming
tech companies.
There's many, many, many streaming tech companies that have come along.
And all of them have their own little wedge, a little position of what they do.
I'm curious, what is the sort of place you're broadening out for
people, that enables people to do something they couldn't before?
Where do you find that traction? Do you have a narrow use case
where you land, and then because you unlocked something for
somebody, you can kind of take over the world? What's it look like?
Yeah.
So I think originally, most initial use cases are the classical customer-facing
analytics.
We have some fraud detection use cases and things like that.
What I'd differentiate is that streaming is not a goal.
It's a tool to reach a goal of having a heavy transformation, some heavy
calculation.
And I think it's honestly really exciting.
Most companies start with the classic dashboard analytics and fraud detection use cases.
At the end of the day, if you have a lot of data, you're running repetitive queries,
that's a pretty generic and pretty wide thing that you can build a lot of stuff on top of.
That's obviously relevant for BI.
That's relevant for dbt models where you don't want to
recalculate the same thing from scratch.
That's really important for cost saving also.
So that's kind of what we see usually as
the initial pain point of,
hey, I try to squeeze my database and just can't.
But after that, I think just thinking about things
incrementally and not in batch,
that can evolve into a lot of exciting and interesting places.
Even data replication.
For example, we have a customer who started, again, from a customer-facing application,
but then they have an Elastic database that they want to replicate their data to,
and they want to kind of flatten all the tables.
So they just suddenly realized that it could be super easy to basically join all the tables
and then write that into Elastic
and kind of replicate things,
not in a dumb way of just copying things
from one place to another,
but kind of enriching it and moving.
And since the UI is as simple as a materialized view,
suddenly that's also relevant because it's already there.
It's just defining another query.
You call epsio.create_view, do the join,
and dump the result of that into Elastic.
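A hedged sketch of that enriched-replication pattern follows; all table and column names are invented, the create_view call follows the illustrative form used earlier, and the Elastic sink wiring is elided since it is configuration-specific:

```sql
-- One incrementally maintained view that flattens the normalized
-- tables; its result can then be kept in sync with an Elastic index
-- instead of copying each table over one by one. (Schema invented
-- for illustration.)
CALL epsio.create_view(
  'orders_flat',
  'SELECT o.id,
          o.created_at,
          c.name  AS customer_name,
          p.title AS product_title
     FROM orders o
     JOIN customers c ON c.id = o.customer_id
     JOIN products  p ON p.id = o.product_id'
);
```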
You know, with the Postgreses of the world,
since you're building sort of like this
incremental query engine on top of existing databases,
these databases are typically, even though you can shard
and you can do different kinds of replication,
the main execution of the writes and stuff is really just single node, right?
But the Flinks of the world are designed to, hey, run on any number of machines, right?
There is state to it, but the way they're built, it's not infinitely
scalable, but much more scalable than one single node. And it's kind of funny that you're doing stream processing with both.
I saw in the documentation there's a stateful and a stateless side,
this idea of being able to execute a variety of different workloads here.
Is the assumption that all the data that you'll be handling
is going to fit in a single node anyway?
Or do you also need to be able to handle it
in a scalable way somehow?
Because it's really hard to tell based on the information
about your deployment and stuff.
It doesn't seem like this is an infinitely scalable,
elastic type of execution engine,
which it probably doesn't need to be.
But I'm just trying to understand,
where do you see your engine fit?
Is it the case of, okay, everything that fits in a single node is good for us,
and if you have to scale out, go use, you know, I don't know,
Cockroach or something else,
and those people will handle their own replication, and you're just kind of
within this single-node world? Or maybe tell me more if that's the
right way to think about it.
So, definitely not in the single-node world.
And we're actually starting work on a cockroach integration
because we got a couple of pulls from companies using Cockroach
that kind of reached the scale limit of what a single
Postgres instance can obviously offer.
Today, you can't take a single view
and spread it across multiple
Epsio instances. That's definitely something that we will have in the future, and we've already
built the structures to allow that. What companies that do have large scale do is they
separate logic into many separate views, and then spread those across many instances and kind of
shard it, you can think of it, across many Epsios.
In the future, that's something we're definitely going to go into,
having a cluster of a thousand Epsios writing to a thousand Postgreses.
But currently it's one Epsio view per Epsio instance. We do support sources and sinks that don't have to be a single Postgres instance.
We also have customers who have many Postgres instances, sharded themselves,
and then they use Epsio, for example, to read from 10 Postgreses,
aggregate it, and dump it into an 11th Postgres, if that makes sense.
Actually, you bring up a really good point. I want to ask you about this.
Given that your customers are either running Postgres themselves or using
some kind of RDSs or even mentioned cockroaches and there's a few other
like scale out databases out there.
I feel like between what the vendors market and how the actual customers use,
deploy, and operate these databases,
there's usually a big mismatch, you know?
How they actually run their database is always a hilarious thing.
What's the most common anti-pattern or bottleneck you see people running into with
a popular database like MySQL or Postgres?
What is the thing that they always keep running into that you have to almost educate them
about, or just help them with in a way?
Yeah, that's an interesting question.
I think obviously there's a million small parameters
and things like that that you probably want to tune.
Thinking about your schemas,
I think a lot of people don't understand
the performance implication of building the correct schema. From simple things like JSON versus JSONB,
or whether your primary key is a UUID or an integer.
There are a lot of things where I think people kind of separate
the schema layer from the performance layer.
So that's one thing I see kind of happen a lot.
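For instance, here are two versions of the same table, illustrating the schema choices he mentions (a sketch, not a universal rule; the right choice depends on workload):

```sql
-- Schema choices with real performance implications. json is stored as
-- text and reparsed on every access; jsonb is a decomposed binary form
-- that supports GIN indexing. A random UUID primary key is larger and
-- scatters writes across the index; a sequential bigint stays compact.
CREATE TABLE events_slower (
  id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  body json
);

CREATE TABLE events_faster (
  id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  body jsonb
);
CREATE INDEX events_body_idx ON events_faster USING gin (body);  -- jsonb only
```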
But other than that, I don't know.
I think logical replication, specifically in Postgres,
is kind of something we saw a lot of people stumble on.
And I think some companies had a lot of issues with it
that we needed a lot of times to help kind of like fine tune
and make sure that it works correctly.
I have some criticism and thoughts about the way
logical replication works in Postgres.
So that's maybe one place I think people sometimes stumble.
Maybe let's also touch on the future databases,
the NewSQL type, the Cockroaches of the world.
I just want to ask, because, you know,
I think we haven't really heard or seen people talk about the
bottlenecks of these databases.
People just assume they're just much more scalable.
The reason you pay for a separate database instance like this, or
like the Spanner type, right,
is that you can scale forever.
But it sounds like it's not true.
You have customers running into issues with those.
So maybe enlighten us, too:
what are the typical scale issues
you can run into in these scale-out,
everything-will-be-great kinds of databases, you know?
Yeah, okay, so I misunderstood the question earlier,
but over there, specifically,
and I think I remember also reading
the documentation in Cockroach,
these databases are not meant to aggregate data from separate places.
They're really good at distributing the data and having a lot of simple selects.
But if you want to now aggregate, join, and do a lot of complex aggregations on top of
separate shards or whatever you call that, they're not solving that problem.
They're meant to solve the problem of a lot of writes
and simple selects.
They're less performant for complex aggregations
and joins across a wide variety of data.
And that's fine.
They're an OLTP engine,
and they're not meant to do a lot of heavy cross-shard
aggregations.
Just to give an example, some customers
literally want to aggregate data
across different geo locations.
So just network-wise, running a query
to aggregate all this stuff can take time,
and you probably want to pre-calculate that
a lot of the time.
What do you think the future is?
I've been in San Francisco for the last couple of weeks playing the West Coast, East Coast
flight game.
On the ground, one of the things we're talking a lot about is what's the future of development,
what's the future of programming, and what are the future interfaces that people interact with.
I'm very interested to understand: on one side, our data ecosystem carries an incredible amount of legacy.
Once data is in a system,
that system is there for almost forever.
Migrating data to a different database or to a different thing,
the system you build around it is so
tied to how that data is stored and organized,
and to the inherent properties of the system used to
manage the data, that it makes changing incredibly difficult.
And I'm kind of curious, on one side you have that aspect,
on the other side you have all of these new coding tools.
What do you think the future of data ecosystems look like?
Are people still going to adopt
these large, giant databases?
Is something going to be different?
And then also what's the future
of how you interact and build ecosystems?
We're kind of curious to get your perspective
on what the future UI UX looks like
and how the data world changes with the rise of LLMs
being the entry point for a lot of developers to code bases.
Yeah, so I think on the data layer at least,
kind of like where the world is
going is, I don't know if to call it distributed,
but kind of like a lot of components
that know how to play together well.
Like I think in a world where it's really hard,
as you said, to move, to migrate a database,
one of our customers has a quote
that I really love, that migrating a database
is like changing a car's
engine while driving. I couldn't agree more. It's really hard to move databases, to change the
basic structures of your data when the company is moving. And I think that's why kind of like
the market is more going towards like kind of an ecosystem of databases that talk together
and integrate. I think the ClickHouse folks, for example, are doing a lot of great things there,
for example with the replication from Postgres to ClickHouse.
They have a lot of databases that kind of know how to play together very nice.
So on the data layer, I think it's definitely going there where you have a lot
of components that integrate and play together.
And then on top of that, the coding LLM tools
also need a better interface,
a more generic interface, and kind of a consolidation
of things that these upper systems can talk with.
And that's also part of the reason we try
to be Postgres compatible, or MySQL compatible
when we're in MySQL, exactly so these upper systems have a
single language they can talk with the lower systems, and then the lower systems talk and distribute and play
together.
Do you think systems like Epsio will play a big part in dealing with this sort of thing?
You are solving the data movement problem today,
which is, how do I get data from one sink
to another sink to another sink to another sink,
sink and source.
But do you think developers will be interacting directly
with these data systems,
or do you think that's actually gonna be like
delegated to some LLM or what do you think?
So I think it might be an LLM in the future.
I think, funnily enough, like originally,
when we started Epsio, we thought about automating the views
that we materialize, ourselves choosing which queries should be incremental
and which not. And we quickly understood that that's something really hard to do.
Because other than performance, there's a lot of business logic injected into that.
For example, you probably want your buy button to run as fast as
possible, and a two-second buy button is probably too slow. But delete my account,
that's probably a button you don't care at all if it takes a lot of time. So we
kind of understood that the companies really know well which parts
are important and which not, and that's something really hard to
abstract away from the user.
So I'm hopeful regarding the long run of LLMs and kind of the state of AI to
better understand that.
But I still think there's a long way until it can automatically
understand the business implication of a speedup in each place.
You know, this is curious, because hearing you talk about the architecture
and your
customers, and being an investor as well, I just always keep having this question in my mind,
because my own startup was also trying to do performance optimization for containers,
just not for database queries, but for containers in general.
And it's so hard to tell people their stuff sucks, you know? And also, even though they know,
they may not want to buy it right now.
Like there's almost like a timing thing where like,
I am so overwhelmed, or I feel like I have no idea
how to do things.
It's not like that common actually to find people like,
oh, I definitely have to go buy a solution
to speed things up.
They always feel like this is like the last resort of last resorts.
And I'm just curious, how do you get over this hump? Not just getting people interested
in this problem, because people are always interested in this problem, but getting them
to actually say, you know, I want this now, is such a difficult task.
I don't know if you learned anything trying to do this, because it's still a PTSD in my
mind, you know. Anything performance related, they love
to debate technical stuff with you, but actually wanting to buy, there's a risk involved
and there's an ego involved, you know, both sometimes.
Yeah.
I a hundred percent agree.
And I think one of the big realizations, like I talk a lot about performance, but actually one of the big
questions that I ask when we talk with companies that are interested in using us is not how slow things are. Because that's interesting, but that's not a hair-on-fire problem.
It's how much blood, sweat, and tears did your engineers already put in, or are planning to put in, to make these things fast?
And that's where you have the big value propositions.
We work with companies where it's, obviously, not uncommon that you have
five engineers working on, like, overhauling the database infrastructure because everything's on
fire. Obviously a lot of engineering time goes there, and I think, following your point,
I talk about performance, but the real value proposition is how easy it is to reach
the performance you need and make changes. Because a lot of companies build a pile of patches,
and then each time they want to change one small tile
in the dashboard, it's like a month of development.
Even if they already did all the caching,
stream processing and things like that.
So I 100% agree and the value proposition is
R&D time and velocity.
And that's how we measure ourselves.
That's how we do discovery.
And if you have a query that takes an hour
and it's not painful enough for you to put an engineer
on that, that's totally fine.
We're probably not valuable enough there.
And you know, the whole world is buzzing about AI,
which we just talked about briefly.
There's a whole category of AI SREs now. They're like, I'm going to go look at your logs,
and I'm going to go fix things for you.
Or I'm going to even try to tell you exactly what
to do to fix things.
And their messaging is pretty umbrella, covering anything.
I'm not sure, you're on the database side,
so you'd probably say this is not going to happen.
But I'm just curious, just given that you are trying
to fix people's problems, and you are building
a product like this, do you see AI enabling
things to go faster? Maybe because there is an engine that you built already, there's maybe a
faster way that AI can help bridge people into your product? Or maybe down the line,
there is an opportunity for AI to look at database logs and schemas
and even do the fine tuning, the DBA-type work.
I've actually seen AI DBAs being proposed quite a few times.
Do you think that's real in the short term?
There will be an AI DBA of some sorts?
I think the recommendation is kind of something
I believe a lot in.
I think there's like the taking action
and then there's a recommendation.
And like when I talk about an AI agent
automatically creating Epsio views,
that's something I think is a long way out
before I personally believe that's there.
But recommending which views,
like analyzing the query history, or, for example,
I don't know, looking at a screen recording,
seeing where there are rage clicks,
understanding which queries happen there,
and suggesting that automatically.
I see a lot of places where
Epsio could give tooling to LLMs to do better suggestions.
I do just think that I came from a cybersecurity company
before and we always talked there about visibility always
comes first before taking action.
And I think that we still need to build that layer of visibility, LLMs that can give you visibility and recommend. If we do that well, then great.
We can also apply policy, but that's always like the first step.
And I don't think we're there yet.
Like with the AI DBA, I think on the recommendation side there's still
some work to be done to provide the visibility.
Awesome.
Well, I want to move on to what we call the spicy future.
Spicy future.
Tell us, what's your spicy hot take, maybe in the infra world, the database world,
or whatever world you want to be declaring it in?
What's your spicy hot take on the infra stuff?
Yeah, I think it kind of connects to stuff I already talked about.
So that might not
be fair, but I really think UI and UX are underrated in the database space.
And I think the fact that you don't have a pretty screen doesn't mean that the experience
of using it doesn't matter.
And I think that's something I live and believe every day.
UX matters even if you don't have a pretty screen. What's an example of a shitty UX for a database?
Or, like, what's something you're seeing that makes you feel like this is a hot take?
I think, like, obviously there are the Flinks and things like that that don't expose a nice interface.
I think generally there's a lot of specific examples
that I don't know, in MySQL there's some pretty weird
things sometimes that we come across.
Postgres replication, for example,
I'm not sure that that's a great interface
I would expose to a user.
I can give a lot of specific examples,
but I just think the prettiness
of the experience is something worth focusing more on. And it's very
easy, like, developers like configuration and they can dive
deep into things. But that doesn't mean they have to.
Yeah, I think this is what I feel, because, you know, I
worked on databases a little bit and work with a lot of
friends who are working on databases. And especially the
Postgres type of worlds have been around for like 30, 40 years now.
Most of the maintainers have been working on them for like 10, 20.
It's like the kernel sort of crowd, you know,
and their best UX is what they knew.
And they also don't want to change things.
Well, we've always done it this way, right?
We always added one more flag.
We always just add another release note, you know? And so you just keep doing that.
And I feel, I don't know if you see it too, the vendors
have no incentive to change it, because they have a paid product. Yeah, that's a problem.
Like, why?
Why should I make all my other things so much better when I can just, like, spice up my Databricks
notebooks, right?
If anything, to make it harder.
Yeah, actually it is incentivizing them to make it harder.
They are actually deprioritizing open source a lot now, or, like, completely just not
touching it, pretty much. You know, it is sort of the sad reality of what it is,
but I guess it's also a necessity for them to make money.
So it's almost like it's very hard to find people in this space
that actually have the motivation to do it.
It's actually very hard to have great UX if you only touch one tiny part of the system, right?
You know, to be in control of different parts and able to herd the cats, and, you know,
I don't know, it's just very difficult to actually see that happen in this open
source, widely popular database world, you know, unless there's a crazy guy like Linus or
something, who's really like, yeah, I just care about this and you guys can all just F off.
It's really difficult. Somebody has to be like a dictator.
I think that's a great example, and I'm a big fan of Linus Torvalds. I think he's a great example of
somebody who's obsessed with interface. I agree on that. I think there are
a lot of alignment issues in the open source and data world. It's really hard to build good alignment
in these projects to make sure that it's encouraged
to build a product that's not harder to use.
Love to understand from your perspective,
what do you think the future of data interop actually is?
You were talking about standards,
but is this actually a solvable problem? Does, you know, AI solve this problem, in a world where you can potentially
turn this into a machine learning problem set instead of everyone having to go write
interop?
The fundamental problem today is we have all these different standards and all these
different formats and all these different systems, all with different characteristics.
So we ultimately end up with transformation layers, right? And I guess the
question is, do we actually care about standards in the future? Or is this something we'll use
an LLM for, to write all the interop? Do standards actually really matter anymore,
when the systems can just figure it out and build interoperable transformation layers that
just work?
So that's an interesting question. Personally, I think that a well-defined interface would still be necessary.
So you'd still want the LLM to have a very defined way of doing specific things.
And even if it's an LLM, data is still a very precise matter,
and you want to be very precise about what each action does.
It is possible, though, that the interface will change,
that you'll want a separate interface.
Maybe SQL is not the best thing if you're an LLM.
Maybe you want some other interface to expose to there
that's still precise but is more beneficial for LLMs.
Having said that, I still think that a well-defined interface, even in an autonomous
world, is still beneficial, because then you can have good separation, where the LLM doesn't
care about the internals and the internals don't care about the LLM. And you can only
achieve that separation with a very strong wall with very defined entrances and exits.
Awesome.
Well, I think there's so much we could probably dive into in this whole data
space, but for the sake of time and stuff, what is the way for folks to
learn more about Epsio?
Like, do you have social channels?
Do you have any places people can find you?
What are the websites or social channels we should be shouting out?
Yeah, so we have our website, obviously, where we also have a newsletter
that you're more than welcome to sign up for and look for news.
We're mostly active on LinkedIn, so you're welcome, obviously, to follow us there.
And more than that, like I said, we believe in ease of use and we're willing to stand
behind it.
So the easiest way to learn is just try it and see how it works.
Try creating incremental views, see what works and what doesn't.
And also obviously the documentation.
But yeah.
Awesome.
Well, thanks, Gilad, for being on our pod.
And I'm sure lots of folks will love it.
Thank you so much.
It's been a pleasure.
Thank you.
Thanks for having me.