The Infra Pod - Next-Gen Data Processing with Incremental Query Engines (Chat with Gilad from Epsio)

Episode Date: July 14, 2025

Join Tim from Essence VC and Ian from Keycard as they sit down with Gilad Kleinman (CEO of Epsio) to explore the next-generation incremental query engine they're building. Gilad shares his journey from a software developer focusing on low-level Linux kernel development to identifying fundamental gaps in data processing. They delve into the philosophical questions regarding data complexity and redundancy, discuss the theoretical advancements propelling their project, and highlight the practical issues faced by companies in achieving efficient, scalable data queries.

00:35 Journey to Building Epsio
03:33 Tackling Incremental Processing
10:08 User Experience in Data Systems
17:24 Scaling and Integration Challenges
18:43 AI and the Future of Database Management
32:21 Spicy Hot Take!

Transcript
Starting point is 00:00:00 Welcome back to the InfraPod. This is Tim from Essence and Ian, let's go. Hey, this is Ian Livingstone, builder of cool identity software. I couldn't be more excited today to be joined by Gilad, the CEO of Epsio, building a next generation incremental query engine. Gilad, tell us a little bit about why in the world you started building this company and how you got to take the plunge on this specific piece of tech and say, you know what, I'm going to bet a good portion of my working time in life on building this technology. Yeah, sure.
Starting point is 00:00:34 So first of all, happy to be on this pod. Of course, happy to share a little bit about the background. So basically, I've been a software developer for most of my life, dealing with things from the Linux kernel, more on the low-level side of development, to backend development a little bit early on. And when looking at the evolution of most of the projects I worked on in my life, me and my co-founders saw two trends
Starting point is 00:00:59 happen pretty much everywhere that are very obvious, but cause some philosophical questions. We have more data and our queries become more complex. And although these two things are like super obvious, I think if you look at it in a philosophical way, if you have more complex queries and more data, there becomes this fundamental gap between the data you collect and the data that you want to show your end users, or the results of the processing
Starting point is 00:01:26 that you're having. So for example, if you're building a SaaS that collects salaries, you're probably collecting a list of salaries and that's what you're saving in your database. But as the product evolves, you want to show the sum of salaries per department, a graph of the growth of salaries, et cetera, et cetera, et cetera.
Starting point is 00:01:43 And the more data that you have and the more complex the queries that you want to do, the more work your database is going to need to do to go from, hey, this is a list of salaries, to, hey, this is the sum of salaries per department, or here's a nice pretty graph. Looking at this evolution and this gap getting bigger and bigger and bigger,
Starting point is 00:02:03 it was pretty odd to us how much redundant work is going on in most data stacks to recompute data that's already been computed. Again, if you're collecting salaries and somebody opens your dashboard, chances are most of the salaries did not change from the last time, yet all the batch processors in the world, Postgres, MySQL, or whatever other database you're using, just scan the entire set each time, output the result, even if only five records changed,
Starting point is 00:02:31 and do everything from scratch. We were really intrigued by, again, philosophically bridging that gap and helping companies remove that redundancy and make products that are faster and more efficient. That's the high level. Obviously, that goes into a lot of places, but I think that's kind of like what brought us
Starting point is 00:02:48 to work on Epsio and kind of what we're excited about. And was there some catalyst or some change you saw that made you say, hey, this is the moment to go do this? Because incremental query engines, I mean, we've been talking about them for some time, but what was it that said, hey, this is the thing I'm seeing, this is the problem I gotta solve today versus, you know, a problem that needs to be solved five years from now?
Starting point is 00:03:08 That's a great question. And obviously, we are not the ones who invented the concept of incremental processing. That's something that has been around for a very, very long time. And the concept is, let's save the results, and if, for example, in the salary example, a new record gets added, let's just add it to the previous sum.
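To make that concrete, here's a minimal sketch of the idea (illustrative only, not from the episode; the table and column names are made up):

```sql
-- Source data: one row per salary.
CREATE TABLE salaries (
    id         BIGINT PRIMARY KEY,
    department TEXT NOT NULL,
    amount     NUMERIC NOT NULL
);

-- The batch approach: every dashboard load rescans the whole table.
SELECT department, SUM(amount) AS total
FROM salaries
GROUP BY department;

-- The incremental approach: keep the previous result materialized and,
-- when a row is inserted, apply only the delta instead of rescanning.
CREATE TABLE salary_totals (
    department TEXT PRIMARY KEY,
    total      NUMERIC NOT NULL
);

-- Conceptually, an INSERT of (42, 'engineering', 100000) into salaries
-- becomes a single delta applied to the maintained result:
UPDATE salary_totals
SET total = total + 100000
WHERE department = 'engineering';
```

An incremental query engine generalizes that delta step from a simple sum to arbitrary joins and aggregations.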
Starting point is 00:03:27 But I think kind of when looking historically at the evolution of this, there's kind of two interesting trends to look at. First is just the theoretical trend, where basically sum of salaries is a really, really easy example, but most slow queries are probably far more complex. And up until the last couple of years, just from a theoretical perspective, being able to incrementally maintain these thousands of lines of SQL queries efficiently, correctly, and most importantly at scale was kind of like impossible.
Starting point is 00:04:00 When you look, for example, at the implementation of MSSQL incremental views, or a thousand other attempts to kind of create these incremental technologies in the past, they never really worked at scale. There have been a lot of really amazing advancements in the theory world since then, articles like Differential Dataflow, that make this possible with real scale. And I think that kind of like was what we saw and got excited about and were interested in implementing. And alongside that, I think another thing was, when looking at the advancements of integration and things like that, we saw a lot of these technologies evolve and these theories become more mature.
Starting point is 00:05:02 But when we kind of looked at why companies are not using it, and asked ourselves why we were not using it, we felt that the combination of great replication tools, companies like PeerDB and Fivetran that do a lot of great things on making replication really easy, together with these theoretical advancements, is a real opportunity now to make an incremental query engine that works and is easy to use. Yeah, I'm very curious, because you mentioned that differential dataflow, even DBSP, these are not brand new. I think the concepts have actually been there for some time, and we've seen databases, I think back then it was Materialize and some other companies, building almost like this incremental materialized view
Starting point is 00:06:00 support basically. But I'm just curious what you saw that the market really, really needed, because like I said, I don't think this is new. There's some other products that almost offer something in the middle, but what is the biggest gap? Like, okay, even though there are other databases that sort of do this incremental view support, what does Epsio bring that is actually not that easy for other databases to achieve right now? Yeah, so I think first, from the theoretical perspective, I think that
Starting point is 00:06:31 a lot of times people talk about when theoretical articles were published, and the academia world is really slow. Look at some of the LLM theories that kind of pushed the advancement: sometimes it takes a lot of time to trickle down to the market. But other than that, I think that the way that we kind of look uniquely at the problem is that we start from the batch processing world. I think a lot of these technologies like Materialize, like DBSP, come more to tackle and replace stream processors originally.
Starting point is 00:07:03 And we kind of focused on the problem of slow database queries. And I think that's kind of like where stream processing originally started from: you had a batch processor and things became slow, so you slowly needed to use and build stream processors. And we really try to give the users the same UI and UX as the existing database
Starting point is 00:07:25 that they have. We integrate natively into the Postgres or MySQL database that they already have. And instead of doing create materialized view in the existing database, they just call epsio.create_view, for just the slow queries. And most of our customers aren't replacing stream processors or microservices or things like that with us. They have slow queries that they probably would have thought about maybe in the future building stream processors for, but right now it's within that context, if that makes sense.
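As a rough illustration of that flow (a sketch based only on how it's described here; the epsio.create_view procedure name follows the episode's description, but treat the exact signature as an assumption):

```sql
-- A regular materialized view has to be refreshed to pick up new data:
CREATE MATERIALIZED VIEW salaries_per_department AS
SELECT department, SUM(amount) AS total
FROM salaries
GROUP BY department;

-- With Epsio, the same SQL is handed to a stored procedure inside the
-- existing database, and the result table is kept up to date incrementally:
CALL epsio.create_view(
    'salaries_per_department',
    'SELECT department, SUM(amount) AS total
     FROM salaries
     GROUP BY department'
);

-- Reads are then plain, fast queries against the maintained result table:
SELECT * FROM salaries_per_department;
```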
Starting point is 00:07:44 It'll be great to maybe give an example of what you think the difference is between a batch or slow query versus a stream processor. Because I feel like even the example you mentioned earlier, for the salary thing, obviously there's a recomputation
Starting point is 00:08:11 of the whole query that makes it slow. So you're able to only do the incremental changes. But those incremental changes happening at the table seem like stream processing as well, because there is data coming in almost like a stream, right? You know, you would actually process it as a stream. But you're mentioning something a bit more. Is it a data volume thing?
Starting point is 00:08:31 Is it a number of queries, or a number of nodes that you're able to execute on? Like, what makes yours more slow-query, batch-processing oriented, versus some other databases that are more stream-processing oriented? I think it's all about kind of UI, UX, the wrapping, and what use cases you try to focus on. And I think specifically, the way we see it is that the UI and UX and ease of use of batch processing
Starting point is 00:09:00 is much easier for a lot of companies. And they don't want to think in stream processing ways. They don't want to push data to a Kafka topic. They don't want to care about Debezium and then JDBC to sink it back and weird things like that. So I think that the thing is that we try to abstract that away as much as possible. And I 100% agree with you.
Starting point is 00:09:23 We're doing stream processing internally, and we're using stream processing concepts. We just try to abstract that away from the user and kind of like build a stream processor that looks like a batch processor. Because batch processing is much more intuitive, much easier to understand, I think, in many scenarios, if that makes sense. So we're 100% a stream processor, but we kind of believe that the reason stream processors perhaps are not as widely used as they should be is because a lot of people don't want to think
Starting point is 00:09:53 in a stream processing world, and we try to abstract that away. And so what's the experience you're building that's differentiated? If you kind of abstract these complexities away, as a developer, someone trying to build a streaming system, how do you make it simple for me? Help me understand why this is simple and better and easier and accessible, because one of the biggest challenges to date with stream processing
Starting point is 00:10:16 in general, like the most used system is Flink, is that it ain't easy, right? It ain't easy to use Flink, scale it, deploy it, even reason about how the system actually works. It's like this weird inversion of the way you program. It's not imperative. So, help us understand what is it that Epsio is doing to make it approachable, consumable, and scalable, both in terms of technology, data volume, and the speed component of that.
Starting point is 00:10:46 But I think the scalability part, that's always one of the real issues with streaming stuff, because it takes such a level of expertise to run a production streaming system with, say, a Flink or existing stuff. Like, nobody builds streaming stuff unless it absolutely has to be streaming for the use case. So I'll give a couple of examples. First of all, from the integration part, the way you would deploy Epsio, it's a single component you run within your environment. You give it the connection details of your database
Starting point is 00:11:17 and a dedicated schema where it will create its stored procedures. From that point onward, you install Epsio, a one-liner install, and you give it permissions to your database. To create a streaming query or streaming calculation, all you have to do is call a stored procedure within the existing database, give it the SQL of the query, and the result of that will be materialized into a result table that sits within your database. To give the parallel in a Flink world, you would need to spin up Debezium, then Kafka, then Flink, then Kafka again, and a JDBC sink, and also
Starting point is 00:11:53 probably Avro and a couple of other fun stuff to make sure everything plays together. So first of all, just from the amount of components and moving parts that you need to think about, it's much lower. More fundamentally, I think, if you think about building an end-to-end stream processor, there are a lot of concepts that you can abstract away from the user if you're building it end-to-end.
Starting point is 00:12:16 For example, the concept of a transaction. When you're writing streaming transformations in Flink, you're probably using Debezium to consume the changes, you're pushing the changes to a Kafka topic, each table to a different topic, you have a Flink processor, and then you output the results. Somewhere along the way, in the Debezium part, you're losing the concept of a transaction.
Starting point is 00:12:36 So for example, if in a single transaction in the source database you're deleting a record and inserting a record in a different category, that's something that can very easily get lost along the way, and you need to think about that a lot when you're building a good stream processor. That's one of the things that we're abstracting away. We're taking care of transactions end to end, meaning we have a very strong guarantee that each transaction that you perform in your source database will translate into a single transaction in the sink database, alongside ordering, of course. We kind of abstract that away: you have a database, you think in transactions. So you don't need to think about the ordering of topics in your internal Kafka. These are not things that you want to care about.
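A small sketch of the transaction guarantee being described (hypothetical tables; the guarantee itself is as stated above):

```sql
-- In the source database, one transaction moves a record between categories:
BEGIN;
DELETE FROM salaries WHERE id = 42;                     -- leaves 'engineering'
INSERT INTO salaries VALUES (42, 'marketing', 100000);  -- joins 'marketing'
COMMIT;

-- In a hand-rolled Debezium/Kafka/Flink pipeline, the delete and the insert
-- can surface as two independent change events, so a reader of the result
-- table could briefly see the row counted in neither department, or in both.
-- The end-to-end guarantee described here is that the maintained result is
-- updated in a single transaction on the sink side as well: readers see the
-- 'engineering' and 'marketing' totals change together, never in between.
```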
Starting point is 00:13:14 What are some common use cases where people are saying, hey, you know what, you really unlocked something I couldn't have done before? Tim and I, like many people who have done anything in data, have talked to a lot of streaming tech companies. There's many, many, many streaming tech companies that have come along. And all of them have their own little wedge, a little position of what they do.
Starting point is 00:13:43 I'm curious, what is the sort of place you're broadening out that enables people to do something they couldn't before? Where do you find that traction? Do you have a narrow use case where you land, and then, because you unlock something for somebody, you can kind of take over the world from there? What does it look like? Yeah. So I think originally, most initial use cases are like the classical customer-facing analytics.
Starting point is 00:14:08 We have some fraud detection use cases and things like that. What I'd differentiate is that streaming is not a goal. It's a tool to reach a goal of having a heavy transformation, some heavy calculation. And I think it's honestly really exciting. Most companies start with the classic dashboard analytics and fraud detection use cases. At the end of the day, if you have a lot of data and you're running repetitive queries, that's a pretty generic and pretty wide thing that you can build a lot of stuff on top of.
Starting point is 00:14:41 That's obviously relevant for BI. That's relevant for DBT models where you don't want to recalculate the same thing from scratch. That's really important for cost saving also. So that's kind of what we see usually as the initial pain point: hey, I try to squeeze my database and just can't. But after that, I think just thinking about things
Starting point is 00:15:03 incrementally and not in batch, that can evolve into a lot of exciting and interesting places. Even data replication. For example, we have a customer who started, again, from a customer-facing application, but they also have an Elastic database that they want to replicate their data to, and they want to kind of flatten all the tables. So they suddenly realized that it could be super easy to basically join all the tables and then write that into Elastic
Starting point is 00:15:29 and kind of replicate things, not in a dumb way of just copying things from one place to another, but enriching it as it moves. And since the UI is as simple as a materialized view, suddenly that's also relevant, because it's already there. It's just defining another query: call epsio.create_view, do the join,
Starting point is 00:15:48 and dump the result of that into Elastic. You know, the Postgreses of the world, since you're building sort of like this incremental query engine on top of existing databases, these databases are typically, even though you can shard, you could, you know, do different kinds of replication, the main execution of the writes and stuff is really just single node, right? But the Flinks of the world are designed to be, hey, run any number of machines, right?
Starting point is 00:16:14 There is state to it, but the way it's built, you know, it's not infinitely scalable, but much more scalable than one single node. And it's kind of funny that you're doing stream processing with both... I saw in the documentation there's statefulness and statelessness, this idea that you're able to execute a variety of different workloads here. Is the assumption that all the data that you'll be handling is going to fit in a single node anyway? Or do you also need to be able to handle it in a scalable way somehow?
Starting point is 00:16:50 Because it's really hard to tell based on the information about your deployment and stuff. It doesn't seem like this is an infinitely scalable, elastic type of execution engine, which it probably doesn't need to be. But I'm just trying to understand, where do you see your engine fit? Is it the constraint of, okay, everything that fits in a single node is good for us?
Starting point is 00:17:11 If you have to scale out, go use, you know, I don't know, Cockroach or something else, and those people will handle their own replication, and you're just kind of within this single node world? Or maybe tell me more if that's the right way to think about it. So, definitely not in the single node world. And we're actually starting work on a Cockroach integration, because we got a couple of pulls from companies using Cockroach
Starting point is 00:17:35 that kind of reached the scale limit of what a single Postgres instance can obviously offer. Today, you can't take a single view and spread it across multiple Epsio instances. That's definitely something that we will have in the future, and we've already built the structures to allow that. Companies that do have large scale, what they do is they just separate the logic into many separate views and then spread that across many instances and kind of like shard.
Starting point is 00:18:05 You can think of it across many Epsios. In the future, that's something we're definitely going to go into, having a cluster of a thousand Epsios writing to a thousand Postgreses. But currently it's one Epsio view per Epsio instance. We do support sources and sinks that don't have to be a single Postgres instance. We also have customers who have many Postgres instances, sharded themselves, and then they use Epsio, for example, to read from 10 Postgreses, aggregate it, and dump it into an 11th Postgres, if that makes sense. Actually, you bring up a really good point. I want to ask you about this.
Starting point is 00:18:43 Given that your customers are either running Postgres themselves, or using some kind of RDS, or even, as you mentioned, Cockroach, and there's a few other scale-out databases out there, I feel like between what the vendor markets and how the actual customers use, deploy, and operate it, there's usually a big mismatch, you know? How they actually run their database is always a hilarious thing.
Starting point is 00:19:09 What's sort of the most common anti-pattern or bottleneck you see people running into with a popular database like MySQL or Postgres? What is the thing that they always keep running into, that you have to almost educate them about or just help them with in a way? Yeah, that's an interesting question. I think obviously there's a million small parameters and things like that that you probably want to tune. Thinking about your schemas,
Starting point is 00:19:38 I think a lot of people don't understand the performance implications of building the correct schema. From stupid things like JSON versus JSONB, to whether your primary key is a UUID or an integer. A lot of people kind of separate between the schema layer and the performance layer. So that's one thing I see happen a lot. But other than that, I don't know. I think logical replication, specifically in Postgres,
Starting point is 00:20:11 is kind of something we saw a lot of people fall on. And I think some companies had a lot of issues with it that we needed to help fine-tune a lot of times, to make sure that it works correctly. I have some criticism and thoughts about the way logical replication works in Postgres. So that's maybe one place I think people sometimes fall on.
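For the concrete version of the schema pitfalls mentioned a moment ago, a small Postgres sketch (illustrative names; the performance notes are general Postgres behavior rather than anything specific from the episode):

```sql
-- JSON stores raw text and reparses it on every access; JSONB stores a
-- parsed binary form and can be indexed.
CREATE TABLE events_slow (payload JSON);
CREATE TABLE events_fast (payload JSONB);
CREATE INDEX ON events_fast USING GIN (payload);  -- only possible with JSONB

-- Random UUIDv4 primary keys scatter inserts across the whole B-tree,
-- while a sequential integer key appends to one hot page.
CREATE TABLE orders_uuid (id UUID DEFAULT gen_random_uuid() PRIMARY KEY);
CREATE TABLE orders_int  (id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY);
```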
Starting point is 00:20:37 Maybe let's also, I want to touch on the future databases, like the NewSQL type, the Cockroaches of the world. Because, you know, I think we haven't really heard or seen people talk about the bottlenecks of these databases. People just assume they're just much more scalable. You know, the reason you pay for a separate database instance like this, or
Starting point is 00:20:56 like the Spanner type, right, is that you can scale forever. Um, but it sounds like it's not true. You have customers running into issues with those. So maybe enlighten us too: what are the typical scale issues you can run into in these scale-out,
Starting point is 00:21:12 everything-will-be-great kind of databases, you know? Yeah, okay, so I misunderstood the question, but specifically there, and I think I remember also reading this in the Cockroach documentation, these databases are not meant to aggregate data from separate places. They're really good at distributing the data and handling a lot of simple selects. But if you want to now aggregate, join, and do a lot of complex aggregations on top of
Starting point is 00:21:37 separate shards, or whatever you call that, they're not solving that problem. They're meant to solve the problem of a lot of writes and simple selects. They're less performant on complex aggregations and joins across a wide variety of data. And that's fine. They're like an OLTP engine, and they're not meant to do a lot of heavy cross-shard
Starting point is 00:22:03 aggregations. Just to give an example, some customers literally want to aggregate data across different geo locations. So just network-wise, running a query to aggregate all this stuff can take time, and you probably want to pre-calculate that a lot of the time.
Starting point is 00:22:21 What do you think the future is? I've been in San Francisco for the last couple of weeks, playing the West Coast, East Coast flight game. On the ground, one of the things we're talking a lot about is what's the future of development, what's the future of programming, what's the future of the interfaces that people interact with. I'm very interested to understand, on one side, our data ecosystem has an incredible amount of legacy. Once data is in a system, that system is there almost forever.
Starting point is 00:22:50 Migrating data to a different database or to a different thing, there's so much inherent to it. The system you build around it is so tied to how that data is stored and organized, and to the inherent properties of the system used to manage the data, that it makes changing incredibly difficult. And I'm kind of curious: on one side you have that aspect, on the other side you have all of these new coding tools.
Starting point is 00:23:12 What do you think the future of data ecosystems looks like? Are people still going to adopt these large, giant databases? Is something going to be different? And then also, what's the future of how you interact and build ecosystems? We're kind of curious to get your perspective on what the future UI and UX looks like
Starting point is 00:23:31 Yeah, so I think on the data layer at least, kind of like where the world is going is more into, I don't know if to call it distributed, but kind of like a lot of components that know how to play together well. Like I think in a world where it's really hard, as you said, to move, to migrate a database,
Starting point is 00:23:56 one of our customers has a quote that I really love: that migrating a database is like changing a car's engine while driving. I couldn't agree more. It's really hard to move databases, to change the basic structures of your data while the company is moving. And I think that's why the market is more going towards kind of an ecosystem of databases that talk together and integrate. I think ClickHouse, for example, is doing a lot of great things there with
Starting point is 00:24:27 the replication, for example, from Postgres to ClickHouse. There are a lot of databases that kind of know how to play together very nicely. So on the data layer, I think it's definitely going there, where you have a lot of components that integrate and play together. And then on top of that, the coding LLM tools, which also need a better, more generic interface, and kind of a consolidation of things that these upper systems can talk with.
Starting point is 00:24:57 And that's also part of the reason we try to be Postgres compatible, or MySQL compatible when we're on MySQL: exactly so that these upper systems have a single language they can talk with these lower systems, and then the lower systems talk and distribute and play together. And do you think systems like Epsio will play a big part in dealing with this sort of thing? You are solving the data movement problem today, which is, how do I get data from one sink
Starting point is 00:25:25 to another sink to another sink, and sink and source. But do you think developers will be interacting directly with these data systems, or do you think that's actually going to be delegated to some LLM? What do you think? So I think it might be an LLM in the future. I think, funnily enough, originally,
Starting point is 00:25:41 when we started Epsio, we thought about automating the views which we materialize, and kind of choosing ourselves which queries you want to have incremental and which not. And we quickly understood that that's something really hard to do, because other than performance, there's a lot of business logic injected into that. For example, you probably want your buy button to run as fast as possible, and a two-second buy button is probably too slow, but 'delete my account', that's probably a button where you don't care at all if it takes a lot of time. So we kind of understood that the companies really know well which parts
Starting point is 00:26:20 are important and which are not, and that's something really hard to abstract away from the user. So I'm hopeful regarding the long run of LLMs and the state of AI to better understand that. But I still think there's a long way until it can automatically understand the business implications of a speedup in each place. You know, this is curious, because hearing you talking about the architecture and your
Starting point is 00:26:45 customers, and being an investor, I just always keep having this question in my mind, because my own startup was also trying to do performance optimization for containers, just not for database queries, but for containers in general. And it's so hard to tell people your stuff sucks, you know? And also, even though they know, they may not want to buy it right now. There's almost like a timing thing where, like, I am so overwhelmed, or I feel like I have no idea how to do things.
Starting point is 00:27:17 It's not that common, actually, to find people who are like, oh, I definitely have to go buy a solution to speed things up. It always feels like the last resort of last resorts. And I'm just curious, how do you get over this hump? Not just getting people interested in this problem, because people are always interested in this problem, but getting them to actually say, you know, I want this now, is such a difficult task. I don't know if you learned anything trying to do this, because it's still a PTSD in my
Starting point is 00:27:44 mind, you know. Anything performance related, it's just like, they love to debate technical stuff with you, but actually wanting to buy, there's a risk involved and there's an ego involved, you know, both sometimes. Yeah, I a hundred percent agree. And I think one of the big realizations, like, I talk a lot about performance, but actually one of the questions that I ask when we talk with companies that are interested in using us is not how slow things are. Because that's interesting, but that's not a hair-on-fire thing. It's: how much blood, sweat, and tears did your engineers already put in, or are they planning to put in, to make these things fast?
Starting point is 00:28:58 And then you have the big value propositions. We work with companies where it's obviously not uncommon to have five engineers working on overhauling the database infrastructure because everything's on fire. Obviously a lot of engineering time goes there, and I think, following your point, I talk about performance, but the real value proposition is how easy it is to reach the performance you already have and make changes. Because a lot of companies build a pile of patches, and then each time they want to change one small tile in the dashboard, it's like a month of development.
Starting point is 00:29:32 Even if they already did all the caching, stream processing, and things like that. So I 100% agree, and the value proposition is R&D time and velocity. That's how we measure ourselves, and that's how we do discovery. And if you have a query that takes an hour and it's not painful enough for you to put an engineer
Starting point is 00:29:53 on that, that's totally fine. We're probably not valuable enough there. And you know, the whole world is buzzing about AI, which we just talked about briefly. There's a whole category of AI SREs now that are like, I'm going to go look at your logs, and I'm going to go fix things for you. Or I'm going to even try to tell you exactly what to do to fix things.
Starting point is 00:30:13 And their messaging is a pretty big umbrella, like anything. I'm not sure, you're on the database side, so you'd probably say this is not going to happen. But I'm just curious, given that you are trying to fix people's problems, and you are building a product like this, do you see AI enabling things to go faster? Maybe because there is an engine that you built already, there's a faster way that AI can help bridge people into your product? Or maybe down the line,
Starting point is 00:30:41 there is an opportunity for AI to look at database logs and schemas and just do the fine-tuning, the DBA-type work. I've actually seen AI DBAs being proposed quite a few times. Do you think that's real in the short term, that there will be an AI DBA of some sort? I think the recommendation side is something I believe in a lot. There's taking action,
Starting point is 00:31:08 and then there's recommendation. And like I said earlier about an AI agent automatically creating Epsio views, that's something where I personally believe there's a long time until we're there. But recommending which views, analyzing the query history, seeing, for example, I don't know, looking at a screen recording,
Starting point is 00:31:30 seeing where there's rage clicks, understanding which queries happen there, and suggesting that automatically. I see a lot of places where Epsio could give tooling to LLMs to do better suggestions. I do just think that, I came from a cybersecurity company before, and we always talked there about how visibility always
Starting point is 00:31:55 comes first, before taking action. And I think that we still need to build that layer of visibility, of LLMs that can give you visibility and recommend. If we do that well, then great, we can also apply policy, but that's always the first step. And I don't think we're there yet. Like the AI DBA, I think on the recommendation side there's still some work to be done to provide the visibility. Awesome. Well, I want to move on to what we call the spicy future.
Starting point is 00:32:26 Spicy future. Tell us, what's your spicy hot take, maybe in the infra world, the database world, or whatever world you want to be declaring here? What's your spicy hot take on the infra stuff? Yeah, I think it kind of connects to stuff I already talked about, so that might not be fair, but I really think UI and UX is underrated in the database space. And I think the fact that you don't have a pretty screen doesn't mean that the experience
Starting point is 00:32:54 of using it doesn't matter. And I think that's something I live and believe every day. UX matters even if you don't have a pretty screen. What's an example of a shitty UX for a database? Or what's something that you're seeing that makes you feel like this is a hot take? I think, obviously, there's the Flinks and things like that that don't expose a nice interface. I think generally there's a lot of specific examples, like, I don't know, in MySQL there's some pretty weird things sometimes that we come across.
Starting point is 00:33:30 Postgres replication, for example, I'm not sure that's a great interface I would expose to a user. I can give a lot of specific examples, but I just think the prettiness of the experience is something worth focusing on more. And it's very easy, like, developers like configuration and they can dive deep into things. But that doesn't mean you have to.
Starting point is 00:33:54 Yeah, I think this is what I feel like. Because, you know, I worked on databases a little bit, and I work with a lot of friends who are working on databases. And especially the Postgres world, it has been alive for 30, 40 years now. Most of the maintainers have been working on it for 10, 20. It's like the kernel sort of crowd, you know, and their best UX is what they knew. And they also don't want to change things.
Starting point is 00:34:19 We've always done it this way, right? We always added one more flag. We always just add another release note, you know? And so you just keep doing that. And I feel, I don't know if you see it, the vendors have no incentive to change it, because they have a paid product. Yeah, that's a problem. Like, why should I make all my open source things so much better when I can just spice up my Databricks notebooks, right? Or even make it harder.
Starting point is 00:34:52 Yeah, actually it is incentivizing them to make it harder. They are actually deprioritizing open source a lot now, or, like, completely just not touching it, pretty much. You know, it is sort of the sad reality of what it is, but I guess it's also a necessity for them to make money. So it's very hard to find people in this space who actually have the motivation to do it. It's actually very hard to have great UX if you only touch one tiny part of the system, right? You know, to be in control of different parts and able to herd the cats and, you know,
Starting point is 00:35:30 I don't know, it's just very difficult to actually see that happen in this open source, widely popular database world, you know, unless there's a crazy guy like Linus or something, like, yeah, I just care about this and you guys all can just F off. It's really difficult. Somebody has to be like a dictator. I think that's a great example, and I'm a big fan of Linus Torvalds. I think he's a great example of somebody who's obsessed with interface. I agree on that. I think there are a lot of alignment issues in the open source and data world. It's really hard to build good alignment in these projects to make sure that it's encouraged
Starting point is 00:36:09 to build a product that's not harder to use. I'd love to understand, from your perspective, what do you think the future of data interop actually is? You were talking about standards, but is this actually a solvable problem? Does, you know, does AI solve this problem in a world where you can potentially turn this into a machine learning problem set, instead of everyone having to go write interop? The fundamental problem today is we have all these different standards and all these
Starting point is 00:36:37 different formats and all these different systems, all with different characteristics. So we ultimately end up with transformation layers, right? And I guess the question is, do we actually care about standards in the future? Or is this something where we'll use an LLM to write all the interop? Do the standards actually really matter anymore, if the systems can just figure it out and build interoperable transformation layers that just work?
Starting point is 00:37:14 So that's an interesting question. Personally, I think that a well-defined interface would still be necessary. So you'd still want the LLM to have a very defined way of doing specific things. And even if it's an LLM, data is still a very precise matter, and you want to be very precise about what each action does. It is possible, though, that the interface will change, that you'll want a separate interface. Maybe SQL is not the best thing if you're an LLM. Maybe you want some other interface to expose there that's still precise but is more beneficial for LLMs. Having said that, I still think that a well-defined interface, even in an autonomous
Starting point is 00:37:46 world, is still beneficial, because then you can have good separation: the LLM doesn't care about the internals, and the internals don't care about the LLM. And you can only achieve that separation with a very strong wall with very defined entrances and exits. Awesome. Well, I think there's so much we could probably dive into in this whole data space, but for the sake of time, what is the way for folks to learn more about Epsio? Do you have social channels?
Starting point is 00:38:18 Do you have any places people can find you? What are the websites or social channels we should be shouting out? Yeah, so we have our website, obviously, where we also have a newsletter that you're more than welcome to sign up to and look for news. We're mostly active on LinkedIn, so you're welcome, obviously, to follow us there. And more than that, like I said, we believe in ease of use and we're willing to stand behind it. So the easiest way to learn is to just try it and see how it works.
Starting point is 00:38:47 Try to create incremental views, see what works and what doesn't. And also, obviously, the documentation. But yeah. Awesome. Well, thanks, Gilad, for being on our pod. And I'm sure lots of folks will love it. Thank you so much. It's been a pleasure.
Starting point is 00:39:01 Thank you. Thanks for having me.
