Drill to Detail - Drill to Detail Ep.111 'Making Analytics Fun, Frictionless and Ducking Awesome' with Special Guest Jordan Tigani

Episode Date: September 7, 2023

Mark Rittman is joined in this final episode of the current series of Drill to Detail by returning guest Jordan Tigani, Co-Founder & CEO at MotherDuck, to talk about the journey from big data to small data and bringing hybrid cloud execution to DuckDB.

Show notes:
Big Data is Dead
Teach Your DuckDB to Fly
Announcing MotherDuck: Hybrid Execution Scales DuckDB From Your Laptop Into the Cloud
Drill to Detail Ep.64 'Google BigQuery, BI Engine and the Future of Data Warehousing' with Special Guest Jordan Tigani

Transcript
Starting point is 00:00:00 And will we ever see you in that duck outfit again? A company with a name like MotherDuck, we either had to sort of pretend like we weren't really a duck-based company or do the opposite and just sort of embrace the duck. And so that's part of our core company values is embrace the duck. Hello and welcome back to the Drill to Detail podcast and I'm your host Mark Rittman. So I'm joined today all the way from Seattle by returning guest and friend of the show, Jordan Tigani. Jordan, welcome back and great to have you on the show again. Thanks Mark, great to be here again. This is one of my favorite podcasts. I'm really
Starting point is 00:00:48 excited to get a chance to chat. Excellent. Excellent. So Jordan, so you were on the show before and you came on in your, I suppose, BigQuery and Google kind of role. But just tell us a bit about yourself really, sort of how you got into the industry and how you got to the point where you are now. Sure. So yeah, I started at Google in like 2008 or 2009. And it was kind of the beginning of, you know, Google Cloud didn't quite exist yet. But we had a number of tools internally that we thought were pretty interesting and wanted to share with the world. And so yeah, we ended up building this thing called BigQuery, this, you know, database as a service,
data analytics as a service. And, you know, I stuck with it tenaciously for a decade. And I think we built some pretty cool things there. I had been sort of like a compiler person or operating system person, and I turned into a database person. And databases are pretty fun. So that was, I think, a good move. And I went to SingleStore for a couple of years
Starting point is 00:02:05 and kind of saw things outside of Google. Not every company works like Google, and that was sort of something important to learn and then struck out on my own about a year ago to help start a company called Motherduck. Exactly. So Motherduck, I mean, really interesting, really topical company and a great story as well for you to go through with you now.
Starting point is 00:02:29 So we had a great episode, actually, when you talked about BigQuery. And it was certainly a journey you went on there, building out, I suppose, the ultimate in large-scale data warehousing and cloud data warehouses. But you went to a single store. And without joining too long on it, what were you doing at single store? And what was the kind of the role and the product you're working on there? Sure. So I was a chief product officer. And so I had, you know, I spent a lot of my career as an engineer and an engineering manager and then switched to
Starting point is 00:02:57 product management. And, you know, I think my brain still works like an engineer. So I'd love to talk about engineering problems and design, design things. But also kind of after being a product manager, you kind of realize that there's a lot more to the world and how things actually work than just sort of building great technology. And so single store gave me the opportunity to kind of do both. So I was the chief product officer, you know, in charge of the engineering and product side of things. And, uh, and so that was really interesting because they had great, you know, amazing technology at single store. And I thought I could use a little bit of product focus. They were kind of in the process of moving to cloud.
Starting point is 00:03:34 And I thought, hey, I know cloud and cloud analytics. So I had a great time, you know, helping them sort of reboot their cloud, their cloud system. You know, they have, you know, been able to build something that most people say is impossible, which is an analytics and transactional database that actually work really well together in one in one package. And and so that was that was fun. That was a great, great place to be for for a couple of years. But then I kind of decided that, you know, to build my own thing, something that I believe in. I mean, not that I didn't believe in single store, but something that I can create and focus on innovation rather than just taking something else and making it bigger. I'm sort of more interested in the earlier stages.
Starting point is 00:04:28 Okay. So give us a high-level overview then of what MotherDuck is then. And we'll get into a lot more detail as we go on about the specifics about it. But what is MotherDuck and what's your role there? Sure. So it's an analytical database in the cloud. So it's like a serverless analytics database, but it's based on DuckDB, an open source project created just a really simple, really fast, low latency, easy to use, easy to use database for analytics in the cloud and really targeting use cases like DuckDB because, you know, there's a lot of people using DuckDB. They did, I think, 2 million downloads last month. And, you know, a lot of those people want something that's hosted and it's running in the cloud.
Starting point is 00:05:42 Okay. So it's been quite a journey you've been on, really, from BigQuery through sort of single store and now to MotherDuck. So let's kind of, I suppose, to understand that journey and understand the context, let's take a sort of step back then, really, to I suppose the historical context, really, of what we're talking about. So the evolution and future of data warehouses. I know you spoke about this a bit in the past. So maybe let's go back to the days of maybe when I first –
Starting point is 00:06:07 I have to think about this. So maybe sort of like 10 years ago or so, when data warehouse servers were on-premises, okay, and you had appliances and so on. And that moved to the cloud, really. That moved to sort of things like BigQuery. So just talk about maybe that time, that period, and the impact that cloud had on that world of on-premises appliances and so on.
Starting point is 00:06:27 Sure. that, you know, these were really hardware plus software packages, you know, that was available in, you know, in the sort of on-prem era. And when you move to the cloud, the constraints are different. And I think the best of the cloud, the cloud data warehouses really embraced what became, you know, was only possible in the cloud. And I kind of would put BigQuery and Snowflake in that category as, you know, what are the things you can only do in the cloud, really kind of leaning into separation of storage and compute and elastic computing and multi-tenancy. And, you know, and we're able to kind of, I think, address kind of broader problems than just sort of pure data warehousing problems. And I think that's one of the reasons why everybody went from calling themselves cloud data warehouses to calling themselves like cloud data platforms, because it turns out that like, hey, you know, once you kind of move to the cloud, you can address a broader, a broader section of needs, rather than just, you know, what we had called what we had called data warehousing.
Starting point is 00:07:51 Okay. So a lot of these, certainly when we spoke about BigQuery back on the previous episode, there was a lot of the roots of services like BigQuery were in, I suppose, the Hadoop and Spark and distributed computing world. And I suppose there were, I suppose, there were benefits in that and there were trade-offs and there were costs and so on. So as workloads like that moved into the cloud, what was they being optimized for? And what were you gaining and what were you losing, really, from that move into the cloud in terms of performance
Starting point is 00:08:20 and the way those things worked and the kind of use cases they were aimed at? So I think one of the things you just you gained was you know this sort of decoupling from having to know know about or care about what the hardware environment was like you know you basically outsourced the hardware to somebody else but also the running of that hardware, the running of the software, and even the architecture. And that really enabled much larger scale, much faster performance, the kinds of things that would have been incredibly difficult to set up in your own environment, you know, the public clouds had a lot of this hardware sitting around.
Starting point is 00:09:09 And, you know, multi-tenancy could mean that, you know, they could actually profitably run a system that, you know, they could handle scaling up and scaling down. And whereas if you're running things on, on prem, you would, you know, you'd have to basically provision for the peak load of what you, of what you needed. Of course, the thing is,
Starting point is 00:09:33 the thing you give up is sort of control. You know, you don't know when upgrades are being done. It's a lot harder to understand what the, what the availability characteristics are going to be, the durability characteristics. It's sort of easy to understand, okay, what's the RPO and RTO failover situations of my box that sits in my data center versus something that's spread over millions of, in my data center, you know, versus something that's, you know, spread over millions of disks and multiple data centers and, and, you know, what kind of things, what kind of
Starting point is 00:10:12 things can go wrong. There's also just because these distributed large scale distributed systems are so, are so complex. The, you know, it's very hard to make them as consistent as a, as a local, as a local system, you know, local system is like, Hey, this query is usually, uh, 500 milliseconds. Um, uh, and it's almost always, you know, within, you know, a few milliseconds of that, but in the cloud, there's all sorts of tail latency and things that can, things that can happen. And it might be mostly 500 milliseconds, but sometimes it will be five seconds. So I think that's also one of the things that people had a hard time wrapping their heads around. And I think, you know, you mentioned that, you know, things sort of come from the, you know, there was this Hadoop lineage and this large scale distributed system. And I think,
Starting point is 00:11:01 you know, we kind of rapidly changed the underlying architecture and had to then like bolt on some of the familiar pieces like transactionality that took a long time to sort of this thing where you had to change how you're working with how you're thinking about your data problems versus being able to give you the old familiar thing just with a new architecture underneath but it took a long time to sort of make it so that it you know these you know these these systems were good enough that they really felt like they really felt like they, you know, your old, your, your, your Teradata instance, like Teradata is still an amazing and amazing data warehouse. And in many ways is well ahead of what you can do in, in think it just has some, you know, architectural limitations just by being, you know, kind of built on sort of a like, you know, legacy system on prem, even though they're they are moving to cloud. They weren't really sort of designed with cloud, cloud in mind. And I think that makes that makes it a lot harder for them to take advantage of some of the things you can only do in the cloud.
Starting point is 00:12:27 Okay. So I suppose another observational sort of thought you have now looking back at that time is there was a lot of kind of interest and thought around this concept of big data. And things like BigQuery and Snowflake and so on, they're all designed for big data workloads. And I must admit, in the work I've been doing, it's very rare to get a data set that is anything more than a few terabytes in size, really. And most workloads we see are structured and they are not, well, maybe big data is now all data because it's been easier to handle. But how much is big data a thing you think now?
Starting point is 00:13:05 And how much of it has been normalized and how much of it is probably overkill? What do you think? Yeah, I think there's, you know, when the people started to sort of hyperventilate about big data and sort of design these new systems, there are certainly people that have giant amounts of data. And I think there was this assumption that, okay, well, if Google has
Starting point is 00:13:28 big data and Facebook has big data and these other, you know, giant banks have big data, then everybody's going to have big data too. And there were also a handful of even small organizations that had giant amounts of data. But that really turned out not to be the case. I mean, just the experiences that people have, it's like, you know, hey, like I have a, you know, a SaaS business and the SaaS business has a bunch of customers and, you know, they're doing really well and they have a few gigabytes of data. And so I think that, you know, we kind of have designed these systems to really be kind of overkill for this amounts of data that the vast majority
Starting point is 00:14:08 of people out there. And I think when your design point is orders of magnitude, many orders of magnitude off of where your customer base is, you just, you focus on different things. I mean, so for example, you know, the majority of data warehouses, actually, I think all the data warehouses out there are really focused on throughput rather than latency. In other words, like to process as much data in as short amount of time as possible, rather than getting the answer as fast as possible. And it's a somewhat of a subtle difference. But one of the things that that tends to do is it tends to trade off
Starting point is 00:14:50 kind of the latency, the startup time, the time to get the data out so that it can run faster. And I think that's sort of what you see in these sort of large-scale distributed systems. There's a lot of coordination overhead, you know, that you basically have, you know, if you have 16 servers in your kind of your Snowflake cluster, you know, they have to all talk to each other, shuffle data in between, you know, BigQuery, you know, by default runs with 2,000 workers, so your queries can be spread across 2,000 workers. Even if you do that really well, which I think BigQuery does, there's going to be a bunch of overhead. But they're willing to trade off that overhead, which you feel in terms of latency and how long your query takes, in exchange for being able to scale it out to very, very large, large sizes.
Starting point is 00:15:47 And I think what's, you know, where that's become sort of problematic is that when you have a human being waiting for the result, like they really care about latency. And so that's why I think kind of some opportunities have been missed is because, you know, in this focus on, on making things as, you know, as big as possible, you know, the,
Starting point is 00:16:13 the kind of the end user is, is left out. Especially, especially, especially as you've got laptops now, I mean, I've got a Mac book recently upgraded and you get like the chips on that. The arm chips are so fast.
Starting point is 00:16:25 So you've got all this kit sitting in front of you that is not being used. And I mean, how does that affect the equation, do you think, the ability to run such work, to run workloads on your laptop that are so fast, they're just not being used? Yeah, I think it changes the potential dramatically. I mean, it used to be that, you know, when we would say,
Starting point is 00:16:46 so let's run this on your laptop, like it was almost as a joke. It's sort of like, oh, this underpowered, like, you know, underpowered thing that, you know, can't possibly, you know, crunch much, much, much data. But now, you know, I think, you know, George Frazier, the CEO of Fivetran, youran, just did some benchmarking of running basically an analytics benchmark on his laptop versus one of the popular cloud data warehouse. And his laptop was faster. And yeah, it was a smallish data size, but it wasn't tiny.
Starting point is 00:17:23 But it wasn't tiny. And, but it wasn't really, wasn't really optimized. And it was, I think it's just one step towards what we're going to be able, we're going to be starting to see more of. And I think it's also a big, a big opportunity. a good example of where the tuning of, you know, for giant data sets actually has sort of hurt us because one of the mantras when we started BigQuery was, you know, okay, with large data, you want to move the compute to the data rather than the data to the compute, which is, you know, we wanted people to load their data into BigQuery, into our cloud, and then operate on the, you know,
Starting point is 00:18:04 compute while it's in the cloud rather than having them download the data and operate on it locally. And that's something that's a truism with large data sets. With smaller data sets, however, you want the data to be as close to the user as possible because that's how you get low latency. So if something is using DuckDB locally on your laptop, you can do queries sub millisecond. And so, yeah, I mean, yeah, human beings can't tell the difference between anything really that's sub 100 milliseconds. But this means you can do 100 queries in the the in the amount of time that, you know, it basically takes you to to blink.
Starting point is 00:18:49 And that's a fundamentally different interaction model that I think we've seen before. You can do sort of video game style, you know, 60 frames per second updates of your of your dashboards and there's lots of things that i think we haven't we haven't even you know we're just starting to see the possibilities of of what you can do with really low latency uh okay so so let's so duck db so you've been on this journey now going from bigquery to to uh maybe the memory part of bigquery uh set in single store and then you started mother duck but mother duck is based around this this database technology called duck db so maybe just for the for the listeners what is duck db right um why how is it how is it fundamentally different from the likes of my sql and sqlite and and then you know and then in high level what does mother duck add to duck db so first of all
Starting point is 00:19:42 what is duck db sure i think i mean one way of thinking about DuckDB is it's SQLite for analytics. It's this embedded, really lightweight database with no dependencies, but tuned. So in the database terms, SQLite is a row store. DuckDB is a column store. So DuckDB is really geared towards doing aggregations, understanding data rather than setting and updating individual rows of data, they talked to a bunch of data scientists and they're like, you know, data scientists like hated databases. And they said, well, why is that the case? And a lot of it was around sort of the usability. These are the general like end-to-end user experience of database of installing it, you know, dependency management, you know, keeping it up, you know, like.
Starting point is 00:20:46 And they said, well, why don't we build something that is just easier? And while pretty much every database, like, focuses on, you know, the time from when you get the query to you get the query, you know, the query result is ready, it turns out from an end user experience, that's only a small portion of actually what's happening. You know, there's, you know, there's how does the application actually send the data? How long does that take? And then when the query is done, how do you get the results out? How do you stream those results out?
Starting point is 00:21:23 Do they come through a, you know, a JDBC driver? All this stuff is like, like, I think Hannes, I think talked about it as like a burger is like, everybody focuses on like the, the, uh, the, the actual, the burger, the meat part. Um, but you know, nobody's really looking at the bun and the bun actually has a big impact on the overall, you know, overall, you know, experience of the, of the end, end user, or the end eater of the, of the hamburger. And, and so, and I think that's one of the things that people love about DuckDB is it's like, hey, it just, it just works. It's just easy.
Starting point is 00:22:00 And, and, you know, and I think they helped kick off, you know, their ideas from DuckDB helped kick off like the ADBC project and Arrow. And, you know, that's sort of the next, next, like, update of, you know, ODBC, JDBC targeted for targeted analytics using Arrow. But anyway, so I think they've built this really, really, really well-designed database. And it's also, you know, the other kind of exciting thing about it is it's so incredibly lightweight that it can run. It can run in your browser. You know, it's basically, you know, it needs, you know, tens of megabytes of RAM so you can run. It's easy to download, easy to, you know, it has no dependencies, so you can just install it.
Starting point is 00:22:47 And it's being used in all kinds of interesting applications. Right. So, but, okay, so that's interesting. But there's a danger that DuckDB could be like a novelty or like a skateboarding dog or something. So I think that's interesting. But, you know, running your database on your laptop, there's obviously limitations in that, okay? So what if you want to share what you're doing with other people?
Starting point is 00:23:11 What if you want to collaborate on something? And is that really where MotherDuck comes in? Because MotherDuck is about providing DuckDB as a managed service, okay? So how does that add value to the DuckDB experience? And how does that kind of, I suppose, fundamentally change the developer workflow? Yeah, so I think you mean that you've hit on kind of exactly what we think we can offer to the DuckDB user is some better scalability, somebody else to manage it for you, collaboration. DuckDB itself is certainly of, is not a data, it's certainly not a data warehouse. It's something that you can engine, you can use to understand and analyze your data,
Starting point is 00:23:52 but it has no concept of users. It has no concept of, you know, like of a, of a sort of a global catalog durability is something that you have to manage yourself. Like, you know, the, even kind of the semantics of being able to have multiple writers, like you can have multiple writers and the same time you have readers.
Starting point is 00:24:13 So there's a bunch of things that DuckDB doesn't necessarily offer. And then, you know, by offering a cloud service, you know, we can give you, you know, scale into the cloud, you know, and we can do, you know, sort of serverless, of serverless scale up, scale down as you need it. The other thing that we're doing, which I think is pretty unique, is hybrid execution. So the MotherDuck client is always DuckDB. So if you're running with MotherDuck and you're running from your laptop,
Starting point is 00:24:50 you'll have a local DuckDB on your laptop. And so that might be the DuckDB command line interface, but it might just be the web UI. The MotherDuck web UI has DuckDB running in Wasm in the web UI. And what our hybrid execution can do is it can kind of seamlessly, you know, if your query references data that lives locally, that lives on your machine, everything runs locally. So you can do, you know,
Starting point is 00:25:19 sub millisecond queries. If your query references data that lives in the cloud, you know, your query will run in the cloud. We can be, you know, make that very fast and, you know, return those results. And if your query references data that lives partially locally and partially in the cloud,
Starting point is 00:25:38 then, you know, we can actually, you know, we're hooked into the optimizer of DuckDB. So we can basically do the optimal thing, whether we either move some data locally up to the cloud, or we pull data from the cloud to your, to your local, local machine. And from the, the programmers perspective, or the users perspective, this is all just sort of just sort of just, you know, seamless, you don't have to really worry about where the data is. And so nice things that we can do about that is we can, you know, basically result cache that sits locally on your, on your local machine. You know, very often when people are doing
Starting point is 00:26:12 analytics, they do sort of tweaks of the same types of queries. Well, if your queries are similar enough, we may never, we may never have to actually hit the, hit the server again. If you're building an application and you want to show, you know, a stock portfolio to your end users, they, you know, basically, they can pull that all down locally. And you can slice and dice that and kind of operate on that, you know, locally in in your in your browser and have the updates be, you know, you know, sub millisecond, you know, so you can do, you know, 100 frames per second. But let's say you want to then say, okay, what happens if I had bought, you know, Amazon or Apple back in, you know, 2004,
Starting point is 00:26:50 that will have to reference data that lives in the cloud. And so then that will be just a slightly lower, you know, slightly higher latency, but still be able to pull those results down. So you'll be able to sort of operate on it, you know, in a very, very fast, fast way. So I think that hybrid execution is one of the keys that's going to be, it's going to have a lot of different, you know, different cool things you can do. Like, so instead of like having to, you know, if you're, if you're using Snowflake or BigQuery, you run a query, you know, you have to pay for that query because that's using cloud, cloud hardware.
Starting point is 00:27:23 But if you can pull data down locally, then you're, you know, basically that, query because that's using cloud hardware. But if you can pull data down locally, then, you know, basically that's going to be free. So you can dramatically reduce your costs if you're able to move those down locally. And, yes, you can manage that yourself. You can, you know, download, you know, export to Parquet, download that data, and then operate on it locally. But the nice thing about MotherDoc is it just sort of lets you do that, you know, really, really seamlessly.
Starting point is 00:27:48 Okay. So if I was in charge of data, say, a big organization, I'd be kind of having kittens now thinking about my data going onto laptops and not being centrally controlled and so on. So what does MotherDuck do to, I suppose, provide security and governance and control over how data moves around and who can access it and so on in that hybrid environment? Yeah, I think so. There's two answers to that. And the glib answer is, the first answer is, we're not worrying about that so much in the
Starting point is 00:28:20 short term. We think that there's a lot of sort of non-big enterprise use cases for Motherduck. And, you know, we will deal with those later. The less glib answer is, and I think the other thing is, I guess the less glib answer is that, you know, by using a service like Motherduck, you can actually have visibility into and control how people are, how people are, are doing this. Because I think without mother duck, if people have duck TV, they're going to be downloading this data and they're going to be doing, doing this themselves. But you as the IT administrator are not going to, to know what, what they're doing or how they're doing it. And there's a bunch of like things that we are, that we are, you know, planning in our roadmap to, you know, for example, you know, set, set TTLs when people,
Starting point is 00:29:14 when people, you know, so basically expiring, expiring data that gets downloaded, making sure that data is encrypted when it gets, you know, when it, when it comes locally, or maybe it only stays in memory or, you know, maybe the key, the maybe it only stays in memory, or, you know, maybe the key, the encryption key only stays in memory and has to be fetched from the cloud. So if, you know, somebody else gets access to the machine afterwards, that, you know, they wouldn't be able to access the data, you know, and these are all things that we have planned. But also, kind of the, they're not on the near term roadmap, because we're really sort of going for, we're going after, you know, individual analysts, people with a data problem,
Starting point is 00:29:48 rather than trying to target, you know, these giant enterprises. But I also think that there's a lot of, you know, big opportunities for these enterprises as well. So what is the user persona for MotherDuck, really? You know, is it analysts? Is it kind of data scientists? Is it analyst engineers? I mean, can developers make use of this kind of your product as well?
Starting point is 00:30:11 Yeah. So I think that that's sort of a broad, the answer is sort of broad because the numbers of people that are using DuckDB are so broad. So it originally started with data scientists, but it's being used all over the place by analysts. It's being used, you know, for people doing sort of lightweight data exploration. It's being used by developers because it's incredibly easy for a developer to get started with it because it's just a library. You link in the library. And one of the, I think, the cool things about MotherDuck and how we've integrated with DuckDB is that if you're using DuckDB, you can switch to using MotherDuck just by changing the database name. So if you do like in DuckDB, if you do like DuckDB connect,
Starting point is 00:30:52 foo.db is a local database, local file named foo.db. If you change that to md colon foo, that'll automatically connect to MotherDuck. You don't have to install anything. You don't have to change anything else. So if you're a developer and you're prototyping something locally because DuckDB is super easy and you can run unit tests in a millisecond, but then you want to switch to production and you want to run in the cloud, all you have to do is change your database name and boom, you're running against MotherDuck and running in the cloud.
Starting point is 00:31:28 And I think, you know, people that are building applications that need to give their users access to data is an ideal use case for MotherDuck because, A, you know, we can have this sort of WASM thing where you're running, you know, you can have these super low latency stuff running in the browser. But also it's just, you know, you can have these super low latency stuff running in the browser. But also it's just, you know, really, really nice developer ergonomics and eventually, you know, economics when we can, you know, when we have a, we have billing enabled. But also just really anybody who's using DuckDB that wants access to cloud. So analysts, we have a number of people that are sort of using it for sort of lightweight data warehouses.
Starting point is 00:32:08 People that are currently doing their analytics using Postgres. So I think it's a pretty common thing where people, you know, an application developeret at their application database or a replica of their application database like Postgres. But those are really not designed for analytics. And so with MotherDuck, you basically can say, hey, you know, DuckDB can read data from Postgres directly. So you can just point DuckDB at your Postgres, pull the data out, then, you know, use the, you know, if your database name is MD colon, you know, my orders, then now you're doing analytics in the cloud and we have integrations with, you know, a number of BI tools. Okay. So what's the relationship between yourselves and the DuckDB team then? And so do you sponsor it? Do you, I mean, how do you interact with them?
Starting point is 00:33:09 Do you compete with them? Do you compliment them? How does it kind of work for you? And what influence do you have over product features and the direction of DuckDB? Great question. I mean, we have a super strong relationship with them. I just talked to Hannes and Mark this morning.
Starting point is 00:33:24 You know, when we started MotherDuck, we, you know, we made an arrangement with them. I just talked to Hannes and Mark this morning. When we started MotherDuck, we made an arrangement with them. They got a co-founder share of MotherDuck. And in exchange, they kind of gave us an exclusive cloud database as a service. So they weren't going to work with anybody that was doing something similar to what we're doing. It is open source. It's MIT licensed. Anybody can do anything. But, you know, really we have a very strong tie to the Duck TV team.
Starting point is 00:33:53 They, you know, one of the things that we tried to do is make sure that we don't have to fork. And so they built in a bunch of hooks for us. So the, you know, pluggable data catalog, pluggable parser, pluggable storage engine, you know, serializable query plans, you know, you know, WASM extensions, like all of these things that they've added in DuckDB in order to make MotherDuck work. And just so we can be, you know, we have a client side extension, we have a server side extension. And, and we don't have, you know, and so that means that when there's a new version of DuckDB that comes out, we basically can support it on day one, rather than, you know, having to, to do this sort of lengthy, lengthy merge process. We do have an influence on their, on their, on their roadmap, but they really are, they, an independent organization, they, they, you know, they take the, they take DuckDB in the direction that they believe is right for DuckDB. But we
Starting point is 00:34:48 also do pay them for services and for features. And so if there's something that we want, generally, they're quite solicitous in making that happen. Okay. So putting your founder's hat on and your product manager's hat, I suppose your worst nightmare is reading a press release saying that Google Cloud have released a managed version of DuckDB or AWS have done that. So you mentioned a minute ago about the exclusivity, but I suppose what's your moat?
Starting point is 00:35:21 What's your defense against it just being made as a sort of like a commodity service on one of the cloud platforms? There's a number of things. So one of those is that, you know, DuckDB by itself is not a data warehouse. So if you just ran DuckDB, you know, behind a cloud, you know, interface, first of all, there's no real kind of cloud interface to use. So you kind of would have to, kind of would have to figure that out. But then you'd also have to build all of the user management, ACL management, global catalog and naming and storage management. I mean, DuckDB is a single file storage system.
Starting point is 00:36:02 So you basically have to figure out how to, you know, how to, how to make that work on, on cloud data stores. You know, object stores are generally immutable and don't work, work really well directly with, with databases unless they kind of change how the data is being stored. So there's a bunch of work so that you, they would have to do. I mean, it is, you know, it's, it's not rocket science. It is just, it is work. They could figure that out.
Starting point is 00:36:27 But it's not like, you know, Elastic or Kafka or something where it's just a question of like, you know, spinning that up and, you know, chances are they can do that better and less expensively because they're really good at that. The second part is just the hybrid execution. You know, the hybrid execution work, there's a lot of work that's gone at that. The second part is just the hybrid execution. You know, the hybrid execution work, there's a lot of work that's gone into that. I mean, we work with, you know, with Peter Bonks, who's invented VectorWise as one of the sort of pioneers of column-based, you know, query execution.
Starting point is 00:37:00 And, you know, he's on sabbatical for a year working with us to help us build out hybrid execution. I mean, that's hard to make that seamless. And so if somebody wanted to have sort of a drop-in replacement for what we're doing, you know, there would be a lot more work involved. And then lastly is the distribution with DuckTV. So the fact that I mentioned that you just change the database name and it starts working, you know, somebody who is directly competing with us wouldn't be able to get that level of integration. Okay.
Starting point is 00:37:30 So maybe Devils have a quick question. So when I spoke to you last time on the show, we talked about small workloads on BigQuery and, say, Snowflake and so on. So is it not the case that you could run a small workload on those systems? And why would people go to the time and expense or disruption cost of kind of using DuckDB when you could put small workloads into, say, BigQuery? I mean, what's, again, the unique – I know you said latency, but why should people care, really, if they're using small workloads? So I think small workloads work pretty well on BigQuery.
Starting point is 00:38:03 And in BigQuery, there were a number of customers that were, you know, paying a few dollars a month to just sort of like to do some exploratory analytics or they had small data and they're like, hey, this is basically free and it's super easy. BigQuery is only available in GCP and most, you know, most people these days are not on GCP. Snowflake and other data warehouses don't scale down nearly as well. And so Snowflake, if you have a data warehouse and you have to load data to it reasonably continually, it's going to cost you over $1,000 a month and $10,000 a year. And for a lot of people, that's a heavy lift. So I think that, you know, for those cases, it's not ideal. And I think, you know, just the economics of it, you know, even beyond the sort of the minimal
Starting point is 00:38:58 entry point, I think we're going to be able to make it less expensive, partly because we can move work down to the client. You know, query planning, at the very least, query planning is done on the client, which is not, you know, which takes up, you know, considerable, you know, computational complexity. And then also just, you know, being able to run, you know, sort of highly multi-tenant systems and, you know, the serverless, you know, scaling things up and scaling things down as needed, you know, we think also is going to be beneficial. And then lastly, sort of the simplicity of just sort of getting started
Starting point is 00:39:36 and ease of use, like, you know, the other vendors are not really optimizing for those use cases. They could, but today they're not, I mean, it's, it's not that hard to get started with Snowflake, but it's, it's a lot harder than, you know, pip install DuckDB, you know, select, you know, select 40, you know, you know, DuckDB space MD colon foo. Like that's all you need to do to, to start using DuckTB
Starting point is 00:40:06 and start using MotherDuck. And so, you know, we think that, you know, basically just a really low friction to get started on MotherDuck is just one reason to do it. Okay. So you mentioned earlier on the UI that's used for working with DuckDB, the MotherDuck UI, and that uses DuckDB, doesn't it, as part of it. And I suppose that struck me as maybe sort of one of the example of maybe some new types of web-based
Starting point is 00:40:39 applications that have this kind of hybrid setup or certainly embed parts of the warehouse actually in kind of like the UI. Is there anything that MotherDuck are doing in that space? And obviously we're aware of DuckDB there, but what is MotherDuck doing to enable maybe new ways of working and new ways of kind of running applications? Yeah, so I think the hybrid execution is something we're really investing in. So the MotherDuck UI,
Starting point is 00:41:09 the query results are returned in the browser and we've embedded an open source tool that actually one of our engineers has created called TAD, which is a pivot table editor. And the pivot table editor sends SQL queries to do the pivoting. But the SQL queries that it's sending is not to the DuckDB in the cloud. It's sending the DuckDB in the browser. And so, you know, we can get these, like, you know, just really, really fast, you know,
Starting point is 00:41:34 pivots on this, you know, query result data and, you know, sorting columns and filtering and, you know, doing these, you know, on-the-fly aggreg-fly aggregations, we can do that locally. And that's just sort of an example of the kinds of things that you can do with hybrid execution. And very shortly, we're providing a WASM toolkit and an application, data application toolkit, you know, to enable, you know, other people to really, really easily build, you know, build things in the browser that can take advantage of local execution and the power of cloud at the same time. Okay. Okay.
Starting point is 00:42:19 So, again, putting your product manager or founder hat on, what's on the roadmap really for this? I mean, so far you could say you've provided DuckDB as a managed service, but where do you see this going? And what, I suppose, you know, Jordan, you're pretty sort of like significant in industry really. What was it that inspired you with this? And where do you want to take this really?
Starting point is 00:42:39 Particularly now that you're actually one of the founders and you can actually realize your sort of vision on this. The thing that, I'll one of the founders and you can actually realize your vision on this? I'll start with the last part. The thing that inspired me was just this opportunity to focus on what I thought was the problems part of the burger, and to innovate and to build a SaaS service that really differentiated on delivery. And I think if you can compare with like how the typical way open source is done, you know,
Starting point is 00:43:21 open source is done as a company and as a SaaS service, very often there's a team, there's a great team that's building an open source is done as a company and as a SaaS service, you're very often, there's a, there's a team, there's a great team that's building an open source product. And they focus on making that great because that's how they, that's how they get their name out there. That's how they expand.
Starting point is 00:43:36 That's how they get investment by like building a great open source product that lots of people use. And then people say, well, how are you going to monetize and say, Oh, we're going to monetize via SaaS. And so they have like a small team building a SaaS service and they run it in a Kubernetes cluster and then they ship that. But the problem is like cloud enables
Starting point is 00:43:55 you to do so many different things. And so because the focus is not on the delivery and not on the service, they're not pushing the innovation nearly as far as they could be. And I think this relationship we have with the DuckDB team is ideal, in my opinion, because they focus on building an amazing database, and we focus on building an amazing SaaS service, and then we innovate on the delivery of that and with a hybrid execution and some of the things that haven't been done before. And so that's sort of the most exciting part for me. Getting back to your question about the roadmap, A, we want to push hybrid as far as we can, make it seamless so that you can pull data down
Starting point is 00:44:53 to your local environment seamlessly. We're going to have Git-style branching so I can branch the data into my local browser. I can basically then promote that back up to sort of to production, you know, better collaboration, et cetera. And then, you know, sort of data application toolkit, making it super easy for people who are building data applications to build these sort of hybrid applications, you know,
Starting point is 00:45:23 in basically a templatized way. So just make it incredibly easy to do so. We have some really, really good front-end engineers that we've hired that are going to help us enable that. You know, there's, you know, a bunch of stuff just sort of making it a great, you know, making it sort of a great lightweight data warehouse lightweight data warehouse, being able to integrate with the full ecosystem. So I also think that DuckDB focused on the getting data in, getting data out, and the parts that aren't just the database. And I think for us, we want to sort of carry that forward and make sure that we have really, really tight connectivity and connectors with BI tools, with data ingestion and integration tools, with things like DBT.
Starting point is 00:46:10 You know, and I think a lot of the really new, interesting BI vendors, you know, Hex, Real, Omni, are're using DuckDB, there's this sort of opportunity for us to integrate with what they're doing and have just a super seamless environment where it's sort of, you know, it's just DuckDB all the way through. Yeah, brilliant. So last question, how do people find out more about MotherDuck, sorry, and how do they get some hands-on experience and
Starting point is 00:46:40 get to experience this, particularly this hybrid sort of model? Yeah, so you can go to MotherDuck.com, you know, you can go to motherduck.com. You know, you can sign up for our wait list there. We can, you know, we're letting people in, and it's pretty quick. We just sort of want to make sure that we don't get overwhelmed. There should be a bunch of information there. And then you can do pip install DuckDB, brew install DuckDB.
Starting point is 00:47:03 There's sort of lots of stuff out there on DuckDB if you just want to get started on DuckDB. And will we ever see you in that duck outfit again? Quite possibly. Quite likely. I feel for you. I feel for you in that. The engineers, obviously, you took one for the team there, didn't you?
Starting point is 00:47:20 But are we ever going to see that again? Yeah. I mean, we realized that we had to um uh a company with a name like mother duck uh we either had to sort of pretend like we we you know we weren't really a duck based company or do the opposite and you sort of embrace the duck and so that's part of our like kind of core core company values is embrace the duck so um yeah i would I would expect to see that and things like it in the future. Fantastic. Well, Jordan, it's been great having you on the show.
Starting point is 00:47:50 Really, really interesting. Wish you the best of luck with the product. And thank you very much for telling us the story. Thanks so much, Mark. Thank you.
