The Data Stack Show - 120: Materialize Origins: A Timely Dataflow Story with Arjun Narayan and Frank McSherry

Episode Date: January 4, 2023

Highlights from this week's conversation include:

- What is Materialize? (2:43)
- Frank and Arjun's journey in data and what led them to the idea of Materialize (6:22)
- The good and the bad of research in academia vs. starting a company (25:20)
- The MVP for databases (33:49)
- Materialize's end-to-end benefit for the user experience (43:03)
- Interchanging Materialize in warehouse and cloud data usage (48:25)
- The trade-offs within Materialize (1:00:02)
- Final takeaways and previewing part two of the conversation (1:09:25)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by Rudderstack, the CDP for developers. You can learn more at rudderstack.com. Welcome to the Data Stack Show. Today, we are going to talk with Frank and Arjun from Materialize, founders, and we are so excited to chat with them. We had Arjun on the show over a year and a half ago, Costas, and I can't wait to catch up with
Starting point is 00:00:40 them. Materialize as a product has gone through a ton and as a company they've gone through a ton. But the most important question is what do you think is going to happen because Brooks is gone and I'm in charge of recording? Which means that when the cat's away the mice will play and I have a feeling we're
Starting point is 00:00:59 going to go long. Yeah, like we're going to taste like the sweet taste of freedom. I am able to do whatever we want with the show. So yeah, let's do that. I feel like a teenager again, but like my parents have lifts, you know. I really do too. This is going to be a rager.
Starting point is 00:01:18 The podcast version of a high school rager where we're discussing you know academic papers yeah widely oh yeah let's ah yeah let's do it okay well first of all with this newfound freedom what are you going to dig into you have unlimited freedom what are you going to ask frank and arjun about oh yeah i think like first of all mean, we definitely need like to ask them about their relationship, right? Like how they met, how they ended up being like co-founders and how these relationship has evolved as the company has evolved, right? It's so it's going to be very interesting, like to see that.
Starting point is 00:02:05 And to be honest, like see, they're hear their story from them. Yeah. And that's one part. And the other is what got happened in one year, which is, you know, like in startup land, like a year, especially at the end of H1. 18 months actually. Yeah. Yeah.
Starting point is 00:02:26 It's an extremely long timeframe. So let's see what happens, where they stand today and hear about their story, how they came together and why. All right. We'll wrap this up in about four hours. Oh yeah. That's going to be wild. Let's dig in with Frank and Arjun. Let's do it. Frank, Arjun, welcome to the Data Sack Show. We're so excited to catch up with Materialize. It's been over a year, I can't believe that, since we last had you on the show.
Starting point is 00:03:00 So welcome to both of you. Thank you. Thank you very much. Okay, we usually start with introductions, but we're actually going to switch it up a little bit today, which is very exciting. Brooks isn't here driving, so I get to do whatever I want, which is my favorite thing. And actually, because the background and the stories are so good, and both of you bring such a different perspective. So we're going to do that second. First, though, could one of you just give us the baseline? Like, what is Materialize? And why would you use it? Just so our listeners understand it out of the gate. Yeah, so Materialize is what we call a streaming database.
Starting point is 00:03:36 So it looks and feels like a database. You interact with it using a SQL command line shell, and you write select statements and create table statements and things like that should feel very similar to a database that you've already used. In fact, we like to say it's like a, it's a streaming database you already know how to use because you're probably already familiar with Postgres. So Materialize looks and feels like Postgres. What's different about Materialize is you create Materialized views, right, as opposed to running one-off select queries, and you plug Materialize to your data sources. So those could be upstream OLTP databases.
Starting point is 00:04:13 Those could be event streams like Kafka. And Materialize incrementally maintains very efficiently the results to your materialized view statements. So that when you ask for the latest result, it is already pre-computed for you and you get your, what would otherwise be a high latency, complete refresh over potentially large amounts of input data are very quick. So the kinds of things we can, materialized views are not a new concept, right? They've been around for over two decades. What is new about materialized is the breadth and depth
Starting point is 00:04:54 of materialized views that we can incrementally maintain very efficiently. So think like a eight-way join with like four subqueries, right? And all of the underlying eight input sources are changing at, you know, low volume, high volume, slowly changing dimension, doesn't matter. You know, within milliseconds, the result of that answer is kept up to date for you. And this is useful in a wide variety of cases where today you may be using some kind of batch pipeline that is sort of continuously rerunning, potentially in a horrendously expensive way, and you're probably still
Starting point is 00:05:28 not getting the latency experience that you want even if you're horizontally scaled out and using an unlimited budget. There's various queries where you simply, sort of the speed of light of the amount of data that has to be crunched over means that your pipelines are going to be perpetually an hour out of date. So think like some complex retargeting persona segmentation input pipeline. And the fact that this can now be incrementally maintained with tens of milliseconds or hundreds of milliseconds or single digit millisecond latency.
Starting point is 00:06:06 Opens up the aperture to the kinds of experiences you can build on top of your data platforms for your user. And Materialize is, we believe, the best way to power these real-time experiences. It's available as a cloud-native, horizontally scalable database in the cloud, like many others that you may be familiar with, like Snowflake or Redshift or things like that. Very cool. You've definitely talked about this a time or two before. That was super helpful. Okay, so let's rewind. And Frank, I would love to start with you because you've done a lot of work in the academic space, actually.
Starting point is 00:06:49 And so I would love to hear about your journey to materialize. And prepping before the show, I loved hearing you both bring a very different perspective to how you came together and materialize came to life. So Frank, give us your background and tell us how you got into Materialize. Yeah, sure. Thank you. So, as you say, originally, academic background was mostly interested at the time in theoretical computer science algorithms and designing more efficient algorithms, data structures, what have you, which transitioned eventually into some work with big data. Essentially, the big data is a great place to show off the difference between a pretty
Starting point is 00:07:29 good algorithm and a really good algorithm or a really bad algorithm. Also, we've got a bunch of those out there. But I went from grad school off to Microsoft Research, where I did some work on data privacy there and eventually transitioned into working on big data problems at Microsoft Research. They were then working on systems called Dryad and Dryad Link, which if you can sort of think about what Spark looked like when it came out, it was that but a few years ahead of time.
Starting point is 00:07:58 Savvier version of MapReduce. And I was a big user of that and sort of picked up some of that DNA and started to work on a system there called Nyad, which is a more powerful, more expressive big data process that allows you to put in loops and does streamy style work instead of just large chunks of batchy data. The place I was working vanished. The research lab in Silicon Valley vanished, and I went on vacation for a while. I hadn't gone on a big vacation for a while,
Starting point is 00:08:36 so I went on vacation for about three or four years. That's awesome. Yeah, no, it was not exactly. I did a whole bunch of different things, a little bit of work. And a bunch of the time. Yeah, no, it was, it's not exactly, you know, I did a whole bunch of different things, a little bit of work, but, and a bunch of the time that was actually kicking around with Rust,
Starting point is 00:08:49 which was just stabilizing at that point and doing, in some sense, version two of NIAID, you know, changing the things that I wished
Starting point is 00:08:56 I could have changed and baking it off at various points, writing a whole bunch of tart blog posts about my take on the big data space and stuff like that. But eventually, or, you know,
Starting point is 00:09:10 not eventually, but like periodically, was in touch with Arjun, who increasingly was expressing the idea that although it's probably great fun to be writing as an individual these little blog posts that sort of jab at and needle various other people
Starting point is 00:09:25 if you want to actually see if the ideas had merit if they would go anywhere the the logical thing to do the smart thing to do was to attempt to assemble a company not just because having a company is fun um so much as this is the right way to get together the set of people with necessary skills to actually go and translate this from one person's hobby project into like actual thing that people would want to use. In part, and you know, substantial amount of credit by realizing that people wouldn't actually want to use the inner engine part, you know, like no one actually wants to buy an engine and try to drive that around town. You know, they want a car and yeah part of building that requires a whole bunch of other parts that other people are better positioned to put together and yeah and then you know eventually not immediately eventually agreed that was an interesting sort of next thing to do done some other things when i was thinking like should i what should I be doing next? And this is absolutely the most interesting, by far, thing to be doing. So, showed up in New York, and we've been here for now for like four years, I think, roughly,
Starting point is 00:10:32 building Materialize. Very cool. All right, Arjun, your version. Yeah, so I was doing a PhD in computer science at Penn, initially focusing on data privacy. And Frank sort of massively understates his contributions to data privacy. He's one of the co-inventors of differential privacy, which I was working on. And as part of being a good grad student is one maintains sort of various lists of people and lines of work where you diligently follow up on every publication and new citation and things like that. And I had noticed that Frank had drifted away from data privacy into distributed systems. I was working on the confluence of sort of building large-scale
Starting point is 00:11:15 distributed systems that maintained data privacy. So it still seemed entirely appropriate to be following that. In 2013, when Frank published the NIAID paper, it was sort of the perfect moment because I had immersed myself in all of these and you sort of have to rewind the clock a little bit 10 years ago where, you know, it was Hadoop this, Hadoop that, Apache Spark had just come out, and there was all these various bespoke, you know, here's a large-scale distributed system for computing triangles in this graph. Here's this large-scale distributed system
Starting point is 00:11:51 for computing. And it was like, one was very confused why one needed 100 bespoke distributed systems. And when NIAD came out, it was sort of, at least from my vantage point, sort of a seismic moment
Starting point is 00:12:09 of a unified and subsumed entire classes of what at the time people were thinking of as separate streams of research. It was the first system, at least I think the first one, that unified batch and streaming and sort of capability. So it was able to do everything that the contemporary batch processors were able
Starting point is 00:12:29 to do. And also it was streaming and performance competitive with the batch system, in fact, much better performance wise. And so I remember, you know, going up to my advisor and being like, why are we still reading these other papers? Like these people should stop what they're doing and rebuild on top of this clearly sort of superior, theoretically well-principled and well-architected thing. Sort of getting the painful lecture of like, that's not how the world works at all. At the same time, I was also, you know, obsessed with distributed database. At the time, like eventual consistency everywhere, and there were very few strongly consistent data systems, particularly distributed data systems. was one by Google Spanner, which was the first sort of horizontally scalable, globally serializable OLTP database at the time. And I found that very also simplifying. I think the
Starting point is 00:13:35 thing that attracted me to both systems was they both greatly simplified the level of complexity. One of the big advantages of strong consistency is from the user's perspective, there's way less nonsense going around that you have to reckon with and reconcile in your brain to get your job done. And I remember pinging Frank and saying, hey, are you thinking of starting a company? Because I assumed my initial assumption was he was going to, because it was so clear to me that it was a wonderful platform to build many additional layers approaching closer to what a customer would actually use. Stumbling upon small, then I think series A startup, Cockroach Labs, which was building a open source clone of Google Spanner. And I was very attracted to the open source sort of business model, the building out in the open, as well as all the sort of rigor and theoretical basis that they were building on top of, which had been sort of published in Google Spanner.
Starting point is 00:14:51 I, as a, I think about 20, less than 20 people, joined the engineering team and discovered that I was a mediocre engineer, but I learned a surprising and delightful amount on how to build a database that builds customer trust. Because databases have this sort of no one gets fired for buying IBM property if there's an inherent amount of conservatism. You're making these choices in a decade-long horizon as a purchaser of databases. And it takes an exceeding long amount of time and sort of engineer hours or engineer years to get to that level of polish and fidelity that customers would buy it.
Starting point is 00:15:33 I think the first quarter I joined Cockroach, our OKR was 24 hours of uptime without a crash on a single node, which does not inspire confidence when it comes to databases. And it was early, right? We were nowhere close to it. But what I really witnessed over those initial years was how to communicate clearly how you are building things and when you will be ready eventually without overstating it, without any of those little lies that spiral out of control. The thing that I really loved about Cockroach was the honest communication as to where the system was and how we were going to make it more scalable, more robust, more stable over the years.
Starting point is 00:16:28 And Cockroach is a very successful company now and rock solid and deployed in all these sort of staid conservative institutions that pick trusted and true technology. And that journey was one that did a lot of counterintuitive things, which was you actually show your warts or you overshare the things that are broken because that actually builds trust from the buyer's perspective. Yeah, yeah. As opposed to pretend it's more polished. I've seen, you know, some other mistakes that other folks have made, which is to sort of cover things up and pretend that their system is as stable and rock solid as Oracle. And, you know, it just inherently can't be with missing three decades or four decades of development.
Starting point is 00:17:13 At the time, I would sort of periodically reconnect with Frank and sort of try and sort of check in and say, have you changed your mind? Like, this is really cool. I tried to deploy your Rust build and it doesn't build, the build is broken. There was a lot of that and it's true. The, and I kept hearing sort of, he was, you know, I know I'm very happy. I'm on a beach in Costa Rica and writing code few hours a day and I don't really want things to change.
Starting point is 00:17:52 I think I grew increasingly frustrated that he wasn't going to commercialize and build a system on top. As we had more conversations and as I probed as to why, he had a lot of questions around, why even start a company? Why even do this? And so I would patiently explain, well, you know, it's, you may be totally fine as, and totally capable of building a robust distributed system as guy on the beach who occasionally comes to computer. But your customer may want somebody that they can call up at any moment and you do not fulfill that SLA. So you need to sort of, and they actually need some education because right now the query language is this Rust library you wrote and you know, there's lots
Starting point is 00:18:30 of people who would want to use your system who do not wish to, or do not wish for that interface to be this bespoke Rust UX that you've, that there are standards like SQL and things like that. And eventually we came to this point where I convinced him that these things needed to exist. And what I was hearing from him was that he agreed that all these various functions needed to happen for the project to be successful and actually changing
Starting point is 00:18:59 or having an impact on the world. But he didn't want any of these things to be things that he had to do personally himself. And the meeting of the minds was like, well, that's actually what a company is. It's a way to align a group of specialists in a variety of different expertises, all sort of working together to make the project successful as a group. And that's how we started Materialize. I sort of want to put in a small plug for the wonderful founders at Cockroach who were incredibly gracious and helpful. The first check-in was Spencer, the CEO.
Starting point is 00:19:36 The second check-in was Peter, the CTO of Cockroach. They were very kind with their time as well as their liberal checkbooks in getting us off the ground with Materialize. Love it. What a great story. Hey, I have one more question for each of you, and then I want to dig into the product stuff because we have so much to cover. Frank, I'll start with you. Do you remember the moment where you sort of changed your mind about starting a company? I remember vague. There's a few moments that I remember. So it was in Vermont where my parents live.
Starting point is 00:20:15 I did not, strictly speaking, live anywhere at the time, but was there visiting them. I would have to call his phone as mom would pick up. It was like a landline. It was a landline. That is so great. The best cell coverage. And yeah, discussing things there with Arjun. I think largely I was told where my mental model of what started being a company was like,
Starting point is 00:20:40 basically looked at a whole bunch of academic analogs or various people you know essentially threw out some ideas and hoped a company formed around it and had to run around and shill it for a while or something grim like that and that wasn't really what i was about yeah but you know it's a little bit more chewing and developing on it it became a bit two things became clear The plan was absolutely to involve other competent people rather than just try to cash in on whatever this particular piece of work. But also there was something actually interesting and valuable to build, right? Like it was not just a shinied up version of the code base. It was SQL wrapped around a thing,
Starting point is 00:21:24 which is many orders of magnitude more relevant than fantasy shiny engine and it started to click that was actually worth producing early you know seeing at least if it was worth producing you know if you could actually take good ideas one of the struggles you have as an academic is figuring out how to take your ideas which are very clever i mean everyone in their heart you know the super clever idea but how do i get everyone to understand it and appreciate it yeah and a lot of that's sort of what the proposal was in many ways it was like here's a mechanism by which we can translate things that you thought were really important sorry this is the frank centric view of things of course things that you know you think really
Starting point is 00:22:01 important into larger benefit, basically. Yeah. I love that because I think it's so easy, especially in startup world to just default to like, well, you have an idea, so you just start a company. But I love the first principles thinking of first asking, do we have a strong conviction that this thing should exist or that these things should exist? And if the answer to that is yes, what is the best way to do that? And for some things, the answer is a company, and for some things, they may not be. So that's just really helpful first principles thinking. Okay, Arjun, question for you. And the NIAID paper is really a phenomenal piece of work.
Starting point is 00:22:42 And Arjun, I want to know your experience. And so maybe I hope this isn't awkward for you, Frank, but I'm interested to know, it sounds like when you read that paper and you went to your supervisor, whoever you were working with, and you were like, we need to stop working on all this stuff and start building on this. What was it like to, you know, it sounds like you had been sort of knocking on these doors that were, you know, questions looking for this answer that sort of said, yes, this is the way that we should do this. What was it like to read that? Was it sort of like a bunch of stuff congealing at one point, you know, almost like an epiphany? Or can you just describe your first, you know,
Starting point is 00:23:23 sort of like when everything clicked? Yeah, yes, it was like that. It first, you know, sort of like when everything clicked? Yeah. Yes. It was like that. It was, you know, it's, there were many papers that I'd read in sort of increasing confusion. The default experience of reading a incremental piece of the sort of new, you know, new set of publications come out and you sort of go through them one by one. You read the abstracts and then you read the ones that seem exciting and you go through
Starting point is 00:23:49 and each one of these sort of makes you more confused. It's like, well, why? Oh no, oh no, another thing I don't understand. Oh no, why does this thing have to exist? Oh, this site's a whole bunch of other stuff that I don't know and I gotta go. You know, it's sort of like the endlessly expanding tabs version of like learning about
Starting point is 00:24:08 something and you're just like, good grief. I started with 14 open tabs and now I'm with like 37 open tabs and like it's dark outside and I feel more confused. And the NIAID paper was the giant tab closer, right? You read that paper and you go, oh my goodness, I get it now. These people are wrong. These people are onto a partial solution. These people, you know, are completely subsumed. These people with a slight twist, I have a suspicion would be able to, you know, I'd be able to reproduce that on top of NIAID, but very much simplified. And so that was the sort of rough experience
Starting point is 00:24:46 that I had reading that research paper. I think what I will credit myself with is a lot of academics in that are very wedded to their own ideas and I wasn't wedded to mine, right? So a lot of folks, they really want, I built a cool thing. I want to commercialize my cool thing because it's mine. And I did not make that mistake. I was like, well, everything I have is a piece of junk, but that's a relief because I don't have to worry about how to close that loop anymore. The answer is right here. And that was why I was sort of in the mindset of, you know, very willing to throw things away.
Starting point is 00:25:26 Yeah. Love that humbleness. All right, Costas, I've been monopolizing, so please take the mic. Yeah. Thank you. Thank started a company, right? Very different experiences, I guess. I don't know. I mean, I've touched academia a little bit. I've never been in big tech. I've done something with startups. So how, like, how do you, how do you feel about, like, how, like, what's the difference?
Starting point is 00:26:14 And okay, I would like to ask you, like, what do you prefer today? But I think, like, you'll probably say, like, materialize, of course, but it would be great, like, to hear, like, the good and the bad of each one, because they're like, you know, like, it's very rare to find someone who has done all three and so successfully also. They're different. I should be clear, the big tech thing that I did was at a research lab, industrial research lab, which is not as different from academia as you might imagine. It's a bit like getting your toes in the water with respect to industry but it's still it has a lot of the safety nets that that academia has yeah it's a good i don't know i my personal vibe at the moment is i've had a bit of a break with academia and i'm happy to be not not doing that anymore and part of the reason i think is from my point of view and a lot of different takes on this but it's the the motivation. Why do you do things? And, um, from outside, I think people think maybe the academia is about finding truth
Starting point is 00:27:11 and finding meaning. It's often not, it's often about finding the right way to, to shine the thing that you're currently holding onto. It's about constraints where like, you haven't had a good idea for a year, but you got to write a paper. So what are you going to do? And in many cases, the correct answer would be, keep your mouth shut. Don't go and confuse everyone by saying something weird and complicated.
Starting point is 00:27:35 And on that, I like to joke that I believe very much that, you know, there's academic work that we are doing that will be worth writing once we have done a sufficiently meaningful amount of work that we are doing that will be worth writing once we have done a sufficiently meaningful amount of work that he, you know, immaterialize paper is worth writing. That has, you know, you'd have a different perspective in academia where you, the moment you have the absolutely least amount of thing that is publishable, you push that out as opposed to, you know, here where we're like, well, we could certainly publish a paper now, but would it be the best possible version that has explored all the paths?
Starting point is 00:28:08 No, not yet. So let's take more years. And speaking to that, actually, one of the differences as you go from academia, as I went, at least from academia to industry at Microsoft Research was a bit of a longer timeline on the work that you're going to do. You were expected to do better, higher quality work, but you're given a bit more time to go and do it. The NIAID paper, for example, got rejected,
Starting point is 00:28:32 I think three times, something like that along the way. And it got better each time. And at no point did we like, ah, in some panicked mode have to go and throw out the window or something like that. It was totally fine, took our time with it. Had support from the organization from management at the lab so healthier in that sense in terms of like actually trying to find something of value to put back and contribute to to the group and my experience has
Starting point is 00:29:00 been as you go now to a startup there's just that much more attention that's being paid to actually having impact and meaning. A lot of academic work and even in industrial lab can be a bit inward facing. It's a bit like, wow, I've done a really impressive thing. I'm going to show my friends and see if they're impressed also. liked about, well, certainly in a startup setting, but also like as I was transitioning, I guess, from academia through industry, appreciating more and more that computer science is sort of the art of abstraction. It's about taking a really clever thing and actually not really having to show someone how clever it is for them to appreciate it and enjoy it. And there's a little bit of letting go in that because if you have a lot of
Starting point is 00:29:41 self tied up in the cleverness of the thing that you've made great you know as you slide along in my early days of materialize i really held my nose nose with respect to the sequel and how grim that was going to be because personal opinions and really come around to appreciate that like it or not it's incredibly useful as a way to communicate with people who absolutely do not want to have to know how all of the complicated stuff works inside here. You're very advanced. Stream processors want to work, and they already know how they want it to work. And 100%. Yeah, that's very interesting.
Starting point is 00:30:15 And like one additional question that has to do with academic patents. I'll start with academia. So you were doing like research in privacy, right? And very successfully also. Like what made you move into the processing instead of... Yeah. Um, I think this is a pretty, pretty easy non-technical answer actually, which is that doing a really good job at data privacy never resulted in anyone being
Starting point is 00:30:46 happy with you. So you'd walk into the room and like, I have a very important data privacy announcement for everyone in the room. And they're like, Oh, not this guy again. Last time he was here, he said we had to stop doing everything. And, you know, you think you've actually done a net positive thing by like, you know, introducing like know introducing like oh oh like bad things could happen if you know this and that and like you're saying you get around it and and people are much happier five minutes ago before before you showed up uh and you see that now like there's a bunch of tension about privacy in the census bureau and the former data consumers of the census are not super positive about the whole privacy thing. But if you then do something more like big data, everyone is delighted, possibly irrational.
Starting point is 00:31:31 I mean, here's another big difference, actually, which is as you go from academia to the real, to startup stuff, I don't want to misstate this, but certainly the attitude goes from one, from a combative attitude of like, I'm also a very smart person looking at your work and you need to convince me that you're also smart and your thing is good to a much more receptive, friendly, like, wow, no, if you do something that makes my life better, that's amazing. I will tell you how happy I am. I will not feel threatened by the fact that you might have just ruined my next few years of
Starting point is 00:32:04 research. I'll be delighted that I can throw away that horrible thing I've been working on and replace it with your thing. So the attitude, the sort of emotional feedback, goes from pretty grim to much more positive. Yeah, yeah. Yeah, I think from my experience also in academia, people in academia are much more grumpy than the people in startups. I just want to put in a plug. I had a delightful experience.
Starting point is 00:32:31 I think the PhD, you know, I finished it. I graduated. It was probably some of the happiest, sort of most carefree times, where I got to read whatever I wanted to read, sort of had access to experts who I could poke at length, decide I wanted to read this large tome, go sit in some, you know, 18th century library that was beautiful, and spend morning to evening drinking like five cups of coffee, reading three books. You know, my personal experience was wonderful. Sorry, to be totally clear, I very much enjoyed my academic time as well. The PhD was great. I guess I would just say that as you go from a place where you're satisfied by the things you're satisfied by in academia, and you try to translate those
Starting point is 00:33:14 same things into finding happiness delivering, you know, real things, you might come to different conclusions when you show up and try to communicate with people who have real problems and are going to tell you what they think of your answers. Yeah, yeah, 100%. And like, it might sound a little bit like I'm saying that, you know, working in a startup is like being in la la land or whatever, and academia is just, you know, grumpy, sad people. That's obviously not the case. It's that the incentives and the type of work are different, right? So there is a reason people in academia are
Starting point is 00:33:49 so thoroughly critical about things. That's how progress is made at the end, when talking about very abstract things. So I'm not trying to say that academia is not a happy place. It can be a very happy place. Okay. Enough with academia. Now I want to ask, let's say, a question that is a bit more product and startup-oriented. You mentioned databases and how unique they are in terms of productizing them and getting them to market, right?
Starting point is 00:34:21 And my question is, what is an MVP for a database? I think an MVP for a database is like an MVP for many other products. Something that's, you know, you ship too early, that you put out there, that people see the potential, but are unwilling
Starting point is 00:34:39 to put into production, right? So people may not be willing to put your database into production for an actual use case that has an SLA attached to it, but if they can see the potential that this actually significantly accelerates the time to value for them to build whatever it is they are building, then you will get some signals from the market that it is worth continuing to put in effort, right? And what that concretely translates to means that you have to look for signals of success
Starting point is 00:35:14 that are not revenue, because you can't get money for that excitement, but you can measure that excitement, right? You can look at the number of folks who are downloading and using it. You can look at the number of folks who are paying attention. It's just not going to be revenue. And you have to be very clear that the rough thing that I had in mind when I started was that it takes, roughly speaking, $100 million of capital raised, and a large fraction of that deployed, before that monetizable moment. And going into it eyes wide open, I had the tremendous benefit of that advice from Spencer, right?
Starting point is 00:35:52 So Spencer had sort of known this when he started Cockroach, which was that $100 million is roughly it. And that doesn't mean we raised $100 million in one shot, right? Like we did it over three subsequent rounds. It's that each level of de-risking involved showing the world an MVP of a non-production piece of software that was incrementally more filled out, and measuring that sort of positive reaction that this is a thing which, when completed, I would be delighted to use. All right. And where is Materialize today as a product?
Starting point is 00:36:30 How far ahead of the MVP stage is it? Great question. Since we last spoke, I think 18 months ago, we have built a cloud native distributed version of Materialize, that maybe Frank can get into some of the details about, that is currently in early access with a select group
Starting point is 00:36:54 of users. If any of the listeners are interested in becoming one of those early access customers, please go to our website and sign up. And we're onboarding new folks to our cloud platform every week.
Starting point is 00:37:12 But it's something exciting. Why not give it over to Frank to give us the details? It's actually a good transition from the question of MVPs and potentially sequences of things that you reveal to build more confidence and also assess more excitement about things. About a year or so ago, certainly when you talked with Arjun,
Starting point is 00:37:32 the state of Materialize at the time was a binary that you could deploy on a large computer. And so it was like a Postgres-like type of thing. You know, you get a big machine, you run Materialize on it, and it will keep your
Starting point is 00:37:45 materialized views up to date, very fast. But it had some similar limitations that people bump into with Postgres, where you've got some limited resources. And if you have one of these, and your friend shows up and says, that's amazing, can I use it? You say, no, this is mine, stay away. You know, you're going to screw something up, like I have a real production job running on here. You can't. And so this individual binary did a great job of assessing people's appetite for this. They could see this and be like, this is great.
Starting point is 00:38:19 I'd really like to use this, and start asking questions about, like, you know, what happens when I have a second use case, or if I want to bring in more team members, these types of things. And folks had realized, you know, credit where credit's due to the people out there like, you know, Snowflake and whatnot, that separation of storage and compute was a great way to go and do this. If you design a system where the data can live and scale independently of the compute assets that you want to attach to it, it's super easy to go and turn on additional compute assets and then bring more people onto your pile of data and give them the experience of an unboundedly scalable system, both in terms of
Starting point is 00:38:57 how much data you could throw in there, but also, you know, the person who says, I'd like to use this too, you're like, yeah, just press this plus button right there, you'll get your own computer and you won't screw up my production system either. Where Materialize has gone is essentially in this direction of decoupling the previously monolithic architecture into, in fact, three layers: a storage layer decoupled from a compute layer decoupled from what's essentially a serving layer, where you land data in the storage layer. Now, it's not just static blobs of data that you land there. We're landing essentially for you
Starting point is 00:39:31 continually updating histories of data, so how your data have changed over time, with very clear indicators as they land of what is, from our point of view, the exact time at which this change happened. There's now, essentially, enough information that any two people who were to look at this data would agree how the data changed and at exactly which moments. Which allows us now to turn on these
Starting point is 00:39:54 compute nodes, which are the same compute engine that was in the monolithic Materialize, but now on as many computers as you want, of big sizes, small sizes, whatever, you know, all of the above, reading the same data, coming to the same conclusions. Sorry, coming to, you know, compatible conclusions, exactly consistent conclusions.
Starting point is 00:40:15 You know, if one of you gets a count of this data and another of you gets a count of that data, and I look at them, the numbers will add up exactly at all times. And this is, you know, in the same way that Arjun mentioned earlier that consistency guarantees, strong consistency, are very liberating for users. It's the same sort of thing, I think, that a lot of them are looking for in these low latency data systems
Starting point is 00:40:35 where, you know, if you've got a low latency data system that isn't consistent, you just have to be like, wait, that's not very useful. I mean, it does a thing, but... Scalability, consistency, and low latency: these three things are sort of our three watchwords. Pick three.
Starting point is 00:40:51 That is the new cloud native version of Materialize. So, you know, you can rock up and be confident that you get that same experience, but it can grow with you as you either get larger use cases or more use cases or more team members, that sort of thing. A lot of the things that folks have asked for when we had that monolithic
Starting point is 00:41:10 Materialize experience also sort of fall out neatly from a separated storage compute architecture. For instance, you know, the storage is infinitely scalable, right? So it's backed by S3, so you can sort of land extremely large histories and store them very cheaply. Another one is replication, right? So you can have a highly available Materialize, because with this, you know, permanently stored, exactly timestamped history, the computation is always replayable. So you can have two of these running at the same time. You can have two of them that have different hardware footprints as well. So you can have horizontal scalability. You can have a horizontally scaled multi-machine cluster running alongside a small, tiny sort of test cluster or a, you know, development cluster.
Starting point is 00:41:58 Have those sort of give you the exact same answers. Maybe this one takes longer because it has fewer compute resources attached. And you get this mix and match experience, kind of like Snowflake, where in Snowflake you can mix an XL warehouse with an XS warehouse. They're all connected to the same sources of data. So I'm going to go a little deeper on one of Arjun's examples there, because I think it's really cool. I resonate with it, at least ergonomically, which is replication.
Starting point is 00:42:26 Because we're computing exactly consistent, exactly identical results, if you want to do rescaling, for example (you've got a large machine, you realize you need to go out to two machines or four machines or something like that), you literally just go and spin up another copy of the same computation.
Starting point is 00:42:44 There's nice syntax for this. With more resources, it comes up to parity. You turn off the first one. There's no noticeable interruption to your use of the system at this point. The cutover is consistent and instantaneous. And you now have just rescaled from one size to the next larger one to accommodate whatever spike you're seeing, or just general growth. And you didn't have to spend 10 minutes with everything turned off and rehydrating itself. Okay, that's super interesting.
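In the cloud version, the rescaling move Frank describes can be expressed with cluster replicas. A hypothetical sequence (modeled on Materialize's early-access SQL; the cluster and replica names and size labels are invented, and exact syntax may differ by version):

```sql
-- A cluster with one replica sized for the current load.
CREATE CLUSTER prod REPLICAS (r_small (SIZE 'medium'));

-- Spin up a second, larger replica of the same computation.
-- It reads the same stored history and hydrates in the background.
CREATE CLUSTER REPLICA prod.r_large (SIZE 'xlarge');

-- Once the new replica has caught up, drop the old one.
-- Queries against the cluster see no interruption.
DROP CLUSTER REPLICA prod.r_small;
```

Because both replicas compute exactly identical results from the same timestamped history, the cutover is consistent; the cluster simply stops drawing answers from the replica that went away.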
Starting point is 00:43:15 I have a question. So when I played around with Materialize a long time ago, with the binary that you could download, the feeling that I was getting is that this is a technology that I can use on top of another database, right? Like I can have, let's say, my Postgres, and I want really good and low latency materialization to happen there. So I can attach to the replication log and start doing very low latency materialized views on top of that. Or on a Kafka topic, right? But let's say there was always something else involved in there.
Starting point is 00:43:59 Like there was some other database that I needed, and I would enhance, let's say, the functionality of that database system with Materialize. If I understand correctly, right now we're talking about a more end-to-end system where I can do everything with my data inside Materialize. Do I get this right? You have, actually. Maybe I should have said something which is obvious to me and it wasn't. Which is
Starting point is 00:44:28 one of the things that a storage and compute separation gives you is a storage layer. We didn't have one of those before. That's maybe what you're pointing out. So we'd always act essentially as a cache of some upstream source, be that Kafka or Postgres. And by owning the data
Starting point is 00:44:44 now, by owning the data (owning is the wrong word, but by pulling in our own copy), this is healthy, both for providing consistency upwards to people. Otherwise we'd basically have to hope that Kafka doesn't throw away data or something, which just isn't actually the case.
Starting point is 00:44:59 We're able to provide some guarantees upwards, but you're also able, if you're interested, in landing data directly into Materialize: you can create some tables, or insert data into those tables, and we'll happily keep those around for you as well. I mean, there's trade-offs, in that if you want the highest of throughput ingestion, something like Kafka is probably going to be better than one psql connection where you copy paste
Starting point is 00:45:22 a bunch of stuff in, but... Okay, we can chat about the trade-offs later. Okay. That's super interesting. So again, I want to focus a little bit more on the experience, because I want people to understand not just the technology, but what they can do today with Materialize, right?
Starting point is 00:45:41 I think it's important. So let's say I create an account on Materialize and I would like to throw data into it. Can I create like a bucket on S3 and start pushing... It's even simpler than that, right? So you sign up, you go to cloud.materialize.com, you get a connection string, you type psql with the connection string, you're in, and then you say create table, and then insert into table, insert into foo values,
Starting point is 00:46:11 blah, blah, blah, just as if it was Postgres, right? And these things are inserted, and then Materialize will save that in an S3-backed storage engine layer, a persistence layer. You can, again, connect to this using any application that writes data that has a Postgres driver. So any programming language with a stock Postgres driver can start inserting data into Materialize. You can then also create these materialized views. You say create materialized view as select count star from this table, join, you know, some other table, or things like that. One interesting thing is that you can join data from heterogeneous systems, right? So most people's architectures involve many things,
Starting point is 00:47:07 right? So they already have an OLTP database. They already have some web events coming in that maybe are loaded into a Kafka topic. Maybe they have some other systems that are loading data into Kafka, some microservices, things like that. And one of the things that people often want to do is to enrich these streams of data by joining them against another stream of data, right? So you might want to take your web events and then join them against the customer sort of record of truth that is in your OLTP Postgres data, right? So you might want to take the stuff that you're landing in these new tables and join it against those. Materialize is very powerful at doing these sort of join materializations, primarily due to the architecture of differential dataflow and timely dataflow. And everything happens over the Postgres client driver protocol, be it inserting data,
Starting point is 00:48:06 be it for that control plane of creating those materialized views and creating that stack of data pipelines, be it for the create cluster statements that requisition resources in the cloud, or for reading back from some application or some BI tool or things like that. Okay.
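The flow Arjun describes, creating a table over psql, pulling in a Kafka feed, and maintaining a joined view, might look roughly like this in a session. This is an illustrative sketch only: the table, connection, topic, and column names are all invented, and the exact source-creation syntax has varied across Materialize versions.

```sql
-- Plain Postgres-style tables, over any stock Postgres driver.
CREATE TABLE customers (customer_id INT, name TEXT);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');

-- A source fed by an upstream Kafka topic of web events
-- (assumes the events decode to columns including customer_id).
CREATE SOURCE web_events
  FROM KAFKA CONNECTION kafka_conn (TOPIC 'web_events')
  FORMAT JSON;

-- A view kept incrementally up to date: enrich the event
-- stream by joining it against the customer record of truth.
CREATE MATERIALIZED VIEW events_per_customer AS
  SELECT c.customer_id, c.name, count(*) AS events
  FROM web_events e
  JOIN customers c ON c.customer_id = e.customer_id
  GROUP BY c.customer_id, c.name;

-- Reads are just SQL queries over the maintained result.
SELECT * FROM events_per_customer;
```

The point of the example is the heterogeneity: one side of the join is a table written over the Postgres protocol, the other is a continually updating Kafka feed, and the view joining them stays fresh without re-running the query.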
Starting point is 00:48:25 So if I'm, let's say, a user of Snowflake, right? Like I'm using Snowflake, and I have, you know, a very specific and very common way of collecting my data, ingesting my data, loading my data, transforming my data, consuming my data, right? How is this flow, let's say this lifecycle of data in a traditional cloud data warehouse, represented within Materialize? Is it different? Is it the same?
Starting point is 00:49:06 Can I reuse, let's say, the tools that I'm already using and just, you know, throw out Snowflake or BigQuery and put Materialize there? You could, with some caveats, which is that, you know, we don't yet support all the ways to ingest data. So, you know, we don't support Fivetran yet. But a lot of Snowflake users are putting data from Kafka into Snowflake. Okay. And that very same Kafka topic can also send the data out to Materialize.
Starting point is 00:49:37 Users may be using dbt models to do the staged pipeline creation in Snowflake. Those very same dbt models work with Materialize. Many, maybe most, of our customers use dbt to orchestrate their pipelines and to sort of manage that lifecycle of bringing them back up, or making some change and then recreating the downstream ones, and testing and all that. Though, sorry to interrupt, with dbt this is a great story, because people have been trained to express their query and their needs in dbt, and dbt behind the scenes, you know, does some head
Starting point is 00:50:14 scratching to figure out, like, oh yeah, you really don't want me to redo all of that work, do you? They have this incremental mode that they will attempt to go and use. I believe they require you to write the incremental version of your SQL query, which is fraught with... It's wrong. It's going to be wrong if you write it. But there's a Materialize dbt adapter
Starting point is 00:50:34 that will just say, like, you know what, you don't actually need to do anything. We will keep your query running for you, and as your data change, you don't need to redeploy the model if the model is still the same. I think another way, if you're familiar with dbt, a different
Starting point is 00:50:49 way to frame Materialize is as an automatic dbt model incrementalizer. Take a dbt model and make it incremental on a millisecond-level basis. If that sounds exciting, then Materialize is what you're looking for.
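As a sketch of what this looks like from the dbt side: instead of hand-writing `is_incremental()` logic, you declare the model as a materialized view and let Materialize keep it fresh. The model body below is invented, and the dbt-materialize adapter's materialization name is from memory and may differ between adapter versions.

```sql
-- models/daily_revenue.sql
-- With the dbt-materialize adapter, this config makes dbt issue a
-- CREATE MATERIALIZED VIEW, which Materialize then keeps
-- incrementally up to date; no is_incremental() branches needed.
{{ config(materialized='materializedview') }}

SELECT
    order_date,
    sum(amount) AS revenue
FROM {{ ref('orders') }}
GROUP BY order_date
```

`dbt run` then only (re)creates the view when the model definition changes; in between runs, the results stay continuously fresh as new orders arrive.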
Starting point is 00:51:07 So in many cases, I would say, the goal for sure with Materialize is to let people take their existing business logic, and some of their, let's say, business practices, and transport those into Materialize without much friction. Some of my
Starting point is 00:51:23 happiest moments, sorry, it's weird that I get happy with these things, but folks have shown up with the world's filthiest SQL query, with zero attention paid to, like, how would I make this work fast in Materialize, and just plopped it in, and it runs and it gives the right answer, and they're like, wow, it gives the right answers. And, you know, these are like left joins and subqueries and all sorts of horrible things that you wouldn't have encouraged them to start with. They're not in any of our quick start guides, you know, that you
Starting point is 00:51:53 should do this. But it's super pleasant that, you know, step one is not, like, I refactor all of my business logic to look different. Yeah. And, you know, there's some data inertia, of course. Like, you know, if all of your data are in Snowflake, we don't magically have access to all of that data. But in principle, you know, one of the challenging things is figuring out how to change how all of your queries work, and how you're hoping to interact with stuff; that's meant to be as low friction as possible. Yeah.
Starting point is 00:52:21 Yeah. It's very interesting because you know, like for anyone who has worked in like developer tooling in general, I think it's even more profound when you're talking about like data infrastructure. Like big part of the developer experience is like, okay, yeah, like your technology is great, but like, do you know like how much shit I have to migrate, like to move from one thing to the other? I think like one of the best examples of that is like Python, PySpark and Spark,
Starting point is 00:52:50 right, like there are literally thousands and thousands of lines of Python that are moving data around, and it's just not that easy to, like, you know, ask engineers to go there and, like, just rewrite everything, right? Like, to move to something else. So it is important, in my opinion, to try and, you know,
Starting point is 00:53:16 like, close this gap and let, like, the developer have, like, an easier life at the end, like, choosing the right tool and, like, migrating when they have to, right? There's a very uncanny valley here, and this is sort of to Arjun's point about how much of a database you have to
Starting point is 00:53:31 build before it's useful. You know, if you've built 95% of SQL, great, there's like a 100% chance that a person's query isn't going to work. You might as well have just told them up front that they'll need to more or less start over. There's, you know, there's been many attempts over the years to sort of build these SQL-like, half-of-SQL systems. I think Hive is the one that
Starting point is 00:53:55 sort of is the safest one to talk about, because no one is pretending anymore that Hive is a sufficient SQL. But many of these analogs exist in the streaming world, which is like, well, yeah, we have a SQL as well. And then you sort of look at the SQL and it's like, only do the inner joins, only, you know, don't support this, don't support that. But by the time you get down the list of caveats, you're like, well, what is it that you do support? And it's like the four examples that are presented on the examples page. And one of the decisions I feel very good about in retrospect was the sort of dogmatic insistence that we're going to try and do pretty much all of Postgres, and the list
Starting point is 00:54:36 of caveats really needed to be absolutely minimal. And we chose Postgres because the surface area of SQL is so large. If you are going to implement a standards-compatible SQL, you have to make a variety of judgment calls along the way.
Starting point is 00:55:00 It's like, what do you do with null handling in this weird edge case? And it's remarkable the extent to which Postgres has very well reasoned and publicly documented reasons as to why they do various things.
Starting point is 00:55:15 And so you can go and sort of reimplement that to spec. MySQL doesn't have as much sort of documentation and thorough rigor, even though it has a larger deployment base, and that documentation is very useful when trying to reimplement to spec. If you don't mind, let me leap into it.
Starting point is 00:55:35 I'm just going to riff on a thing Arjun just said, because I thought he was going to say something else. No, it's fine. Which is that, where Arjun mentioned there are a bunch of not-100%, not even 99%, SQL implementations on top of big data systems or streaming systems, you have the same problem with databases and materialized view implementations. There are a lot of them that are like, oh yeah, we support materialized views. Yeah, absolutely.
Starting point is 00:55:57 And then there's a long list of like, oh yeah, but don't use aggregates other than sum and count. The big one is don't use joins. A lot of them are just like, don't join. If you're lucky, Oracle and SQL Server are pretty good, and they're like, you can use joins, but you've got to have primary keys,
Starting point is 00:56:12 or you've got to, you know, not do this other type of join. No self-joins, for some reason that I don't understand. But a lot of them basically have the same property: that if you plopped your SQL in there and said go, it would either say no, or it would say, yes... just a moment... my plan has changed to just re-evaluating from scratch, or something horrible like that. The fact, again, that there was not 100% coverage there
Starting point is 00:56:39 meant that the actual lived experience of trying to use materialized views in existing systems was not one of delight and pleasure so much as one of being trapped among landmines. Views and materialization of views is such a complicated story. I don't think that people that haven't worked in big enterprises understand how big of a problem it is. It's very good that you mentioned Hive, Arjun, because I'll give an example. One of the biggest problems that people have in migrating away from Hive is how to migrate, in an automated way, all the view definitions from Hive to whatever system. Because obviously, yeah, there is SQL out there, but SQL is not one language.
Starting point is 00:57:28 There are so many different dialects. And if you also get into the UDFs that people are building, and how the semantics of the UDFs might be different, it's just such a hard problem. Think about it: you can have an enterprise with thousands of people that run queries every day, tens of thousands of queries on this system. Migrating all these views and all these UDFs is just too much work, right? Which means that there's a lot of opportunity for people to do business, which is good, and also to do interesting research probably, and come up with interesting
Starting point is 00:58:05 ways of solving this problem. But let's go back to Materialize. Question for you, Frank. So with the architecture as it is right now, if I understand correctly, I'll sign up and I have, as a user, to select clusters or so. What do I choose there? Is it a completely serverless experience that I have? Or do I have to choose something similar to the warehouses in Snowflake? It's more like warehouses in Snowflake.
Starting point is 00:58:37 Yeah. So you'll log on and you'll get your own environment, and you're plopped in this default cluster, which is a relatively small thing that you will quickly exhaust if you try to do anything tremendous, but you have the ability to start to create new clusters. Within the clusters, you create replicas. These are the executors, if you will. The cluster is where you aim the queries, or aim the views that you'd like to maintain.
Starting point is 00:58:58 And then you provision them with bits of resources and stuff like that. And you can either build materialized views or you can build indexes on views. One of them gets sort of sunk back out to S3, and one of them lives in memory in indexed form. You can build these on the clusters of your choice. And you can chain them? You can chain the materialized views up together
Starting point is 00:59:17 and... So you might create cluster prod, create cluster test, create cluster interns, you know, go and deploy the various things that you'd like to under prod. Don't touch it again. Go around over to test, and maybe in test you're doing some sort of blue-greeny style deployment.
Starting point is 00:59:37 So your test looks a lot like prod, except you're going to do a few different things to it. You just want to make sure that everything still stays up. Maybe you learn how you need to size the underlying instance in test rather than in prod. And the interns, at the same time, are doing like 10-way cross joins on data that they shouldn't have. And the fact that those computers are going to melt and catch on fire is fine.
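The isolation Frank describes could be set up with a few statements. A sketch only: the replica sizes and the example view are invented, and the cluster syntax is modeled on Materialize's early cloud SQL, which may differ by version.

```sql
-- Independent pools of compute over the same stored data.
CREATE CLUSTER prod    REPLICAS (r1 (SIZE 'large'));
CREATE CLUSTER test    REPLICAS (r1 (SIZE 'small'));
CREATE CLUSTER interns REPLICAS (r1 (SIZE 'xsmall'));

-- Deploy the production views on prod and leave them alone.
SET CLUSTER = prod;
CREATE MATERIALIZED VIEW orders_by_region AS
  SELECT region, count(*) AS n FROM orders GROUP BY region;

-- The interns' 10-way cross join melts only their own cluster;
-- prod and test keep serving, undisturbed.
SET CLUSTER = interns;
```

Fault isolation here comes from the architecture: each cluster is its own compute footprint reading the shared storage layer, so exhausting one has no effect on the others.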
Starting point is 00:59:56 They'll feel bad, but everyone else will not be disrupted. Yeah. Yeah. Okay. That's interesting. So I think you give us the right opportunity to talk about trade-offs, because people choose the cluster sizes based on the trade-offs that they have to make, considering the workload on one side and the computing
Starting point is 01:00:19 system on the other side, right? So help us understand the trade-offs there when we are working with Materialize. So is this, sorry, just to double check, are these trade-offs within Materialize, you're saying, or between Materialize and other alternative solutions? No, within Materialize, within Materialize. Like I'm a user of Materialize for the first time, right? Help me make the right choices.
Starting point is 01:00:42 Like, what should I have in my mind when I... So here's a thing that you could do when you log on to Materialize. You've got a bunch of money. You could rent the biggest machine that money will buy and just do all of your work there. This is one way: you could make one cluster, a bunch of resources,
Starting point is 01:00:57 and just start building all your stuff in that one place. There's some pros and cons to this. So an obvious con, of course, is that if you do this and your friend is in the same place and they write a crappy query, you would never write a crappy query, of course, but like they write a crappy query,
Starting point is 01:01:11 you're using shared resources and you're going to interfere with each other, potentially take down the instance if you go and run out of memory, or just generally degrade performance. So that's a bummer. But there's a cool thing that you can do, I want to say, with Materialize and not a lot of other tools, which is build indexes over these sort of continually changing bits of data and reuse those within and across different data flows
Starting point is 01:01:40 that you've built, queries and data flows. So in some sense, one of the ways I think about Materialize, and this is probably wrong, is: imagine a nice data warehouse system like Snowflake, and ask, could I just bring my own compute and build my own indexes on it and do my own work there
Starting point is 01:01:54 with preformed indexes that I keep up to date? All of the properties you might imagine: millisecond response times and the ability to run 10 queries over the same relational data at zero incremental memory cost. That's the sort of benefit that you get reusing a cluster.
Starting point is 01:02:10 So if you've got five tasks with the same relational data, they're all going to be looking at joining various things together based on their primary and foreign keys. So you build the smart indexes that you want to build on primary keys, maybe some secondary indexes on foreign keys. Your ability then to deploy increasingly complicated queries that do the sort of predictable joins that you might do between all of these keys is greatly improved. Basically, for the additional cost of your next query, you don't have to reflow the entire data set. You don't have to build a whole bunch of private state
Starting point is 01:02:46 in each of your operators. Very handy, and like, if you want to sort of run the leanest thing, something that actually is up and runs and isn't going to fall over, putting it all in one machine so they can share these indexes, share the memory, essentially. That's usually the scarcity source's memory.
Starting point is 01:03:03 Makes sense. But, you might want to start to shave up all of the work that you need to do there into smaller bits and pieces for reasons of, yeah, isolation, performance isolation or fault isolation. Like, here's a classic example, which is you run an org.
Starting point is 01:03:23 There's a bunch of people, a bunch of analysts on your team. Someone is in charge of data ingestion. You pull in fairly raw, gross data from somewhere. It's just JSON. It's not even parsed out into the appropriate types yet. Some of the data are bad. So you've got a big, chunky view materialization task, which is, you know, actually there's a view that I define on this, which cleans up the data for me. Cleaning up the data, take some resources, you write it back out as a materialized view into materialized, and it's now available for all sorts of other people to show up and say, oh, it's amazing. I'd love to just pick that up. Whatever you wrote out, I'm going to pick that up
Starting point is 01:03:58 and work with it. So five people now can essentially put themselves on the end of your pipeline in a way that, you know, if you had some other streaming system, you'd sort of have to copy paste the, that view ahead of time. If it was expensive, you'd have five people doing exactly the same thing. And this is how a lot of organizations work, right? Like there's some set of experts who build the canonical personas. And so like that should be run on clusters that they manage. And that may actually be so critical to the company. You might run that at some high replication factor so that it's
Starting point is 01:04:31 extremely highly available, but then that gets sunk into a materialized view. The canonical personas V V 14 or whatever it is and evolved over time because they've enriched it and added some more columns and things like that. And then there's plenty of downstream consumers. There's the machine learning team. There's the fraud detection team. There's the upsell team or whatever it is. And all these other teams,
Starting point is 01:04:57 they don't want to be rebuilding the canonical personas pipeline. They want to simply consume a always up-to-date canonical set of canonical personas. Henry Suryawirawan, All right. One last question about Materialize and then I want humans to delve deeper into NIOD and like the core technology behind Materialize.
Starting point is 01:05:23 So my question for both of you is what are, let's say, the most... not interesting, but let's say trying to help people to decide based on the use cases that they have, if now is a good time for them to go and give a try. Materialize, right? So what use cases we should have in our mind as a great example for going on materialize right now and figuring out the value of the product as soon as possible? I can probably each give an answer. The one that I see a bunch are people
Starting point is 01:06:04 who have already realized like you know what i would do if i could get this data faster is you know their folks are already chafing against the like takes 15 minutes to refresh my data okay i just can't use this for interactive experiences or something like that or i can't use this for and so this is maybe too easy to even out, but like clearly if you're already sitting on a example use case where if the data were fresh to within the second, you could turn it around to substantially a greater purpose than if you got the rollup at the end of the day. Or I think just, you know, the use cases for sure they exist for the rollups at the end
Starting point is 01:06:41 of the day, but there's just these new classes of applications for folks, you know, interacting like with data apps, basically, where someone has just showed up and said, Hey, I thing, and you want the ability to go and grab the current up-to-date answer and show it to them. You know, if you want to do that without 10,000 lines of microservice cred that you have to build and maintain, there's a good time in my mind to think about, what would that look like in SQL? Could I just write that? And if you can, amazing. And you should just try it out.
Starting point is 01:07:12 Yeah. I was going to say, if you are beginning this journey of building a microservice, I think it was Josh Wills who came up with the, that your microservice should have been a SQL query joke. And yes, you should absolutely start that way.
Starting point is 01:07:30 And not to ding too much on microservices, but I do think 90% of microservices can be SQL queries. I don't think it's 100%. But it also frees up your team to focus on the 10% that truly is so differentiated in capability that requires its own bespoke set of code. focus on the 10% that truly is so differentiated in capability that, that requires its own bespoke set of sort of code. Okay.
Starting point is 01:07:51 Okay. So anyone who is listening to the show, I mean, go and try the cloud offering. If you haven't downloaded and play around, like I would suggest you also do that. For me, it was a very refreshing and like interesting experience, even if I didn't have like a use case in Hubs at that point, but seeing like how you can interact with data and you can materialize like queries, like I think it's very interesting and it can help you like identify use cases that you have, so I'm suggesting, you know, like go and try it in Mother Realize one way or another. Stéphane Leroux- These days too, there's a few fun load generator
Starting point is 01:08:34 sources that come with, that generate various auction data or TPC-H data. Even if like initially one of the big challenges actually was getting people on board with their streaming data. Like they maybe had an idea of what to do. It was like, oh, operationally, how do I do it? Even if like initially one of the big challenges actually was getting people on board with their streaming data. Like they maybe had an idea of what to do. It was like, oh, operationally, how do I do? But there's a bunch of realistic-ish looking data that you could ask. Like, you know, let's see if I can put together a little mini website that I'm actually building over here. But getting to the point of can I prototype up part of that in SQL is a lot easier now than it used to be just,
Starting point is 01:09:05 you know, some pre-canned streaming data to, to play with it. Alessandro Bellofiorelli 100%, 100%. I think we need like at least one episode just chat about that stuff. Like how you get data to try something, how like it's such a huge mess out there, but anyway, I can complain about that like forever. So let's keep it offline. Woo! That was awesome. Of course, as we said in the episode, we're going to break it into two.
Starting point is 01:09:35 So let's close out the first half here. We can at least give that to Brooks for how much pain we've caused him. One of my big takeaways from Frank and Arjun's story is how fundamental the motivation was for each of them. They came at it from different directions. Frank had very strong convictions about what he had studied and the projects that he was working on. He was building it in Rust, and there were problems there, obviously, that Arjun had highlighted, but it was an area of passion for him. And the way that Arjun described it was, I believe that these things can be truly helpful to the world, right? If we build this technology
Starting point is 01:10:19 in a way that makes it accessible to people. And hearing how that shared conviction at a really root level drew them together, you know, and especially drew Frank, you know, sort of to a place where he wanted to be involved in a business where before he didn't. That was just such a compelling process to hear about. And I really appreciated the deep thought and the time it took for them to work through that. And of course, you know, as a result, they're building something really amazing and materialized. Yeah, a hundred percent. Actually, it's very, how to say that, like, it's very exciting to hear their story.
Starting point is 01:11:02 And also, I think it speaks like a very interesting truth that's okay. Some people might say that even like sounds like a little bit romantic, let's say, say that. But it's, it's amazing to hear like from a person who has already like huge impact in the scientific community. I mean, if someone goes and searches for his name and sees just like the certain number of citations on like his academic work, like, okay.
Starting point is 01:11:31 I mean, that would be enough for many people to be like, okay, I am done with contributing to society. Exactly. But hearing from him about like the dialogue that they had with Arjun about if you want like what you're building to have the maximum possible impact out there, like the best way to do that is like so building product and the company and getting this like market out there.
Starting point is 01:12:00 I think that was probably like one of the best things that I would keep from this, from this conversation. And it's also like, I think the foundation for, and the teaser for the next part of our conversation, where we will hear like Frank and Arjun, but like, it's, I think it's even more important to hear that, like from Frank saying how much of a distance someone has to go to take a technology and turn it into something that it's a product and can be used by many different people. So let's keep it here because I don't want to give too many make it like spoilers for the next parts of our conversation. But yeah,
Starting point is 01:12:52 hopefully we leave people on the right deep hunger, like, you know, the best shows out there. Yes. And you're starting to sound like a marketer. I'm worried. Thanks for listening to, thanks for listening to the Data Stack Show.
Starting point is 01:13:08 Definitely check out part two of this one. You don't want to miss it. We dig into a ton of technical details and learn all about timely data flows, SQL dialects, et cetera, and hear from some of the smartest brains in the industry solving these problems. Catch you in the next one.
Starting point is 01:13:22 We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app solving these problems. Catch you in the next one. datastackshow.com. The show is brought to you by Rutterstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rutterstack.com.
