ACM ByteCast - Michael J. Freedman - Episode 69

Starting point is 00:00:00 This is ACM ByteCast, a podcast series from the Association for Computing Machinery, the world's largest educational and scientific computing society. We talk to researchers, practitioners, and innovators who are at the intersection of computing research and practice. They share their experiences, the lessons they've learned, and their own visions for the future of computing. I am your host, Rashmi Mohan. If there is one technology revolution that impacts every one of us today, it is the one around data.

Starting point is 00:00:36 We collect it, we analyze it, and we use it to make all of our critical decisions. And the ability to influence how that data is stored, managed, and retrieved to meet our ever-burgeoning needs is a great responsibility. Our next guest is no stranger to that. Balancing a demanding research career and teaching at a premier institution while also being a serial entrepreneur requires a special kind of dedication. As co-founder and CTO of Timescale, now a unicorn company, Michael Freedman lives this journey every day. Michael is a scientist and the Robert E. Kahn Professor of Computer Science at Princeton University. His areas of focus have been distributed

Starting point is 00:01:18 systems, networking, and security. He is the recipient of many prestigious awards, including the ACM Grace Murray Hopper Award, the Presidential Early Career Award for Scientists and Engineers, and, more recently, the SIGOPS Mark Weiser Award, amongst many others. Michael, we're so excited to have you on the show. Welcome to ACM ByteCast. Thank you for having me. Of course.

Starting point is 00:01:42 I'd love to lead with a simple question that I ask all my guests. Michael, could you please introduce yourself, talk about what you currently do, and maybe give us an insight into what drew you into this field of work? Sure. As you very kindly said, I'm Mike Freedman, and I play two roles today. I continue to be a professor of computer science at Princeton, where I've been on the faculty since 2007. So, time flies, almost 18 years now. And in the last couple of years, I've also been leading a startup, Timescale, which supercharges Postgres, a popular relational database,

Starting point is 00:02:19 for real-time analytics, time series, and now AI applications. Great. That's a very, very accomplished career, both as a professor, as a researcher, and as an entrepreneur. I know we'll talk a little bit more about that. But what brought you into computer science in the first place, Mike? Is that something that you were introduced to early

Starting point is 00:02:38 on in your schooling days? Yeah, so I came to it through two stories. This actually, for me, really goes back to high school. When I was in high school or even earlier, we didn't really have much in the way of computer science education like you do today, where kids start programming even in elementary school. But for me, there were really two notable events. One is I was always attracted to science. So I used to do these summer programs, one of them actually funded by the NSF.

Starting point is 00:03:09 And one of these summer science programs I did in high school was at Michigan State University. When you first show up there, you get assigned to a project. I was actually working in a bio lab. I thought this would be interesting, so I'd check it out. And I remember we were studying, and it's timely now, avian diseases.

Starting point is 00:03:29 So it was avian flu and avian pox. And I remember after two weeks of what turned into basically pipetting eight hours a day, I was thinking, you know, I'm probably not excited by a kind of science where a robot could do something 100 times faster than I ever could. It didn't seem like I was really pushing my mind. It didn't feel creative. Some people are excited by that today, certainly. But I didn't find it intellectually stimulating the same way a creative discipline is. And the previous year I'd taken one programming course in high school,

Starting point is 00:04:07 and I actually walked up and down the halls of the computer science department there. I remember knocking on doors, introducing myself, and saying, hey, I'm here for the summer, I would love to work in your lab. Now, in hindsight, having been on the other side, I'm like, why did a professor ever take this high school student who showed up on his doorstep?

Starting point is 00:04:28 But on the flip side, I really like people who put it out there, who seem to bring a certain level of, I'm just going to get this done. One of the professors there was very kind and actually had me work in his lab. I spent all summer doing what turned out to be visualizations for high-performance computing, and it really turned me on both to computer science and to the research discipline. Then, throughout high school, I

Starting point is 00:04:58 further developed my programming and my interest by programming online games. These were large multiplayer games; this was still the dial-up, BBS era of the world, before we had high-speed internet at home. I used to build and program some of those online role-playing games. I think that was a great experience, both for programming and for building something that people use, the creative aspect.

Starting point is 00:05:26 And I always viewed computer science and programming as a very creative thing, almost a product-building thing. And I think that was a great start to my career. That's an excellent story. And Mike, to your point about why a professor would take a chance on you, I think it also says a lot about a high schooler who would actually go knocking on doors asking for an opportunity. That requires courage, and I think in many ways that's a very significant factor

Starting point is 00:05:54 in people having the confidence that you could actually do the job you said you would. So that's a great story and a great lesson for many of us who hesitate to take that first step and say, you know, I don't know enough about this field, but maybe I'm still going to ask. I think it's an old word, but gumption is a very valuable skill. Right, absolutely. So from there on, you took on computer science. And the point you brought up about creativity is significant as well, right? The ability to actually build something that people use and get feedback is a very exciting way to enter your career.

Starting point is 00:06:31 So that took you into college, and how did you get into your specific area of research, which I understand is mostly distributed systems? Yeah, so as an undergraduate I went to MIT, obviously very well known in computer science, with a lot of amazing researchers and professors, and they have a large program called UROP for undergraduate research, which I really took advantage of. I actually didn't realize, going through college, that I was ultimately going to get into research.

Starting point is 00:07:01 I thought, you know, I'd probably enter industry and go work for a company. And it was different than it is today. You know, I graduated college in 2001. Big tech wasn't yet a thing. When I entered college, we hadn't had the first dot-com movement yet. So, you know, tech was still small back in the day. So you thought you'd go to a company and whatnot.

Starting point is 00:07:27 But I would really credit MIT's fifth-year program, where you could stay and do a master's. That's when I first started working in a research group with two great professors, Robert Morris and Frans Kaashoek, who have had many successful students and have influenced a large part of the systems community. That really turned me on to research, as did another project I did with a student on

Starting point is 00:07:52 privacy and anonymity, which I'm happy to talk about. Early in my career I actually worked a lot in security and privacy, and even some crypto. Over the years I was still always interested in it, but I found that making something more secure is partly enabling: yes, you need your systems to be secure, or

Starting point is 00:08:17 people won't use online things, but security also lets you have applications that you'd be unwilling to have if they were not secure. Typically, though, increasing the, let's call it, security of a system would often lead to friction and frustration for users. And so over time, I got more excited by things that, let's call it, enabled new types of features, as opposed to hardening the existing features that we had. So even though I started more in security and privacy, I transitioned much more to building systems, building distributed systems, and so forth. Got it. So going a little bit more into privacy and security, were there specific,

Starting point is 00:09:06 how did that interest come about? Were there specific applications that needed privacy and security that you were trying to solve that problem for? Yeah, so at the time, all these peer-to-peer systems were starting to get popular, starting with file sharing like Napster and Gnutella, and there was a lot of interesting academic work in this whole area called distributed hash tables. A good part of my PhD work was actually thinking about how to apply these peer-to-peer systems

Starting point is 00:09:34 to content delivery networks, or CDNs. At the time, this was pre-cloud. We didn't have this magical infrastructure that we could either rent or that gets paid for by an online-advertising business model. And so people were starting to think, well, how do we get these internet-scale systems? If you think of the first internet, it was this inherently decentralized system where we were peering together random networks on

Starting point is 00:10:01 the network. I'm talking about, of course, the IP networks. And people started asking, how could we build applications that spread across these networks in a similar way? And of course, if you were then connecting any two hosts directly to one another, you would worry that you don't know who the other entity is. You don't know if you should trust them. And so the question is, if you're interacting with a whole bunch of computers on the internet that

Starting point is 00:10:28 you don't trust, how do you prevent a single node, or a set of these nodes, that might be malicious from getting your data, identifying what you're doing, and so forth? So actually, the initial project I worked on at MIT was with a number of other students, and we were building a form of what became known as onion routing, or multi-hop network anonymity. The interesting thing is that the student who led the effort in that group went on to ultimately build Tor, which is a widely available open source system today. And so it was really interesting to see the start

Starting point is 00:11:10 of what became this online anonymity system. I got an early view of some of the things that person, Roger Dingledine, went on to build later in Tor, which I thought was pretty exciting. That's very insightful, Mike. The other question is, and I know you were talking about content delivery networks,

Starting point is 00:11:30 and I know that one of your first ventures or ideas was called Coral CDN. Could you help us understand what problem you were trying to solve with this idea? And maybe what was adoption like? I'm just curious about what that journey was. Sure. Coral CDN was built on the observation that there was a lot of interesting content on the internet and people didn't have a good way to download it.

Starting point is 00:11:56 Today, that seems ridiculous. We have YouTube, we have other sites, we could stick something in an Amazon S3 bucket. We have any number of free services, like Cloudflare, that make it infinitely easy to deliver content. That wasn't the case back in 2001. Real CDNs had only started appearing. Probably the best-known one then was a company still around today called Akamai,

Starting point is 00:12:22 which built expensive infrastructure to deliver content for corporations. And so the observation, in the middle of this peer-to-peer interest, was that we had all this spare capacity on end hosts. How can we leverage these end hosts to self-organize into a way to deliver content? The idea here is that you could take any piece of content on the web and add a suffix to the end of the host name in its URL. Requests would then get routed through the DNS system I ended up building and deploying, which redirected them to this network of

Starting point is 00:13:10 web proxies that we had running around the internet. The web proxies themselves had an efficient way to discover whether the data was stored in any other web proxy's cache and, if not, download it directly from the origin server, cache it, make it available to others, and return it to clients. At the time, there was a popular website called Slashdot, somewhat equivalent to, let's say, Hacker News today,

Starting point is 00:13:36 where a lot of developers and engineers would read. And it was pretty common that once a piece of content was posted to Slashdot, the original website hosting the content would go down. This is just something you don't see today. But the idea was that if anybody could make content available through this network, then we would stop running into the problem that an end host would have to provision for

Starting point is 00:13:59 peak load, because we'd have all this spare capacity. I built this as part of my PhD. I ended up deploying it on a research network that spanned about a thousand computers across hundreds of universities worldwide. And this actually ran for ten years. By the end it wasn't really needed; I just kept it running because I didn't want to take it down. But especially in the first couple of years, it was getting millions of users transparently using it per day and serving a lot of really interesting stuff.
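To make the request flow described here concrete, the following is a minimal Python sketch, not Coral's actual code. It assumes a single process, a shared in-memory dictionary standing in for the distributed lookup the proxies used to find each other's cached copies, and a caller-supplied `fetch_origin` function in place of real HTTP. The DNS redirection step is omitted, explicit ports are not handled, and the names `coralize`, `uncoralize`, and `CoralProxy` are hypothetical. Only the `.nyud.net` suffix comes from the episode.

```python
from urllib.parse import urlsplit, urlunsplit

# Host-name suffix Coral appended (mentioned later in the episode).
CORAL_SUFFIX = ".nyud.net"


def coralize(url: str) -> str:
    """Append the Coral suffix to a URL's host name (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    if parts.port is not None:
        raise ValueError("explicit ports are not handled in this sketch")
    host = parts.hostname
    if host.endswith(CORAL_SUFFIX):
        return url  # already routed through Coral
    return urlunsplit((parts.scheme, host + CORAL_SUFFIX,
                       parts.path, parts.query, parts.fragment))


def uncoralize(url: str) -> str:
    """Strip the Coral suffix to recover the origin URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.endswith(CORAL_SUFFIX):
        host = host[: -len(CORAL_SUFFIX)]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


class CoralProxy:
    """One cooperative web proxy; `index` is shared by all proxies (assumption)."""

    def __init__(self, name, index, fetch_origin):
        self.name = name
        self.cache = {}                   # origin URL -> response body
        self.index = index                # origin URL -> set of proxies holding a copy
        self.fetch_origin = fetch_origin  # called only when no proxy has the content

    def get(self, url):
        origin = uncoralize(url)
        # 1. Serve from the local cache if possible.
        if origin in self.cache:
            return self.cache[origin]
        # 2. Otherwise look for another proxy that already cached it.
        for peer in self.index.get(origin, ()):
            if peer is not self and origin in peer.cache:
                body = peer.cache[origin]
                break
        else:
            # 3. No copy anywhere: fetch once from the origin server.
            body = self.fetch_origin(origin)
        # Cache locally and advertise the copy so other proxies can find it.
        self.cache[origin] = body
        self.index.setdefault(origin, set()).add(self)
        return body


origin_hits = []


def fetch_origin(url):
    """Stand-in for an HTTP request to the origin web server."""
    origin_hits.append(url)
    return f"<html>content of {url}</html>"


index = {}
proxies = [CoralProxy(f"proxy-{i}", index, fetch_origin) for i in range(3)]
url = coralize("http://www.example.com/article.html")
for proxy in proxies:
    proxy.get(url)
print(url)               # http://www.example.com.nyud.net/article.html
print(len(origin_hits))  # 1
```

In this toy run three proxies serve the same article but the origin is contacted only once, which is the property that, as described above, kept a site from going down when a link to it was posted on Slashdot.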

Starting point is 00:14:28 Again, pre-YouTube, it was serving content for natural disasters, serving scientific content, and serving normal everyday websites. So it was a really interesting experience, both applying research algorithms to build these scalable systems and facing the practical side beyond the algorithmic aspect of systems: how do you make this work with the mess that is the legacy internet and legacy web? And then also thinking about how to make a good user experience around that. And so I thought that experience of what it actually means to build an end-to-end

Starting point is 00:15:05 product turned out to be very useful in thinking about my own research going forward, and later my own work in the commercial sector and startups as well. Yeah, that's extremely insightful as well, Mike. I'm guessing, was this primarily built on open source technologies? Was this monetized in any way at all? It was not monetized in any way. It was built from a combination of open source and code written somewhat from scratch. I think today we would probably build it with more open source. Frankly, when I started working on this, which was 2002,

Starting point is 00:15:44 the set of open source technologies, particularly the set of open source software that was extensible, was much more limited. And so a lot of it was built more from the ground up. Got it. Did you have other collaborators who helped build this? I'm going back to your point about the learnings you had from this in terms of product development and

Starting point is 00:16:06 deployment of this large-scale system: one, how do you grow and continue to build new features as you see them become necessary? And two, how do you get feedback from customers? How do you know who's using it? What do they like? What do they not like? What was that process like? Yeah, so I certainly had some assistance. In fact, my own PhD advisor, David Mazières, actually ended up building some of the code as well, which in hindsight is pretty amazing, thinking of it from the other side. But primarily, I led this effort for many years.

Starting point is 00:16:41 In terms of the deployment, I benefited from a research network that was being deployed called PlanetLab, actually based out of Princeton, although of course at the time I wasn't there, which was creating coordination between hundreds of universities to allow people to deploy software across those universities. So I certainly benefited from that as a deployment network. There was no monetization here, but there was user feedback, in the sense that you could think of this today as almost an open source network, or really, today you would think of it almost like a free SaaS or free service tool.

Starting point is 00:17:20 And I did the same kind of thing you might see in a startup. I built a community. I had mailing lists. I had external people who joined, asked questions, and asked for support. Today, I think that would be in Slack or Discord; back then we had mailing lists, which did that. I got feedback from them. I had built some centralized monitoring.

Starting point is 00:17:41 So I could observe end-to-end performance. You see what seems to be working and where you see latency, and that helps you improve the network. You also get the experience of running an operational network. I had to interact with federal law enforcement and Interpol a few times. You know, if you have an open network, it can of course be used for a variety of things. And so you get the experience of what it means to operate a real system, which was certainly an interesting experience as a graduate student. So I think that was very useful. If I had to take something away from this, it's that this was absolutely not a way to maximize publications. You know, if you think about it,

Starting point is 00:18:27 I ultimately ran it for 10 years, but I really only had two papers on Coral: one within the first six months to a year of building it, and another six years later as a retrospective. But even though it's not a way to maximize publications, I would look at it in two ways. One, I think the broader research community in the end values impact more than just counting papers. And so the observation of what we were able to

Starting point is 00:19:02 build and what we were able to learn, especially in this area of peer-to-peer systems, which saw so many publications but so few real systems built and used, I think was very valuable. And two, looking forward, I think it affected my own career: thinking about what a product feels like from the user's perspective, how you actually help build and manage a community, and how little things that might not be important from a research perspective are extremely important from a user experience or developer experience perspective. That could be one of the major reasons a product gets adopted. For example, with Coral CDN, probably the biggest thing it did was that you could

Starting point is 00:19:53 take any URL and add a domain to it; the domain it added was .nyud.net. So if I take any domain and add that, I can now send traffic through Coral. You didn't need to register. You didn't need to hit an API. You just did it. And that was a really powerful, frictionless way for people to adopt it. So we started seeing web servers adopt it. Programmatically, we started seeing browser plugins adopt it. We started seeing client-side software adopt it, because

Starting point is 00:20:25 the interface was so simple and frictionless. And I think that was a bit of a foreshadowing of how people thought about building super frictionless products and onboarding even years later. Yeah, I think there are so many lessons in what you just described, Mike. One, the ability to connect with your end users, to listen to what they're saying in terms of adoption, and to simplify the adoption of the product itself, like what you were just describing: make it easy enough that more people will adopt it and use it for the ultimate purpose

Starting point is 00:21:00 that you're building it for. But I would also like to touch upon one aspect that you brought up, which is that the idea was not to optimize for publications. As a researcher who is also building, and at that time I think you said you were in graduate school, you're building this product because you've seen a need for it and you see customers or end users finding value in it. How do you balance where you spend your time, or where you dig deeper?

Starting point is 00:21:29 Yeah, I never set out with a five-year plan or 10-year plan where I was saying, you know, in order to achieve this next thing, I'm going to do X, Y, and Z. I didn't know going into undergraduate that I would ultimately go to grad school. I didn't go into grad school saying I want to be a professor. And so I'll say two things. One is that I worked on problems I found interesting. And two, I've always been very broad in what excites me. You know, there are some people who work on a problem,

Starting point is 00:22:06 work deeply on that problem for many years, sometimes their whole career. That was never really me. Even in graduate school, as I mentioned, I obviously spent quite a bit of time on these peer-to-peer systems and this work. I also collaborated with a lot of other people on work in security and privacy and crypto and other types of systems work. I generally like to collaborate. I always think it's good to pay it forward, in the sense that you invite people into your work, and you often find that they will invite you into their work in turn. And I think that's a good thing.

Starting point is 00:22:47 And in the end, work on stuff that you find really exciting. I know that's not a definitive instruction, but in my own career in general, I just look for exciting things that feel like the next challenge. And that really has also been my path to entrepreneurship: what is the next challenge, as opposed to staying more static and working on the same thing for years and years. Yeah, I think that's a great philosophy, and thank you for that insight, because I think it's important whether you're in academia or in industry: finding interesting problems to solve tends to lead to the best opportunities.

Starting point is 00:23:25 I think it's hard to plan, like you said, and say, I will do these three things and this is ultimately going to give me this particular opportunity or this particular idea that I want to build on. ACM ByteCast is available on Apple Podcasts, Google Podcasts, Podbean, Spotify, Stitcher, and TuneIn. If you're enjoying this episode, please subscribe and leave us a review on your favorite platform. So what then led you to where you are today, Mike, in terms of building Timescale? I mean, one, I'd love to understand,

Starting point is 00:24:08 at least at a very high level, what problem Timescale is solving, and maybe go into a little more detail, even for audiences who may not be aware, about what is different about a time series database compared to a relational database. What kind of applications does it serve? And how did you hit upon this idea?

Starting point is 00:24:30 Sure. Let me first describe what Timescale is, then a little bit of the journey to why we built it. Timescale is one of the fastest databases for real-time analytics or time series data, as you described. We are actually built on Postgres, so it's a full transactional relational database that allows you to combine your, let's call it, time series data (metrics, events, observations) with business data, metadata, whatever, query it, and have the full richness of what you might expect

Starting point is 00:25:06 from a relational database, and yet bring the scale and performance that you need for these particular settings. What's interesting, and I'll talk for a minute about how we got there, is that this touches a little on the problems people often face when building real products versus the problems we commonly think about in academia. I think in academia, if you look at a lot of research, it's working on what it deems

Starting point is 00:25:37 the most cutting-edge problems, which, if you translate it, often are the type of problems that a company like, let's say, Google or Amazon or Facebook might face: extreme scale. The flip side is if you think about what problems most companies and most developers are facing, which I call building for the 99%: they are trying to balance a whole bunch of different things. They obviously want their infrastructure to be fast and scalable and reliable,

Starting point is 00:26:07 but they also want operational simplicity. They want to move quickly as a developer, as a product team. Google could spend years building custom infrastructure, in fact custom hardware, to solve its scale problems. That's not what 99.9% of companies are trying to do out there.

Starting point is 00:26:26 And so when adopting a technology, you want to ask not only whether it gives you the speed you need, but also: is this a type of software that my organization can use? Does it have all the integrations I need? Does my whole team need to learn something from scratch, or do they already know how to use it? I think that was really the thing that was unique about what we did at Timescale: we were able to bring these amazing

Starting point is 00:26:53 capabilities and yet operate within an engineering constraint: how do we bring this to Postgres? If we could bring this to Postgres, we know it's the most popular database among developers today. It's used by almost every company. It has this massive ecosystem, massive integrations. And basically, by innovating both within the database and in our managed cloud product around the database,

Starting point is 00:27:20 we were able to take this thing that all these companies and developers already know and trust and bring it to new settings that people previously hadn't thought possible. Now, how we started goes back to 2015 and 2016. This was just post-tenure for me. I was thinking about a new challenge, and I reconnected with an old friend. We actually go back to the first week of college, if you can believe that, and

Starting point is 00:27:50 we started talking about some of the challenges that we were broadly seeing, both technologically and in the industry. At the time there was a lot of new excitement around IoT, or the Internet of Things. One observation was that if you looked at the first couple decades of technology, it was what some people might call the digitization of the back office, you know, the IT side of the house:

Starting point is 00:28:14 all the paperwork went online and into the cloud. And with IoT, what we started thinking about was that now we would have the digitization of everything else: all the physical infrastructure, homes, buildings, power lines, cars, everything else was going to be connected. And so the question we asked was, now that we had all of these new types of things connected, and we know they're going to be generating data,

Starting point is 00:28:39 they're probably going to be generating huge streams of data. Think about sensor data continuously streaming, or even events. Think about your electric vehicle, or in fact even an online game: every time something happens, it's a discrete event that it wants to register. What is the type of data infrastructure we want to support that? When we started looking at this, we weren't thinking of it as a research problem; we were very pragmatic about how to build a platform to do this. We tried to use a number of existing open source technologies. Again, we were trying to be very pragmatic. And we found that none of them really fit our needs, precisely because we had this

Starting point is 00:29:23 separation between what was deemed a relational database versus something built for analytics. You had to store your data separately, and you then had to connect this data in application space rather than directly in the database, which was slow. Most of those things built for analytics weren't as reliable as we needed. They weren't as flexible. They often didn't support a lot of the SQL features we wanted. And so we were unhappy with that software infrastructure.
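As a rough illustration of querying time series data together with business metadata inside one database, rather than joining the two in application code, here is a minimal Python sketch. It uses the standard library's in-memory SQLite as a stand-in for Postgres so that it runs anywhere; the `devices` and `readings` tables and their columns are hypothetical. In Postgres with Timescale, the hourly bucketing would more typically be written with TimescaleDB's `time_bucket()` function over a hypertable, which this sketch does not attempt to reproduce.

```python
import sqlite3

# In-memory SQLite stands in for Postgres (assumption, to keep the example self-contained).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- business metadata (hypothetical schema)
    CREATE TABLE devices (device_id TEXT PRIMARY KEY, site TEXT);
    -- time series readings (hypothetical schema)
    CREATE TABLE readings (ts TEXT, device_id TEXT REFERENCES devices(device_id), temp_c REAL);
""")
conn.executemany("INSERT INTO devices VALUES (?, ?)",
                 [("d1", "plant-a"), ("d2", "plant-b")])
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("2024-01-01 10:05:00", "d1", 20.0),
    ("2024-01-01 10:45:00", "d1", 22.0),
    ("2024-01-01 10:15:00", "d2", 30.0),
    ("2024-01-01 11:10:00", "d1", 24.0),
])

# One SQL query buckets readings by hour and joins them with device metadata,
# instead of fetching both datasets and joining them in application code.
rows = conn.execute("""
    SELECT strftime('%Y-%m-%d %H:00', r.ts) AS hour, d.site, AVG(r.temp_c) AS avg_temp
    FROM readings AS r JOIN devices AS d USING (device_id)
    GROUP BY hour, d.site
    ORDER BY hour, d.site
""").fetchall()

for row in rows:
    print(row)
# ('2024-01-01 10:00', 'plant-a', 21.0)
# ('2024-01-01 10:00', 'plant-b', 30.0)
# ('2024-01-01 11:00', 'plant-a', 24.0)
```

The design point illustrated is the one Mike describes: the join and the aggregation run where the data lives, so the application receives one small result set instead of two raw datasets it has to stitch together.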

Starting point is 00:29:49 And so, feeling our own pain, which is where I think a lot of the best innovation comes from, we ultimately started thinking about how we might change Postgres to bring this data into one operational database, get the value of both, and make it handle the scale and performance you ultimately need. Got it. Thank you for that background. So if I were to summarize, and that was one of my questions as well, building

Starting point is 00:30:18 on Postgres was a very conscious choice, both because it already has significant adoption, so for somebody who is familiar with how Postgres works, growing from there would be a much easier transition, and also because the customers you were trying to target were different from the super large organizations; they were probably the ones that weren't being served by the larger products out there. So did you find that for the customer base you started to target, it was a prerequisite

Starting point is 00:30:59 for them to use Postgres, or did you find other folks also interested in adopting? What does that transition look like if somebody wasn't using Postgres to begin with? Sure. So we see both types of customers: those who come to us from Postgres and those who come to us from a different database or storage technology. Most of the time, the reason people adopt a new technology is because

Starting point is 00:31:25 they have a pain point. And for us, that typically is that the current system people are using isn't scaling with their needs. It can't handle the capacity, can't handle the performance, or it's not cost-effective given the amount of data they want to use it for, you know, many terabytes in your core database. The broad adoption of Postgres means that a lot of people who come to us are either coming directly from Postgres or certainly know about it.

Starting point is 00:31:54 But we certainly have people come to us from a variety of other sources: MySQL, Mongo, or SQL Server. Or they might even be using the wrong technology today. They might have started on Snowflake or Databricks and said, that is good for a data warehouse or data lake, but it's actually not meant for when I'm trying to build an application that is customer-facing and operational, so they bring that portion of their workload to us. And of course, we also have customers who are just starting from scratch, which we think of as greenfield. But I think when you think about deploying technology commercially,

Starting point is 00:32:35 you obviously need to solve the biggest pain points customers have, especially if they're bringing a brownfield application, that is, an application that already exists. They must have an existing problem; they're running from some pain, and you need to solve it without massive disruption or massive change to what they need to do. Got it. That makes a lot of sense, Mike. So in this journey, you said you've been working on this idea for seven or eight years now. What were some of the key pivoting moments as you were building this company?

Starting point is 00:33:11 Did you start out with a basic premise and idea that is still the cornerstone of your business, or were there transitions in the market, the industry, or the needs of customers that made you rethink your strategy? Yeah, so I think there were probably two main changes or pivots. One, as I talked about before, is that we started out looking into the IoT space in 2015 and 2016. In hindsight, building an IoT platform is actually not a good idea, particularly for a startup. I'm happy to share why, but the main reason is that these things are deployed in very heterogeneous environments.

Starting point is 00:33:49 And so the needs across different types of IoT deployments, whether industrial or energy, decentralized or centralized, commercial or home, are all very different. And so most of the time, you don't have an end-to-end platform for these. You end up having more of what I'd call solutions, where developers are picking and choosing

Starting point is 00:34:14 from an array of different technologies that end up working for them. We had some early adoption; I think we had about 100,000 devices from customers sending data to the platform we built. But we weren't getting the pull from the market that I think characterizes real product-market fit. The interesting thing is that after we built the database to serve our own needs, we started talking to users and they started saying,

Starting point is 00:34:37 this platform is fine, but tell me more about this database, because that sounds really interesting and solves a problem I have. So we got this natural pull, and that was probably the first pivot, when we said, let's actually change the problem we're trying to solve. Let's not try to build this IoT platform; let's become a database company. Sometimes you listen to your users, and we felt this pull from our users for a much narrower thing, at least narrower on its face. In hindsight, databases are incredibly horizontally applicable, and so now we see users and customers from a much wider variety of areas than just IoT. We

Starting point is 00:35:20 see them certainly in industrial, manufacturing, and energy, and we see logistics, finance, fintech, SaaS, dev tools, gaming, media, and tech. So we see them across many different industries. So the first thing we did, of course, was move from a platform to a database, which we decided to open source. I'm happy to talk about that if folks are interested. We spent, let's say, two years building the core tech, but at the same time we believed that ultimately the way many companies, especially forward-looking ones, are going to adopt software is as fully managed cloud services.

Starting point is 00:36:04 This is not super insightful. I mean, certainly lots of people were building managed services at the time. But I think what was interesting for a startup, particularly a database, is that if you look at most data infrastructure, especially open source, it first starts by trying to monetize its open source offering, which typically means building an enterprise version

Starting point is 00:36:26 of its software that customers deploy on-premise, whether in their private data centers or in their own accounts in the public cloud. What we said is that we first started open source, but we weren't going to monetize that. Instead, we were going to build our fully managed cloud, and that would be how we monetize. And it is interesting from a company

Starting point is 00:36:49 and engineering perspective, because you end up having to build a very different practice and team if your goal is to build a mature operational platform rather than to ship software. It also leads to a very different experience, what the industry calls product-led growth, where you build your system to be very self-service.

Starting point is 00:37:15 You build it so somebody comes to your website, clicks a button, and within a minute is in a free trial using your database, as opposed to a more traditional enterprise motion where you might spend weeks or months talking to a customer before they ever start using it. And of course, from a product perspective, that means you have to really nail the zero to hero.

Starting point is 00:37:37 How do you make your product amazing? How do people see value as soon as possible, as opposed to needing to explain how X compares to Y, why it's better, this whole process of adopting something? So you really need to focus on the developer experience, which you have to do less of with, let's call it, traditional on-premise software.

Starting point is 00:38:01 Yeah, thank you for sharing that. First of all, listen to your users, a novel concept in and of itself. But also the point you bring up about product-led growth: make the product simple enough and easy enough to adopt and gain value from in a very short period of time, and it almost sells itself, which is phenomenal. I would love to understand from you, Mike, in terms of technical challenges: what are the biggest technical challenges

Starting point is 00:38:29 that you're trying to solve for today, whether those are challenges your customers are surfacing to you or ones you're trying to address as part of your own research and growth as a company? Sure. So let me start by talking about

Starting point is 00:38:47 some of the technical challenges we had to address. I mentioned before that one of the interesting things is engineering with constraints. And it's actually interesting that we're talking about this now, because especially in the AI era, we're seeing the contrast between people throwing a lot of hardware at a problem and getting really efficient.

Starting point is 00:39:11 We saw this even last month with, let's say, DeepSeek: what do you do when you engineer around constraints, and how does that affect what you build? So we decided early on that we were going to build in Postgres and we were not going to fork it. This is not a constraint you care about as an academic. In academia you'll say, well, I'll write a kernel patch; I'll modify the core code to inject my thing. Of course, if you see any academic project that wrote a kernel patch, you know two years later the thing is unusable, right? Because it's very hard to maintain that fork and keep it up to date. Now, 15 years ago Postgres introduced something similar, I guess, to a kernel patch: the notion of an extension framework. That means that this massive piece of code, written

Starting point is 00:40:06 in C and started over 30 years ago, has all these hooks in its core that third-party extensions can install themselves into. So we're still writing in low-level C, and we run in the same memory space and the same process, but we get installed as an extension. At these hooks, we basically take over part of the planning process, take over execution, and may have our own storage layer. That is what we decided: we were going to stick to this constraint and optimize within it. In fact, when we started building on Postgres, it was Postgres 9.2, I think, and we now support Postgres 17, which is this year's release. So we've managed to keep that up throughout the last seven or eight years. But what's interesting is, as I said, we want this analytical database.
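The hook mechanism described here can be modeled in a few lines. In Postgres's C code, hooks such as `planner_hook` are global function pointers; an extension's `_PG_init` function typically saves the previous value, installs its own function, and falls back to the saved hook or to `standard_planner` for queries it does not handle. The Python below is only a hypothetical, simplified model of that chaining pattern: `HookRegistry`, `standard_planner_model`, and `install_extension_planner` are illustrative names, not part of the Postgres or Timescale APIs, and a "plan" is just a descriptive string.

```python
from typing import Callable, Optional

# A "plan" is modeled as a descriptive string; purely illustrative.
PlannerFn = Callable[[str], str]


def standard_planner_model(query: str) -> str:
    """Stand-in for the core planner (hypothetical): a plain sequential scan."""
    return f"SeqScan({query})"


class HookRegistry:
    """Hypothetical model of one global hook slot, like planner_hook in C."""

    def __init__(self) -> None:
        self.planner_hook: Optional[PlannerFn] = None

    def plan(self, query: str) -> str:
        # Core code calls the installed hook if any, else the standard planner.
        if self.planner_hook is not None:
            return self.planner_hook(query)
        return standard_planner_model(query)


def install_extension_planner(
    registry: HookRegistry, name: str, handles: Callable[[str], bool]
) -> None:
    """Install an extension planner that chains to whatever was installed before."""
    prev_hook = registry.planner_hook  # save the previous hook (may be None)

    def extension_planner(query: str) -> str:
        if handles(query):
            # The extension takes over planning for queries it understands.
            return f"{name}Plan({query})"
        # Otherwise defer to the previous hook, or to the core planner.
        if prev_hook is not None:
            return prev_hook(query)
        return standard_planner_model(query)

    registry.planner_hook = extension_planner


# Usage: a hypothetical extension that only handles queries on "metrics".
registry = HookRegistry()
install_extension_planner(registry, "Columnar", lambda q: "metrics" in q)
print(registry.plan("SELECT avg(value) FROM metrics"))  # ColumnarPlan(SELECT avg(value) FROM metrics)
print(registry.plan("SELECT * FROM users"))  # SeqScan(SELECT * FROM users)
```

Saving the previous hook before installing your own lets several extensions stack on the same hook point without knowing about each other, which is part of what makes extending Postgres without forking it workable.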

Starting point is 00:40:52 So if you think about all the things people have done for analytics, a very common one is that you want columnar storage. Well, Postgres is traditionally a row storage engine. You want query vectorization. Well, Postgres doesn't actually support that, partially because it doesn't make as much sense in row stores. You want to handle all these incremental materializations and be consistent with transactional guarantees.
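To make the row-versus-column contrast concrete, here is a toy sketch assuming a hypothetical `(timestamp, device_id, value)` schema. It does not reflect how Postgres or Timescale actually store data; it only contrasts visiting whole tuples one at a time with scanning a single contiguous column in batches, which is the basic idea behind columnar storage and vectorized execution.

```python
from array import array
from typing import List, Tuple

# Hypothetical schema: (timestamp, device_id, value).
Row = Tuple[int, str, float]


def to_columns(rows: List[Row]) -> Tuple[array, List[str], array]:
    """Pivot row-oriented tuples into one contiguous sequence per column."""
    timestamps = array("q", (r[0] for r in rows))
    device_ids = [r[1] for r in rows]
    values = array("d", (r[2] for r in rows))
    return timestamps, device_ids, values


def sum_row_at_a_time(rows: List[Row]) -> float:
    """Row-store style: visit every whole tuple and pull out one field."""
    total = 0.0
    for row in rows:
        total += row[2]
    return total


def sum_vectorized(values: array, batch_size: int = 1024) -> float:
    """Column-store style: each step processes a batch from one column only.

    Real engines run the per-batch loop in compiled code (often with SIMD);
    here the built-in sum() over an array slice stands in for that kernel.
    """
    total = 0.0
    for start in range(0, len(values), batch_size):
        total += sum(values[start:start + batch_size])
    return total


rows = [(t, "dev-1", float(t)) for t in range(10)]
_, _, value_column = to_columns(rows)
print(sum_row_at_a_time(rows), sum_vectorized(value_column))  # 45.0 45.0
```

An aggregate over one column only has to touch that column's contiguous values rather than every field of every row. Pure Python only illustrates the shape of the computation, not the speedup.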

Starting point is 00:41:21 Well, Postgres doesn't support many of those things. On the flip side, you could again throw everything out and start from scratch. There are pretty common ways to do that: store every column as its own file and build a new query engine. But part of the problem is that you don't actually guarantee anything transactional; you don't have the same atomicity guarantees you get in a transactional database. So you throw away a lot in those trade-offs, and of course you lose the whole Postgres ecosystem. But what was really interesting, and I can go into more detail, is how we were able to, through careful

Starting point is 00:41:56 engineering and engineering around constraints, bring a columnar engine into this core that is now competitive with many of the other columnar engines that were built from scratch, bring a lot of query optimization to it, and bring query vectorization to the database. And so every so often we've identified problems, like storage isn't efficient or these types of queries aren't fast, and when you have a great team, some constraints, and a clear goal, you can often engineer a way to ultimately solve those problems. Understood. Are a lot of those insights things that you're monitoring for and observing, Mike,

Starting point is 00:42:36 or is it direct feedback that comes from your customers, or a combination of both? I think it's all of the above. You see where the market is moving, you understand some of your users' problems, and over time you build a strong intuition, backed by lots of data points, about what they need. You also speak to a lot of people. As I mentioned earlier about building community, early on, when we were just open source and not monetizing anything, we set up a public Slack group. At this point, we have 10,000 people in our Slack group, and you hear feedback even from the open source community. So you take that; you hear directly from users.

Starting point is 00:43:15 One of the interesting things about running a managed cloud, as opposed to just software that you sell on-premise, is that you have access to an amazing amount of usage data, and from that you can also see what problems people run into. You take all these data points together, along with where you see the market moving, who your real customer profile is, and what problems you're trying to solve for them, and use that to triangulate where to go. If anything, I think the problem is always too much data rather than too little, and too much feedback rather than too little. So a big part of it is understanding where you're going, making those decisions, and focusing,

Starting point is 00:43:53 as opposed to trying to do everything. Yeah, that's excellent advice. I do have to ask, Mike: your role today is CTO. Could you help us understand what that entails and how it's different from, say, head of engineering within an organization? Sure. Well, if you ask what a CTO is,

Starting point is 00:44:14 I'd say look at five companies and you'll probably hear eight different things. I think there are actually two important roles I hold: CTO and co-founder. And I think co-founder is probably the more important one, in the sense that it affects what I do more. One of the unique things co-founders have in an organization is the ability to make change and get people focused on what you think are the most important problems. Conversely, the biggest risk a co-founder poses in the organization is that you carry a very loud voice, and so you have the ability to make changes,

Starting point is 00:44:58 and so you have to think carefully about how you use that voice. In terms of head of engineering, we obviously have an amazing VP of Engineering and a great team. They're the ones who are especially responsible for delivery. At Timescale, we have a product team that thinks both about the user and about the commercial side, what you call go-to-market. They really think about the outcomes of features: not did we ship the features, but did the features solve the problems they sought to solve? Is it being adopted?

Starting point is 00:45:29 If not, why not? How do we grow that? How does it help grow our business? And engineering is building for the short, medium, and long term of where you want to go and focusing on delivering that. So that's why we have a VP of Engineering who leads the team and focuses on the people, the process,

Starting point is 00:45:48 and the product, and on delivering that product. Me, as the CTO and co-founder, I can work both on what we think are some of the biggest problems of the day and think more long-term about where we're going and where we see the industry going. In the last year, I've been thinking a lot more about what the next generation of cloud infrastructure looks like, what that looks like in the AI era, and what the role of the database is

Starting point is 00:46:19 in the AI era and so forth. Yeah, that's great. Thank you for sharing that, Mike. I would love to know from you, for our final bite: what are you most excited about in this space over the next, say, five years? So, and this is probably no surprise, I am increasingly excited about Gen AI. I think we're in this Cambrian explosion of possibilities from Gen AI, and I think that is exciting as a researcher and engineer.

Starting point is 00:46:53 There is so much opportunity. I think it is also one of those times where we don't yet know exactly what the future is going to look like, because so many things are being tried: so many new technologies are being adopted, thrown at the wall, and dying. And so I think navigating this space is actually the challenging part. Taking the perspective of building a database, the great thing about data infrastructure is that it powers the applications on top of it. I have no doubt that databases will be around in 20 years, and I have no doubt that data

Starting point is 00:47:36 infrastructure will power AI. But exactly how it does, I think, will depend a lot on, let's say, how models are used for inference and how they evolve over the next couple of years. And so planning what your own product looks like benefits a lot from that understanding. As they say, you want to skate to where the puck is going, not where it's been: think about what the infrastructure needs will be in the years to come as we continue to build more and more exciting Gen AI things. That's fantastic advice. Thank you so much for an amazing conversation and for taking the time to speak with us at ACM ByteCast.

Starting point is 00:48:18 Thank you for having me. ACM ByteCast is a production of the Association for Computing Machinery's Practitioners Board. To learn more about ACM and its activities, visit acm.org. For more information about this and other episodes, please visit our website at learning.acm.org slash ByteCast. That's learning.acm.org slash b-y-t-e-c-a-s-t.
