Software Misadventures - Nathan Marz - On changing the economics of building large-scale software with Rama - #23
Episode Date: September 22, 2023. What does it mean to change the economics of software development? Nathan Marz joins the show to share how they reduced the cost of building Mastodon at Twitter-scale by 100X, and the 10-year journey to build Rama, a new programming platform that made this feat possible. Nathan is the founder of Red Planet Labs. Prior to RPL, he led engineering for BackType, which was acquired by Twitter in 2011. Nathan created the Apache Storm project and wrote the book Big Data: Principles and best practices of scalable realtime data systems. Outside of work, Nathan is a private pilot, loves going to stand-up comedy shows, and is forever trying to teach his dog new tricks. Show Notes: Nathan’s Twitter: https://twitter.com/nathanmarz What is Rama? https://redplanetlabs.com/learn-rama Reducing the cost of building Mastodon at Twitter-scale by 100X: https://blog.redplanetlabs.com/2023/08/15/how-we-reduced-the-cost-of-building-twitter-at-twitter-scale-by-100x/ Stay in touch: ✉️ Subscribe to our newsletter: https://softwaremisadventures.substack.com 👋 Send feedback or say hi: softwaremisadventures@gmail.com Segments: [0:00] flying [0:07] inefficiencies of backend software development [0:17] suffering oriented programming [0:23] AI programming? [0:25] RAMA’s programming model [0:33] deployment & monitoring with RAMA [0:36] building a twitter clone at scale with RAMA [0:43] migrations with RAMA [0:54] driving adoption for RAMA [1:01] fundraising [1:15] building a fully remote team
Transcript
Discussion (0)
Welcome to the Software Misadventures podcast.
We are your hosts, Ronak and Guang.
As engineers, we are interested in not just the technologies,
but the people and the stories behind them.
So on this show, we try to scratch our own itch
by sitting down with engineers, founders, and investors
to chat about their path, lessons they have learned,
and of course,
the misadventures along the way.
All right, Nathan, welcome to the show.
Super excited to have you with us.
Great to be here.
So Nathan, we thought we would start with asking you about being a pilot.
Can you tell us more about how you got into flying?
So I think it was back in 2013, I had a friend who was taking flying lessons, and that just really piqued my interest: the skill of being a pilot, and then just the adventure. It just feels really adventurous to be cruising at 3,500 feet, and you can kind of just go anywhere you want.
When you were taking lessons, was there any oh-shit moment that kind of made you
like question everything?
I was trying to learn how to paraglide, and then I think after like three classes I was like, oh shit, the tolerance for failure is narrower than when I looked at it from the ground.
Were there any close calls or anything like that?
I wouldn't say close call. I had one moment, which was a little spooky. This was my second
time I ever soloed. So I was taking lessons in Palo Alto airport. I decided on this flight to
fly like to the coast. And there's this like tiny little island with this like one
building on it that's like 500 feet off the coast and i i just wanted to fly there and just go down
to a thousand feet and circle it and then fly back that was actually on the way back i was flying
north along the coast. Usually pilots fly at some multiple of 500 feet; that's the altitude you would keep. But I figured, I'll fly not at 3,500, I'll fly at 3,700 feet, because why fly at the same altitude everyone else is flying?
And it's actually pretty hard to see other planes from the air. They can blend into the sky or the clouds, and they're just really small. That's one of the skills you develop as a pilot: you get better at just seeing other planes so that you can avoid the traffic. But anyway, I was flying at 3,700 feet, and then all of a sudden, in front of me, about 200 feet down from me, there was another plane. And we literally were completely aligned. Like if I was flying at 3,500 feet, we would have hit each other nose to nose, unless we saw each other in time to, presumably, avoid each other. So that was a little bit spooky.
Collisions in flying are incredibly rare,
especially now.
I think in 2020, there's a new regulation
that all planes need to have a new sensor in them.
It's called ADS-B,
so you'll actually get notifications
if you have planes near you,
as long as you're near a control tower,
which most places are, unless you're really out in the boondocks.
But that was a weird moment.
But it wasn't really a close call exactly, but it was a little spooky.
It certainly didn't stop me from continuing to fly.
So you mentioned something called a spin training in one of your blog posts.
What is that in flying?
Well, spin training.
Okay, so you don't have to do spin training as a pilot, but it is a situation you could find yourself in.
So I did think it was important to at least experience it and know how to get out of it.
So a spin in a plane, it's basically more of a tumble. It's when the plane stops flying and it's just basically falling like a rock, tumbling end over end. It's actually pretty hard to get a plane into a spin, especially the Cessna 172 that I was flying. You really have to put a lot of effort in to do it. It's pretty easy to get out of one as long as you know what to do. So, you know, I asked my instructor to show it to me. And beforehand, I prepared myself.
I looked at like lots of videos on YouTube of people doing spins, but wow, nothing prepares
you for the real thing. So technically, it's a stall with rotation. A stall basically means
the wings are no longer flying.
And then rotation means you're just rotating every which way, including inverting.
And so you lose about, I think, 600 feet every couple seconds or something like that.
So I think we started the spin at 4,000 feet maybe.
And then by the time we got out of it, we were probably at like 2,000 feet.
It just feels weird. It's like nothing else you've ever felt. One of the expressions I heard someone use to describe it is a roller coaster without rails. I think that's a really good description.
But yeah, and then once you exit the spin, when you do the procedure, you're actually just vertically flying straight towards the ground. So then you have to pull up slowly to get out of the dive, and you pull about two Gs. So that means you feel double your body weight as you pull out, so it's pretty intense. I think on that lesson we did four spins, and, you know, I didn't throw up, which is good. Although in the future I do want to do actual aerobatic training, where you pull, I think, up to four positive Gs, and sometimes you pull, I think, negative one and a half Gs. I'll most likely throw up when I do that training.
That seems incredible.
Because I don't have a great stomach. But aerobatic training just seems like super fun.
is that where you can then like do shows or?
No, it would just be for fun.
I wouldn't be.
Impressed with it before dinner.
But that's where you do stuff like barrel rolls and hammerheads and,
you know, loops and all sorts of fun stuff like that.
Nice.
Well, I don't do very well on roller coasters,
so I'm not sure if I would be into trying that
necessarily. I definitely recommend you do just an intro flight lesson because the first time
you're not just in a small plane, but when the instructor tells you take the controls and you
can fly it, it's an incredible feeling when you take the yoke and you just move it a little bit
and the whole plane moves around you, like it's really incredible.
And I want to be clear, like a lot of people think small planes are really dangerous because the only time you hear about small planes is on the news when they crash.
But they're actually like very safe.
The main reason small planes crash is pilot error.
So, believe it or not, one of the most common reasons planes crash is because they run out of fuel.
Wow.
It's so stupid to me because what you're supposed to do as a pilot is you have a checklist.
You have a pre-flight checklist.
And you religiously follow that every single time, doing all the checks of the plane, all the components, including checking the fuel.
Like what you do when you fly a small plane is you literally open the fuel tank and you look inside and you measure it.
There's a special stick you use to measure.
If you do that, you're never going to run out of fuel on a plane.
So it's really, really like stupid error for a pilot to make to just not go through the pre-flight checklist.
And the other reason planes crash is a pilot flying into weather they shouldn't be flying into, which, again, very easily avoidable if you just check the weather before you fly.
There are great resources to check the weather.
Certainly that's not going to happen on an intro flight.
I love taking people up for their first time in small planes.
Yeah, Rane is now down, but I have a volunteer right here.
No, no, no. No spins.
Yeah, yeah, yeah. We won't do it.
I might show you a stall.
So coming back to the more technical side of discussions.
So Nathan, we actually met at Insight back in 2015 when you came to talk to us about Apache Storm, which you built and Lambda Architecture, which you wrote a book on.
How did you go from that to building Red Planet Labs?
Yeah, well, at that time, yeah, I mean, I was getting a lot of demand
from people wanting consulting and support services for Storm.
So I very easily could have started a company.
That was a proven model, you know, successful open source project
with a ton of traction.
It would have been very easy to raise money and pursue that route.
And Storm, like, Storm did a lot.
It really advanced the state of scalable real-time computation.
It was the first project to do that in a fault-tolerant way.
And also just in an easy way.
It was very easy to use and do that kind of stuff.
But I was always thinking deeper.
I think what has always really motivated me and really interested me about just programming in general, like it's the only field of engineering where you can completely automate what you were doing before.
And instinctively, it seems like the work you should be doing should be whatever is unique to that thing.
And you should otherwise be able to reuse every other piece of it.
Right.
And with backend development, it did not feel like that at all. Storm certainly helped with creating the capability of real-time computation. But when you looked at what it took to actually build the backend for a product end-to-end, Storm didn't move the needle there, really, when you look at the end-to-end cost. And that was the part that really bothered me. It was through working on my book, and developing the theories of the book beforehand, that I started to see that there's this different approach that could be taken, which really would move the needle and make it so that what it took to build a backend was closer to what it took to describe it.
And just to give an example of that, right, I like to use the Twitter example
a lot just because Twitter is a very well known product. And I used to work there. So I know what
went into building that product at scale. And so like the original Twitter consumer product,
they reached scale in 2011. It was started in 2006. That's a product you can describe
what it does. Timelines, social graph, follows, retweets, hashtags, search, et cetera, et cetera.
You can describe every feature of that product in a couple hours, max. It's not very complicated
to describe all the different user experiences and flows that you go through in that product.
But it literally took Twitter 200 person years to build that product at scale.
So again, we're in a field that's entirely about abstraction, automation, and reuse.
So how is it taking 200 person-years to build something you can describe in two hours?
And it's not just Twitter.
You look at any product, especially at scale.
There's just a huge disparity between how long it takes to describe it and how long it takes to build it at scale.
And so this new approach
that I saw the broad outlines of
seemed to me something
that could really change that,
make it so that that cost would be much less
and just fundamentally change
the economics of software development.
And so that really interested me and seemed much more important
than building a company around Storm,
which would have purely been about monetization at that point.
I don't think Storm as a project was going to change the economics
of software development to that extent, not nearly to that extent.
So that's why I decided to pursue Red Planet Labs
and then just with Storm,
donate it to Apache
and let it just be a full open source project.
How did that vision of Red Planet Labs,
like did it evolve over time?
Yeah.
I mean, basically what I started with
and what took me years to figure out was
what is a common set of abstractions
that can express any application end to end with just that one tool?
So it's like completely inclusive.
It handles everything in the back end that you need, data ingestion, processing, indexing, and querying.
So basically, what I started with was what I knew from writing my book and developing the Lambda architecture. I think the most important thing, which I explained in my book,
was how to look at building software
applications from first principles. When you look at how
back-end development has been done since the 80s,
the gold standard has been the relational database.
The relational database is not based on the true first principles of backend development. I know that's going to sound sacrilegious to a lot of people who consider it the gold standard, but it's really not. What exactly are the principles of it? The idea behind relational databases is that you have tables, you have keys, you have columns, you have foreign keys. So that's the model, right? Now, can you say from that how that
encapsulates all possible systems you want to build? Like it is very unclear. There's not a
direct connection from that. So the first principle, which I showed in my book, is so simple, and it so clearly encapsulates every possible system you'd ever want to build. And it's: query equals function of all data. A backend is all about answering questions, right? What is Alice's current bank account balance? What is Bob's location? What is the total number of page views for a URL over a range of time? And so on and so on, right? And the most general way to ask a question is to
literally run a function, an arbitrary function over all of the data that you've ever seen,
that your application has ever seen. So clearly that encapsulates everything you could possibly
ever want to do with any system, right? That's clear, right? That's a much better starting point than the relational database model, which is arbitrary. Now, obviously, you can't literally do that. You can't literally run a function over your 10 terabyte or 10 petabyte dataset, whatever size it may be, every time you want to ask a question. So in my book, and with the Lambda architecture, I showed: what is the smallest set of trade-offs you can make to actually have a general model?
And all you have to do is add the concept of the index.
So query equals function of all data becomes indexes equals function of all data and query
equals function of indexes.
And that's actually, that does capture
every single backend system that's ever been built.
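As a toy illustration of that model (all names here are invented for illustration; this is not Rama's API), "indexes equals function of all data" and "query equals function of indexes" might be sketched as:

```python
from collections import defaultdict

# "All data": an immutable, append-only log of every raw event ever seen.
data_log = [
    {"type": "pageview", "url": "/home"},
    {"type": "pageview", "url": "/about"},
    {"type": "pageview", "url": "/home"},
]

def build_index(log):
    """indexes = function(all data): fold every event into an index."""
    counts = defaultdict(int)
    for event in log:
        if event["type"] == "pageview":
            counts[event["url"]] += 1
    return counts

def query_pageviews(index, url):
    """query = function(indexes): a cheap lookup instead of rescanning all data."""
    return index[url]

index = build_index(data_log)
print(query_pageviews(index, "/home"))  # 2
```

The index is the only trade-off added to the pure "function over all data" ideal: it precomputes just enough so queries don't have to rescan the whole log.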
Now, with the way backends have been built since,
you know, basically for my entire lifetime,
you're using different tools for each one of those pieces,
data, functions, indexes, and queries, right?
And so what I was looking for
when I started working on Red Planet Labs is,
well, how can I generally meet that model, like build a general purpose system where you can have arbitrary indexes, arbitrary ways of computing those indexes, and arbitrary ways of doing those queries at scale with a simple set of common abstractions that compose together into any application you want to build, whether it's Twitter or Google Analytics
or a bank or what have you.
And it took me a long time to figure that out.
But that's basically what I started with.
So I had the general model, right?
Indexes equals function of data
and query equals function of indexes.
The other thing I had figured out by that point was specifically about indexes. When you look at how databases work, they're actually all narrow. There's no such thing as a general-purpose database. They all have what they call a data model, and that's all each one can do, right? So you can have relational, document, graph, column-oriented, and so on. Each of those indexes data in a very specific way, and then it has very specific ways in which you can go about querying it.
And what I realized back then was that a much better way to express indexing is as data
structures, not data models. And in fact, every data model is just a particular combination of
data structures. Key value is just a map. Document, a map of maps, column oriented is a map of sorted maps,
and so on and so on, right. And so I knew at that point that the right way to express indexes was as
data structures, so that, you know, to build an application, you'll need to build many indexes,
and each one of those can be shaped exactly as you need it to meet every
individual use case of your application, which is a big problem that you see with backends that are
using databases, which is every backend right now. And as soon as you choose to use a database, which you have to now, not that it's really a choice, you have created a lot of inherent complexity in your application, because you have to twist your application to fit that data model, and there is no data model that will fit your application perfectly. And this is the first, and possibly the biggest, impedance mismatch which you take on, right at the start, as soon as you start your application. And so that's a huge problem and a huge
contributor to complexity. The fact that you can't actually model your indexes exactly like
you need to for your application. So that was a starting point. So I knew the general model
indexes equals function of data and query equals function of indexes. And I knew that indexes should be expressed as data structures.
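The point above, that every data model is just a particular combination of ordinary data structures, can be sketched like this (illustrative Python, not Rama):

```python
# Key-value model: just a map.
kv_store = {"alice": 42}

# Document model: a map of maps.
doc_store = {"alice": {"age": 30, "city": "SF"}}

# Column-oriented model: a map of sorted maps (here an inner dict kept in
# sorted key order stands in for a sorted map, keyed by timestamp).
column_store = {"alice": {1000: "login", 2000: "logout"}}

assert kv_store["alice"] == 42
assert doc_store["alice"]["city"] == "SF"
assert list(column_store["alice"]) == sorted(column_store["alice"])
```

Expressing indexes directly as these data structures, rather than committing to one fixed model up front, is what lets each index be shaped for its use case.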
And I knew that, you know, to actually build a full application, you'd be materializing many
indexes of many different shapes. And it probably wasn't until, I'd say, 2016 that I was confident that it was possible, that I had figured out what the abstractions were.
So it was pretty difficult, to say the least.
So in some of your posts, you mentioned this idea of suffering-oriented programming.
Was that one of the key principles that you thought about, too, when you were starting
to build at Planet Labs?
How hard it is to build these back-end systems?
Yeah, well, the suffering phase was everything I did before Red Planet Labs. So everything I'd ever built, and also everything I'd ever helped people build through my open source work. The idea behind suffering-oriented programming is: don't build abstractions in the abstract. Don't build an abstraction until you have suffered through the pain of not having that abstraction, and you know what all the use cases are that such an abstraction would need to fill, and all its weird permutations. So certainly I had experienced that beforehand,
just building scalable systems. I call the current approach the a la carte model, which is that you pick different tools for different parts of your system and then you get them to fit together. And I had certainly experienced how fundamentally flawed the a la carte model is, according to some of the things I just described: the fact that a database is inherently flawed, how you have to twist your application model to fit the data model, and so on, and just other problems with getting things to integrate together. So yeah. So basically building Red Planet Labs and building Rama was suffering-oriented
programming on steroids, where my list of use cases was literally every application
that I'd ever worked on or ever helped working on. So it was a pretty expansive,
just like set of use cases I had in my mind trying to unify them together into a concept of abstractions.
So you recently announced Rama, which as you describe, it's the 100x programming platform.
And we want to talk more about Rama.
Before we get there, you started the company in 2013.
And you mentioned that around 2016 is when you kind of found the right abstraction to start building on top of.
What did that 2013 to 16 period look like?
Because if I remember correctly, you hadn't started a team around that.
Like you were the one doing all the research yourself.
Can you tell us more about what that phase looked like?
Well, I didn't fundraise and hire anyone until 2019.
So 2013 to 2016 was a lot of me
sitting at my computer staring into space,
thinking, and a lot of just,
I had like this one big text file
where I was exploring stream of consciousness,
just like, here's an idea, let me work through it,
let me test this idea,
let me test this idea for an abstraction
against all these different use cases
and see what happens.
And just like very slow process of kind of getting closer to what the right abstractions
were.
I think the main thing I had to figure out from 2013 to 2016 is that there's actually a new programming paradigm underneath Rama.
And actually, when you use Rama, Rama has a regular Java API, but that API is actually expressing a subset of that programming paradigm. Basically, it's a general-purpose programming paradigm, a new one. It's basically dataflow programming, but generalized into a general-purpose language. So all the things you can do with a regular language, like variables and conditionals and loops, you can do in dataflow, but it's expressed differently. And dataflow is a great abstraction for doing distributed programming, as we already knew at that point from dataflow tools built on top of things like Hadoop. And so what Rama is doing is greatly generalizing the idea of dataflow. And so a lot of that work from 2013 to 2016 was discovering that programming paradigm and seeing how things would fit together to be able to express arbitrary applications.
Was there
stuff that happened in that time period, all the way leading up to 2019, like new technologies that came along or were getting more adoption? Did that kind of impact your interests, as well as how you think about the problem? Because I feel like one of the reasons why the Lambda architecture caught on and all these applications became a thing is because storage got so cheap, right? And so did compute. Without those sorts of advances, it's very hard to kind of ditch the database, so to speak.
Yeah, yeah, for sure. And I think the big thing was storage becoming cheap, just enabling these other ways of building things.
And that was the case well before 10 years ago, right?
That was the case 20 years ago,
like when MapReduce became a thing,
when that was a thing, right?
There has not been any fundamental advance like that in the past 10 years.
Obviously, there's been a lot of innovation,
a lot of new tooling,
but everything is still doubling down on this a la carte model, where to build a backend you're going to have a dozen, or two dozen, or more different tools that you're actually using and fitting together in some way. Everything is still very narrow, very specific, and forces you to take on these impedance mismatches, like I described with data models for databases. So nothing in the past 10 years changed that. Ultimately, what I was doing with Red Planet Labs and Rama was a really fundamentally different approach to the way software has been designed for my entire lifetime: to actually have one tool, based on a simpler set of primitives, that can compose into all these other things that people are doing with specific tooling, and be able to build your backend end-to-end on a single platform instead of a dozen different platforms.
So Ronak asks the serious and the good questions here, and I ask like the really
trolly ones. So to stay on brand: maybe generative AI gets so good that now you have all this bloat in the software, but it still somehow works, right? Like you'd basically be able to write all this crappy code, but maybe you have some ways of performance testing it such that you can still kind of package it up, such that you don't necessarily... well, okay, I think I sounded smarter when I started.
So you're wondering what's like AI programming? How does that affect?
That's the smarter question to ask.
Yeah. Well, AI programming is still in its infant stages.
It's certainly not capable of building a back end end-to-end for you.
I mean, I think AI is ultimately going to be limited by the same things that limit human intelligence. I don't think it's magical. And I think if you have a much simpler set of primitives that you're building upon, AI will do a lot better. If you're using a dozen different tools, that creates all this complexity that's going to make it a lot more difficult for AI to reason about, and it's still going to be difficult to, you know, operate in production. One of the cool things about Rama, because it's such a cohesive, general-purpose platform, is that it's a much better target for AI for building backends than the hodgepodge of a million different tools that you have to use currently. So I'm actually really excited to explore that in the future.
After paying for the ChatGPT premium, one thing I noticed quite a bit is when you ask it to do stuff, it's a lot better at writing out the intermediate steps, which, to our point here, translates into those more generic sorts of models.
Before you continue on,
So one way I'm thinking about Rama is that you're saying instead of an engineer going
and saying, I want to use a relational database, or I want to use the specific messaging queue,
or whatever the technology may be and build around that, you're saying, let the engineer
describe with the abstraction that Rama provides, what it is that they're trying to do.
And the tool behind the choice,
whether that's a relational database
or a key value set or whatnot,
that's an implementation detail.
But what stays the same is the abstraction
or the use case that the user is describing.
Is that the right way to think about this?
Yeah, let me describe what Rama is.
Let's take a step back.
So you're not using a database when you use Rama.
Rama's doing all that stuff with a simpler set of abstractions.
So everything that a database does, Rama's doing for you.
But you're not using any of the tools.
Rama's doing everything.
So I'll describe Rama's programming model.
So again, I described the first principles of building backends.
Indexes equals function of data. and query equals function of indexes.
And that's basically the program model of Rama.
So you have four concepts, right?
Corresponding to each of those things and those first principles, right?
So you have the first thing in Rama is called a depot.
That's how data comes into it.
And a depot in Rama is a distributed log of data.
Think of it, it's actually exactly like
Apache Kafka, but built in and integrated into the system. Then you have ETL topologies. Again,
all this stuff is inherently distributed. So ETL topologies consume data from depots as a stream,
and then do computation on it, and then produce indexes, which are called partition states,
which is the next concept.
And we usually refer to partition states as P states.
And partition states are how you do indexing in Rama, which as I described before, it's
in terms of data structures.
So to build an application, if you look at our Twitter-scale Mastodon instance that we open sourced, in one of the modules that's doing the core stuff, so profiles, timelines, fan-out, and all that, I think there are 33 P states with a huge variety of data structure combinations between them.
So when you're building an application in Rama,
you'll materialize potentially many, many P states.
All again, completely fine tuned and shaped
precisely for your application.
And the last concept is querying.
So how do you actually query your P states?
So there's two ways to query in Rama.
So one is called point queries.
So that's when you just want to fetch information
from one partition of one P state.
And it uses what's called a path-based API.
So these P states are arbitrary combination data structures.
And so we have a mechanism
so it's very, very easy and concise
to reach into a P state, regardless of how complex
the structure is, to retrieve a value or some aggregation of values.
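The path idea can be sketched roughly like this (the function name and P state shape are invented for illustration; Rama's real path API is richer):

```python
def select_path(pstate, path):
    """Walk a nested combination of data structures one step at a time."""
    value = pstate
    for step in path:
        value = value[step]
    return value

# A P state shaped as: user id -> {"followers": set, "profile": map}.
pstate = {
    "alice": {"followers": {"bob", "carol"}, "profile": {"bio": "pilot"}},
}

print(select_path(pstate, ["alice", "profile", "bio"]))  # pilot
```

The point is that one concise navigation mechanism works regardless of how the data structures are combined, so point queries stay simple even for deeply nested P states.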
The second kind of query is called query topologies.
And these are predefined queries that can look at any or all of your P states and any
or all the partitions of your P states.
And it's basically a real-time, on-demand,
distributed computation looking at all that stuff.
So you can do some really powerful stuff with query topologies.
So query topologies would be analogous to a predefined query
in a SQL database, for example,
except it's defined using the exact same API
that you use to define ETLs, which is the regular Java API.
And it lets you reuse code between both contexts
as well as being generally a lot easier to just manage
because it's not using some bespoke system
or registration system like you would in a SQL database.
So those are the main concepts.
And you can see how it's literally just the first principles, right?
So indexes equals function of data.
So that would be depots, ETLs, and P states.
And then queries equals function of indexes,
which is just the two different ways of querying P states.
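Pulling the four concepts together, here is a toy end-to-end flow (all names are hypothetical, not Rama's actual API: a depot as an append-only log, an ETL materializing a P state, and a point query over it):

```python
class Depot:
    """Distributed log of incoming data (think: a built-in Kafka)."""
    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)

def etl(depot, pstate):
    """ETL topology: consume the depot as a stream, materialize a P state."""
    for record in depot.log:
        user = record["user"]
        pstate.setdefault(user, []).append(record["tweet"])

def point_query(pstate, user):
    """Point query: fetch from one partition of one P state."""
    return pstate.get(user, [])

depot = Depot()
depot.append({"user": "alice", "tweet": "hello"})
depot.append({"user": "alice", "tweet": "world"})

pstate = {}  # a P state shaped exactly for this use case: user -> list of tweets
etl(depot, pstate)
print(point_query(pstate, "alice"))  # ['hello', 'world']
```

A query topology would be the distributed analogue of `point_query`, able to look across many P states and partitions at once.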
What was your original question?
I was basically thinking,
trying to think about how, as an engineer,
like when we're building these backends,
we are so used to just thinking about,
okay, think about a data model,
because, well, that's what we have been doing.
Well, if you pick, let's say, a relational database, well, you've got to have primary keys,
foreign keys, rows, and columns. If you pick a key value store, then, okay, depending upon the kind of data store you choose, you will be limited by some of the features you can build, or you
might have to kind of reimplement some of that in your own code. So when someone's starting to now use
Rama, for example, in a way, should they just stop
thinking about databases, for example, and think about what do I want my applications to do? And
let me define that data structure and whether Rama stores it, how it stores it, what data
structures it uses under the hood. That's implementation detail for the engineer.
So that's what I was trying to figure out. Yes. So you're liberated as a
programmer using Rama because you're no longer
restricted by your data models, no longer
have to twist your application
to figure out how can I fit it
into this data model. So
again, Rama's P states, it's just
data structures, right? So if you want to
use those data models, okay, well,
that's fine. Use that data structure combination.
If you're going to do something else, if the way to meet that use case is a different data structure
combination, well, now you can do it, instead of having to twist your application. So, for
example, the way to approach developing a Rama application, there's actually a great walkthrough of this in the
Rama documentation, which is on our website, in the last part of the tutorial,
and it goes through this process. The last part of the tutorial is building a Facebook-style social network from scratch. So bi-directional relationships, a wall for every
user with posts and stuff like that. It only ends up being 180 lines of code at the end of it for a
fully scalable social network. And it goes through the process, right? So
The way you start is: well, what are the queries I have to do? What are the questions I need to be
able to ask, and what's the data coming in? So for something like a social network, it would be:
who are the followers of this user? And I need to be able to ask that in a way such that I can
paginate through them, right? Because someone might have a million followers.
Or likewise, who are the friends, if you're looking at a bidirectional thing?
What are the friend requests?
What is a page of posts on someone's wall?
What is someone's profile, right?
What is someone's age, whatever, right?
Or maybe you have an analytics query in there or something, right?
Like how many users signed up per day or something like that.
So you start with your questions, right? Because ultimately that's what an application is, right? What are the queries
I need to support? And then you think in terms of, okay, well, what's the data I have coming in?
So you have things like account registration, friend requests, accepting a friend request,
making a post and so on and so on, right? So then the next step is to actually figure out the P states.
So, okay, well, what are, what set of P states do I need?
What data structure combinations do I need to be able to answer these questions?
And then what does it look like to ask those questions on these P states?
So it might be that one P state you create can answer 10 of your questions, and another P state might exist only to answer one particular kind of question.
So to give you an example, if you look at who someone's followers are, and you also need to be able to say, like, how many followers does someone have?
Not just looking at someone's followers, but also asking: does user A follow user B?
Like these are the kinds of social graph questions.
So all that can be supported with a data structure,
which is a map to a linked set.
So a linked set is a set
that also remembers the order of insertion.
And so you can do this with Rama.
So like in our Mastodon implementation,
we have the followers
P state, which is a map to linked set. It's a map of linked sets. When you want to get
the number of followers, you just get the size of the inner set, and even if the inner set has 10
million elements in it, it's still a fast, like, you know, less-than-a-millisecond operation to get that.
If you want to paginate through it, then you're just querying that in order; you're doing range
queries on that inner set. If you want to ask if user A follows user B, well, that's just a
set membership query, right? And all this stuff you can do very, very easily with Java. Whereas
if you look at like a different part of the application, like personalized follow suggestions,
well, that's a completely different P state with totally different indexing, right?
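The map-to-linked-set shape described above can be sketched in plain Python, since a dict's keys already behave like an insertion-ordered set. Real Rama P states are partitioned and durable, so this only illustrates the access patterns, not the implementation:

```python
from itertools import islice

followers = {}  # user -> "linked set" of followers (dict keys keep insertion order)

def follow(followee, follower):
    followers.setdefault(followee, {})[follower] = None

def follower_count(user):
    # size of the inner set: cheap even with millions of elements
    return len(followers.get(user, {}))

def follows(a, b):
    # does user a follow user b? a set membership check
    return a in followers.get(b, {})

def followers_page(user, start, count):
    # paginate through followers in insertion order (a range query)
    return list(islice(followers.get(user, {}), start, start + count))

follow("bob", "alice")
follow("bob", "carol")
print(follower_count("bob"))        # -> 2
print(follows("alice", "bob"))      # -> True
print(followers_page("bob", 0, 1))  # -> ['alice']
```

The point is that one data structure combination answers the count, membership, and pagination questions at once, which is exactly the "one P state answers many questions" idea.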
And then once you figure out what your P states are, what the queries look like,
then you look to see, all right, how do I materialize and maintain those indexes from my data that's coming to my depots?
Like follow requests, accept follow requests, making posts, et cetera, right?
And that's where you write your ETLs to actually, right, that's you're making a function from data to indexes.
So that's the mindset, I guess, you use when using Rama.
And the great thing is that you're able to do all this stuff,
all this flexibility on top of a single platform.
And one of the really nice consequences of this
is how much it simplifies deployment.
Like when you look at companies building applications at scale,
like deployment engineering is no joke.
Like it is really costly.
Like you often have entire teams only doing that,
writing sometimes millions of lines of code and configuration.
And there's no business logic in any of that code.
It's pure complexity.
It's plumbing and putting bits on boxes.
It's crazy.
Like it's really wild.
Now with Rama being such an integrated platform,
you can deploy your whole thing on one system. And Rama understands how to deploy and update Rama applications.
That's like the core parts of the platform.
So it's able to do it in a general purpose way with all the best practices just built into the system so that you can take an existing application that you have running, which is called a module in Rama.
And you can say, I want to update it.
And we spend a lot of time on module update.
So it's completely fault tolerant.
It does the transition very, very smoothly
to transition responsibility between the two versions.
All the stuff people are doing manually
and in a very complex way currently.
So all that stuff is just free.
Because it's a general purpose platform,
it's able to implement it in a general purpose way.
And then boom.
Now as a developer,
you don't really have to worry about that anymore,
which is like,
I think one of the most brutal costs of the a la carte model is
the fact that you have to engineer deployment yourself.
Oh,
for sure.
Yeah,
the other really nice consequence is monitoring,
right?
So Rama being such a general purpose platform, it's able to implement monitoring.
Like monitoring is the same thing, right?
Monitoring is: you're collecting data, you're materializing views using that data, and you need to have a way to query that data.
So Rama actually implements monitoring using itself.
So there's a built-in Rama module, which collects data and then materializes telemetry on that data.
Sorry, it's collecting the data from all the other modules and then materializing views using that data.
And then it has a built-in cluster UI where you get very deep and detailed telemetry
on all aspects of your module or of all your modules, which I think it's really cool that
it's able to just be recursive like that. Rama is just using itself to implement telemetry.
It's not doing anything special.
The telemetry module is exactly the same like any other module.
So that's how much Rama helps on deployment and monitoring.
I was always thinking from the start in terms of end-to-end cost, and deployment and monitoring
are a very substantial part of the end cost.
So that's one of the things that really excites me about Rama,
that all that complexity is just gone when you're using Rama.
And it's so much simpler, so much easier.
I think it's super cool that you guys literally built a Twitter clone in order to just show how powerful it is.
And just to quote some numbers from the blog,
that it was only
10,000 lines of code compared to the 1 million that Twitter wrote to start with.
And then this is having 100 million bots posting 3,500 times per second at 400 average fan out,
which sounds like super impressive. The other aspect you measured is to quantify this 100x improvement
is how long it took to do it.
So nine months versus the 200, sorry,
nine person-months versus the 200 person-years.
Yeah, so I'm very curious to learn more about the trade-offs.
So in terms of the pros,
so in addition to deployment and monitoring, as you mentioned,
I imagine this is a bit harder to do, but I imagine bugs also become less frequent and
much easier to fix, right?
A lot of the production bugs happen in between kind of systems that sort of come out of this
a la carte menu that you described, right?
So if you actually start from scratch, if I may, I feel like maybe a bad example is, you know, going from a
dynamic language like Python, where you don't have to declare a lot up front, to a compiled, strongly typed one where you
have to specify all the things up front. You get that trade-off: now there are way fewer,
you know, random things that can happen, right? So, like, what do you think about the debugging
and then, you know, the outages, like that aspect of things?
Yeah, well, first of all, bugs become much less frequent.
When you have 100x less code, you're going to have a lot fewer bugs.
That's right. No code, no bugs.
But it's not just the lines of code.
It's the reduction in complexity.
And again, when your code doesn't have impedance mismatches,
when you're able to actually represent data in a way that actually is optimal
and makes sense as opposed to having to twist it like you do with databases,
that reduction in complexity just helps a ton, right?
But of course, you're still going to have bugs.
We had bugs in our Mastodon implementation that obviously we worked out before we deployed
it, but you work them out in the same ways that you would work out bugs in any system,
like through testing, right?
That's actually another aspect that really helps with Rama: just how much easier it is to test, because you don't
have to think about, oh, how do I start up these 10 different components, create them with mock data
and all this stuff, and get them to work in a test environment. Instead, it's all within a single
process. Rama provides something called in-process cluster, where you can simulate a Rama
cluster in process, and you can use that just like you would a regular cluster. Deploy modules
to it, add data to it, do
queries, whatever. So that's how we test it.
If you look at our Mastodon implementation,
we've written
a lot of test code using that exact same
approach. We deploy modules, we
append data, and then we do assertions on what
happens afterwards. So that helps a ton.
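That deploy-append-assert pattern can be sketched like this; note the class and method names below are hypothetical stand-ins for illustration, not Rama's actual Java API:

```python
class ToyInProcessCluster:
    """Hypothetical stand-in for an in-process cluster used in tests."""

    def __init__(self):
        self.depot = []   # append-only event log
        self.pstate = {}  # materialized index maintained by the ETL
        self.etl = None

    def launch_module(self, etl):
        # "deploying a module" here just registers the ETL function
        self.etl = etl

    def append(self, event):
        # appending to the depot synchronously runs the ETL on the event
        self.depot.append(event)
        self.etl(self.pstate, event)

# The module under test: maintains a follower count per user.
def follow_counts_etl(pstate, event):
    if event["type"] == "follow":
        pstate[event["followee"]] = pstate.get(event["followee"], 0) + 1

# Test: deploy the module, append data, assert on the resulting P state.
cluster = ToyInProcessCluster()
cluster.launch_module(follow_counts_etl)
cluster.append({"type": "follow", "followee": "bob"})
cluster.append({"type": "follow", "followee": "bob"})
assert cluster.pstate["bob"] == 2
print("ok")
```

The value of the real thing is that the same module code runs unchanged against a simulated cluster in one process and a production cluster on many machines.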
And of course, when it comes time, if you actually
have a bug in production, well, again, it has module update built in. So you just update your
module to fix the bug. So I'd say that all that stuff on multiple different fronts, reduction
in complexity, the ease of testing makes it much easier to build like quality software.
Actually, look at how backends are built today. Like, backend
programming, especially at scale, it's gotten so complex. It's beyond the realm of human
understanding. Like there's no one working at Twitter or Google or Facebook or any of these
services that really understands what's happening. Not to say they don't understand, but it's so complex
that you can only understand things empirically. So based on what I'm observing now, that's what
I understand. So it's not a surprise at all how buggy all these different platforms are.
Isn't it crazy that every one of these services you use, they have bugs on them all the time.
And they have billions of dollars and thousands of engineers. How come they can't fix them?
Sometimes the bugs are brain-dead, like, how is that even a bug, you know? And these
are the companies with good engineers, right? Not to mention the companies that have, you know,
weaker engineering teams. How are they so buggy, with such visible bugs in a consumer product? And it's just
because it's impossible to comprehend these systems, because they're so complex. You can
describe what these applications are doing in two hours, but they've invested hundreds of
person-years in developing them. So there is a big complexity problem in
software, and more than anything else, that's what Rama is tackling: this extreme
reduction in complexity. What Rama's doing is, you're able to do
things in a way that avoids the complexity that people have been taking on for decades.
And it's not like it's doing anything magical. It's really about what
you're not doing with Rama, which you're forced to do when you're doing stuff
the traditional way with the a la carte model.
Going a little bit meta,
so in the process of building out this Twitter clone,
did it change some, I don't know,
like code in Rama itself of basically
where you draw the boundaries between different concepts
and things like that?
Did it have any...
No. We basically just
used Rama as is to develop the proof.
Just because the model is... I mean, the model
makes sense. It is general purpose,
right? Query equals function of data,
right? Expanded to... sorry, index equals
function of data and query equals function
of indexes. Like, that's the model. It makes sense. It's general purpose.
It clearly encapsulates all systems. And that's the model you use to program
in Rama. And so building Twitter was one application of that
model. We could have built anything else. We specifically chose Twitter just
because we were familiar with the actual implementation of Twitter.
And so we could do a true comparison on cost. When you actually look at Twitter,
why was it so expensive to build? One of the main reasons was
just how much specialized infrastructure they needed to build over the years, because they
needed to represent their social graph, and there was no tool which could do that the way they needed it. So they had to build it from scratch. They had to build multiple
other databases and services from scratch, and in all of these things, a lot of stuff
is repeated, right? So when you build a database from scratch, well, you're repeating a lot of stuff.
You've got to build replication again, which is insanely expensive to build, by the way, and also
insanely difficult.
And you've got to figure out durability and distribution, how things network and talk to each other.
That's just being repeated over and over, right?
The one mantra everyone knows from programming, from really the start being a programmer,
is do not repeat yourself.
D-R-Y.
And it's completely non-existent in back-end programming.
As an industry, we are repeating ourselves constantly, by the fact that all these systems are actually doing the same
things, or a lot of their subsystems, such as replication, are having to do the same
things. Sometimes they're doing it in different ways, but they're trying to solve the same problem. So with Rama, with a true
general purpose system, we're able to implement replication once for Rama, which is something we spent
a lot of time on. And now it encapsulates all possible ways in which you might specify these
computations or indexes, if that makes sense. So one thing I'm thinking about is: when we
look at any of these apps today, their complexity grows over time because a majority of applications start with an API backed by some database
that is fine for the prototype.
If the application works out, you need to grow out of that single database.
Either you shard it, you build different views for the same set of data.
Sometimes, as you mentioned, you want to represent it as a graph, sometimes a key value store
and whatnot. A lot of the time, backend engineering teams spend doing migrations, because
either your app cannot handle the amount of scale you're hitting anymore, or you want to provide
a new feature for which migrating to, let's say, a new system is going to be much better.
What does that look like with Rama? Like if you wanted to
represent your data differently, would you just go create a new P state or like, does it also ease
the migration piece? So yeah, so if you need a new view of your data, then you can just build a new
P state. You can start constructing a P state by reading from the start of a depot, or just from some point in the past of a depot.
There are other cases where you might want to actually change your existing P states, because you want to change the format of something.
So you can do that manually now with Rama, like through a module update, although we are currently working on a first class migrations feature where you would be able to just take an existing P state and then just change the structure.
Right. So maybe like instead of using this data type for this value, you want to use this new data type with maybe more fields in it or less fields.
So that's coming soon. And then there's like another level of it where you might want to not just migrate each partition of a P state, but you might want to actually
include some repartitioning of that during
the migration. So actually change
where stuff is stored, not just how it's stored.
So all that stuff is coming.
Still doable right now? Yeah, it's still
doable right now.
It's more manual, but we'll have first-class
support for that soon. I see. And so double
clicking on deployments for a second, and this is just
me trying to understand this better.
So if I look at
an existing application,
it's like we went through
this whole microservices
and whatnot,
but eventually what it means is
you have your data store
running on a set of machines,
your web services
running on a set of machines,
your Kafka queues
or some other queues
running somewhere else.
And maybe you add other pieces
to your ecosystem
as the app grows. So if I think
about building a Rama application, if I just look at it from bits and boxes perspective,
what binaries get deployed where, and how do you scale this thing out once your app keeps
growing? Right. Yeah. So first of all, I think before we go further: Rama doesn't have to be
used in isolation. I think some people may get the wrong impression of that. Like when you're
using Rama, it doesn't mean that now everything has to be built on Rama. So Rama can very easily
integrate with other systems, just like you do with any other system using the a la carte
architecture. If you want to use a database from Rama, it's very, very easy to do. Or likewise, Rama can consume data from external queue
systems, right? So we actually have an open source project called Rama Kafka, where you can use Kafka
as a data source for your ETLs. And it works exactly like you'd be using a depot. But more
generally, in terms of deploying Rama itself, Rama runs as a cluster. So it has a central node
called the conductor. The conductor isn't involved in
data processing; it's just how you do module operations, like deploying a module, updating
a module, or scaling a module. And then there's a cluster of worker nodes that all have a daemon on
them called the supervisor, which just listens to the conductor for assignments: what workers from what modules
it should be running on the machine. The supervisor is responsible for starting and stopping
worker processes as dictated by the conductor. And so when you deploy a module, it's just a one-liner at
the terminal, where you tell the conductor: here's my jar with my code, here's the module I want to run,
here's the parallelism I want, go. And the conductor will figure out which supervisors
to run that on. And then likewise, when you want to update, it's the same thing: you tell
the conductor, I want to update this module, here's the jar, and it goes ahead and does that
process. And same thing with scaling, where this time you don't have to give it a jar, because
you're not changing the code. You tell the conductor, here are the new parallelism settings I want,
and then that launches the scaling process.
And when you're scaling, when you do an update,
it's going to deploy to the same set of nodes it's already deployed on.
So it's a co-located update.
When you scale, you actually need to move data across nodes, because now you're spreading across more nodes.
But again, all that stuff is behind the scenes and transparent.
So scaling will take longer because of the data transfer step,
but it's all very, very simple to do.
It's literally just, you say, here's how many more resources I want, go, and then it takes care of the rest.
And by the way, I'm just trying to figure out, is there a limit to how big a cluster could be?
Perhaps looking at the open source Mastodon example you have, is it possible to share how many nodes that is running on today?
Well, a full Twitter scale would take about 600 nodes.
That's it?
That's a lot less than I was thinking, at that scale.
Yeah, yeah, yeah.
And that full Twitter scale would be 7,000 tweets per second at 700 average fan out.
Again, with a very unbalanced social graph
But actually, the key thing for Twitter, in terms of scalability or just in terms of resource
usage, is the average fan out as well as the number of tweets per second. I mean, we went into this in
depth in our blog post about our Mastodon instance. So there's a lot of stuff you have to do in regards
to having an unbalanced social graph, in terms of achieving fairness and whatnot. But in terms of
resources needed, it's really just about
average fan out and number of tweets
per second. And yeah, it'd be about 600 nodes
to do that whole product. The consumer
product. We're just talking about the consumer product.
I'm not talking about all the other stuff
that Twitter does. So we're looking at Twitter
like 2015,
let's say, on the consumer side of things.
So the dominant side of things.
So the dominant cost of Twitter or that deployment would be storage because you'd be absorbing like, I think it would be like
five or six gigs per node per day of like new tweets.
And so, you know, you'll need some pretty big disks on those partitions.
So like when thinking about scaling, usually today, teams scale different parts of the application depending upon where the bottleneck is.
It's like, well, if your API isn't performing, let's look at the bottleneck.
Is it database or is it just you don't have enough instances and you're chewing through too much CPU?
But with a Rama application, what does that look like? Do you also look at the same factors you would in an a la carte model,
or do you just scale the entire cluster,
and Rama figures out where to put the storage and compute?
Oh, yeah.
Well, I mean, it's all about telemetry, right?
Actually finding where the bottleneck is.
So that's where Rama's built-in telemetry is really useful.
And that was very useful for us developing our Mastodon instance,
to actually find what the hot spots in performance were. So you look at things like,
for a P state, well, how many writes is it having per second, and what is the
average time of those writes, or what's the distribution of it? And that stuff helps a lot
to find where the bottlenecks are. You can also look at skew. The telemetry lets you look at not just the overall picture for a P state, but also partition by partition.
So one really common reason for a bottleneck would be skew.
So one partition has much more load than another one, and that's going to slow down the whole system, because you have some resources idle while another one is very hot.
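Detecting that kind of skew from per-partition write telemetry can be as simple as comparing the hottest partition to the average load. A small sketch, with made-up numbers:

```python
def skew_ratio(per_partition_writes):
    """Ratio of the hottest partition's load to the mean load.

    A value near 1.0 means balanced; much higher means one partition
    is a hot spot while the others sit comparatively idle.
    """
    mean = sum(per_partition_writes) / len(per_partition_writes)
    return max(per_partition_writes) / mean

balanced = [100, 110, 95, 105]   # writes/sec per partition
skewed = [100, 100, 100, 900]

print(round(skew_ratio(balanced), 2))  # -> 1.07
print(round(skew_ratio(skewed), 2))    # -> 3.0
```

A ratio of 3.0 means the throughput of the whole system is gated on one partition doing triple the average work, which is exactly the idle-resources-next-to-a-hot-spot situation described above.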
Right. And that is the main issue with the unbalanced social graph: it's inherently extremely skewed. So that's why we put a lot of effort into balancing
the processing, even when the social graph is so skewed. And by the way, Twitter
obviously went through the same thing; a lot of their implementation is also trying to
deal with that inherent skew. So a lot of things we did were similar to what Twitter did,
but obviously in a more integrated way,
in a much simpler way.
And yeah, you know, we did one optimization
to reduce variance between tasks. So, like,
let's say you have someone with 20 million followers.
If you're processing all their fan out
just from one partition, that's going to be very skewed. Because whenever that person posts a status, suddenly you have 20 million units of
work. Whereas normally per second, you have whatever 7,000 times 400 is, right? Which is a lot
less than 20 million. So that person creates a huge burst of load, right? And it's creating a
huge burst of load on one task, right? So if you do the naive thing of just processing all of someone's followers from one task,
you're going to be super skewed.
That's going to massively slow things down.
Because now that one task needs to work through the queue of 20 million things,
whereas everything else is much smaller than that, right?
And so one of the things we did in our implementation is we basically have a different view of the social graph
specifically for fanout.
So when someone has a lot of followers, their followers get spread around all the partitions
so that when you want to process a person's followers, you do it in parallel for some
partitions and you balance the processing. For fairness, such that that person posting
doesn't delay everyone else, we will only process up to a configurable limit of 64,000 followers per user per iteration of fan out, right? One iteration takes like 500
milliseconds on average, right? So processing all of someone's 20 million followers will take
a few minutes, right? But that's the trade-off you have to make, because the number
of resources you have is fixed at any given point in time. And that's, you know, again, that's not on the Rama side; that's on the
Mastodon implementation side. That's how we utilize Rama. We're able to do these things, like
materialize multiple views of the social graph for different purposes. And there were other things
we did to reduce variance. When you reduce variance, you increase
throughput, because you have more balanced processing, and so you have fewer situations where you have some resources idle while another one
is really, really busy.
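The arithmetic behind that bounded fan-out scheme is easy to check. A sketch using the figures quoted above (64,000 followers per user per iteration, roughly 500 ms per iteration):

```python
FANOUT_BATCH = 64_000    # max followers processed per user per iteration
ITERATION_SECONDS = 0.5  # average iteration time quoted above

def iterations_needed(follower_count):
    # ceiling division: fan-out iterations needed for one status
    return -(-follower_count // FANOUT_BATCH)

def fanout_seconds(follower_count):
    return iterations_needed(follower_count) * ITERATION_SECONDS

# A 20-million-follower account fans out over a few minutes instead of
# dumping 20 million units of work on one partition's queue at once.
print(iterations_needed(20_000_000))  # -> 313
print(fanout_seconds(20_000_000))     # -> 156.5 (about 2.6 minutes)
```

That matches the "few minutes" figure in the discussion: the per-iteration cap trades latency for the biggest accounts in exchange for fairness to everyone else.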
And there's actually more we could do on the Mastodon implementation to actually reduce
variance even further.
But obviously, we got the performance to a point where it was as good as Twitter, actually
better than Twitter in many respects, so that we didn't feel it was necessary to keep developing
it.
But I bet with a little bit more work,
we could probably squeeze another 5% to 10% more throughput
out of it.
But obviously, it's already a very, very high throughput.
So this is a different way of thinking
about how you build backends.
And as you mentioned,
this is very different from the a la carte model.
You have a platform that is truly generic.
So how are you thinking about adoption here?
Like for companies to adopt this to build massive scale systems,
usually the thing that many teams or engineers start with is like,
let me just hack on something, put a database in front of it, and I'll go.
So how are you thinking about Rama getting adopted?
Yeah, that's a great question.
And actually a lot of it I learned,
a lot of this, how I'm approaching this,
I learned from the open source work I did before,
especially with Storm.
So my general approach to adoption
is the bottom-up model,
which is as contrasted to the top-down model.
The top-down model, which would be where
you try to drive adoption through,
you talk to CTOs and
VPs of engineering, and you try to convince them to give it a shot. So it's outbound sales. It's
very expensive because you got to literally do it one by one, right? So what's much more efficient,
and I think more effective, and this is what I did with Storm, is the bottom up model, right?
So you make something that's really compelling, that gets engineers really interested in it,
such that they try it out themselves without you even knowing about it.
And they become your salespeople for you at the companies where they work.
That's what I did with Storm.
That's how I was able to get Storm to be such a big project just by myself.
And that's what I'm doing with Rama.
Now, there's a big difference between Rama and Storm.
Storm was like, it had like two concepts in it.
And it wasn't that different from what other people are doing.
It was just doing it in this, you know, fault tolerant way.
It was very, very easy for someone to pick up Storm and try it out just because there wasn't that much to learn to pick it up.
Rama is different.
Rama is a paradigm shift.
Rama is a major, major paradigm shift.
So it has a much, much higher learning curve than something like Storm or really anything
else that you would look at.
So the high learning curve makes it more difficult to get adoption just because there's this
upfront cost.
You actually have to learn it before you can really start using it.
And so that's why we launched the way we did.
We started with this Twitter clone running at scale and this ridiculously
small number of lines of code to basically create the motivation to put yourself through that
learning curve where, oh, here's this thing. It's doing something really unusual, like really
unusual, like literally reducing the cost of building this major service by a hundred X.
And so, you know, my theory is that that would compel a lot of people
to want to benefit from that.
And what I've been seeing in these first few weeks
since we've launched is that message has gotten through
to the early adopter type of people
who base their decisions on the technical merits of things
and what is the value it provides
as opposed to a later-adopter kind of crowd, where they base their decisions largely on social
proof. Someone like that is saying, oh, I want to do something
that's similar to what someone else is doing; I literally want to see my use case done in a
similar way already before I use that thing. Obviously, that's a much less technically savvy crowd.
It is a big portion of the crowd, and that stuff is important.
You do need to be able to show those things.
Right now, obviously, we're focused on early adopters.
And the main enthusiasm I've been seeing from early users is from two kinds of people.
So, people who have systems that they need to scale, and maybe
they've been through it before, so they have a lot of anxiety about what they're going to go through
to scale the existing systems. They know how painful it's going to be to do the a la carte model,
to use a dozen different systems and so on. And the other thing I'm seeing is people who
are sick of these impedance mismatches. For their whole careers, for
20, 30 years, they've been having to twist their model into these data models. And they understand
how much complexity they're taking on from the get-go. And so the idea of P states, of being
able to tune and shape your indexes to what you need, as opposed to the other way around, is very
compelling for them. And so that's the general approach I'm taking. Right now,
as a demonstration, we have this one example, right? This Twitter-scale Mastodon
instance, which actually has like 20 examples inside of it, because there are like 20 different
features. You know, I mean, there are so many features in Mastodon, right?
And they all work completely differently, right?
But it's all just one product, right?
So over time, as we work with early adopters and help them achieve massive success, we're going to have more examples to show.
And I expect that'll help Rama break through
to kind of later adopters who need that social proof
or need to see like ramen used in a way
that's similar to their needs before they can give it a shot. So that's the way I see adoption. I don't anticipate it being that fast, just because it takes time to build that social proof. But, especially with how much early enthusiasm I've gotten, I do expect us to get there.
Taking a step back, for you personally,
how do you see this transition going from like,
you're somewhere by yourself going through that big text file that you have, right?
And then really thinking really deeply about these like really challenging problems
to now more kind of day-to-day running a company, having to manage people, and then having to think about marketing, right? Talking to developers. How has that transition been?
Yeah, well, there's basically two transitions, right?
I went from by myself in 2013,
and in 2019, I fundraised and built a team.
So that was a big transition,
learning to, like, manage a team and do all that stuff, right?
And I learned a lot.
In the past four years, I've learned a lot about that subject.
Can you share, like, the aspects that you learned there?
About management and just building a team.
I mean, there are a few parts involved, right?
Like you went from having a problem you wanted to solve.
You spent some years researching, trying to better frame the problem and figure out a direction you wanted to take.
Then you're at a place where you're able to describe that problem and fundraise, which is not an easy thing to do.
And you're trying to solve something which wasn't done before.
This is something new, a new paradigm in how you build software applications.
So part of it is also selling in a way.
You're trying to show what you're trying to build, fundraise,
and then build a team.
So there are many pieces there.
Well, learning to fundraise was its own thing. And it's kind of a weird thing because it's kind
of fake. Fundraising is you're selling a product that doesn't exist yet to people who are not in
your target market. That's what fundraising is, especially for something like this, deep tech
intended for serious software engineers. I mean, some of my investors have a software engineering background, but they're certainly not doing that anymore. I don't think any investors were that hardcore of engineers, except for maybe Max Levchin. Max Levchin is pretty hardcore, and yeah, he was a very difficult one to fundraise from. But he probably grilled me harder than anyone else in terms of the technical details
of Rama.
He actually was very interested in the underlying language behind Rama, which no one else took
an interest in.
But anyway, still, the principle remains, right?
Investors you talk to are not in your target market.
And also, they just don't really understand what you're doing.
They don't really understand.
I can describe it at a high level: 100x cost reduction, build
Twitter or any other application on one
platform instead of a dozen. They understand that,
but they don't really understand.
Fundraising is its own art,
its own...
My fundraising went really well, but I will say
until it started going well, I thought
I was going to fail.
That sounds like fundraising.
Yeah.
I was as successful as you can possibly
be. I raised more money than I
wanted at a much better valuation than I was
initially seeking.
And I got every investor I wanted.
Why was that? Do you know what aspects played a role
in fundraising being
as successful as it was for you?
I know every aspect that went to being
successful, but again, I thought it was going to fail
until it started to go well.
And also, it's not like I thought
it was going to fail for a long time.
I got the whole round done in basically one month.
But man, it was not looking good for a while.
Because it's like, yeah, I mean,
this is a whole topic, right?
But especially with venture capitalists
who are not investing their own money,
they're investing the money of their investors, the LPs.
So the motivations of a venture capitalist are,
they have other motivations besides what you're doing, right?
So like you'd think with fundraising,
all it comes down to is tell a story
about how you're going to build this product,
which is going to have this multi-billion dollar market for it.
And then the second thing is
be credible in that story, right? So my story was pretty simple: a general-purpose platform reduces the cost by 100x or more, and it unifies these things which are currently done with a dozen different tools. And I think I was pretty credible, with Storm and my book and just being a fairly well-known and respected person in this field for which I'm building a product, right? So it turns out those two things are important, but they might be the least important things.
Look at why investors actually invest. And what's funny is that I had already read all this stuff before about why investors actually invest, but I didn't really understand until I went through the process and went through these meetings. It was actually specifically Paul Graham that wrote about this stuff. He was writing about this in like 2008 or something, or maybe even before that, right?
But the issue with investors, so there's other motivations with a VC. First of all, they have their own investors, and you don't see a return on investment for a long time, maybe 10 years. But you need to make sure that your investors, your LPs, think you're doing a good job in the meantime, before you get a return on that investment. So you need to explain to them, well, here's the investments I made and why I made them. They have to be able to tell a good story, and what they're looking for is traction of some sort. Either traction in the market, which is obviously something I did not have in 2019 because I was still building the product, or traction in another way, such as, "Oh, this other big, well-known investor who's very respected invested, so I co-invested with them," right? So that's the whole momentum thing that you hear about, and it's something that Paul Graham wrote about a lot. That's the main reason why investors invest, especially VCs: they want to see that traction so they can tell that story to their LPs. Now, as a founder, where I'm just trying to build something cool and change the economics of software development, it's very frustrating to have to play this weird game, where I need to generate this momentum and social proof so that I can build momentum and finish my
round. So, yeah, it drives you crazy, because ultimately, in a lot of these early conversations, I would pitch them, and they'd be like, "Oh, this is great. Can you talk to these other people to see what they think?" And they'd be really slow, doing all this due diligence, which also seemed completely unnecessary, because I told the story, I'm credible, what else do you need? But really, they're just delaying things because they want to wait to see someone else invest, so they can tell that story to their LPs. So the first yes, that's the most important. Once you get the first yes, the ones after that become relatively easier.
Yeah, so the day I closed my lead,
which was initialized,
Gary Tan was the investor.
Gary Tan is now the president of Y Combinator.
But yeah, the day I closed my lead,
everything changed.
So now I was able to go back
to all the other investors who were being slow.
And I just wrote them a very polite email
where I said,
it's great that you guys are interested,
but I'm looking to close the round now.
Initialized is leading. Here's the price. Let me know how much you want to invest. You can invest anywhere
between this amount and this amount. And let me know by this date. If it doesn't work for you,
that's fine. No worries. And basically what I was telling them is: I'm not doing any more due diligence for you. I'm not going to do any due diligence for you at all. So either you invest or you get out. It was a polite way of saying "stop wasting my time," right? And man, it feels good to be in that position. Having the leverage in fundraising, where you're no longer playing that game, feels great. So, yeah, that was good. And what's interesting is that it was four days into my fundraising that I hit that point. So again, I wasn't bogged down for six months like a lot of founders are, right? So it went very well, but yeah, it was not looking good up until that four-day mark.
I'll say a lot of that credit goes to Gary Tan for not being like that as an investor, for really being able to understand the merits of something on its own and not needing all this silly momentum stuff first, right?
So, yeah.
Congratulations on the successful fundraise,
even though back in 2019, but congratulations.
It's a huge deal for a company.
So you mentioned, like, just during the fundraising period, you were thinking you
might fail. Keeping fundraising aside, it took 10 years in a way to build Rama, starting from
2013 to now. Was there any point in this 10-year period where you just wanted to stop and do
something else? Yeah. Well, I wasn't confident it would be possible until 2016.
I think what kept me going is,
the main thing is just the opportunity
to have this big of an impact on the world.
I wrote a blog post about this,
about why I started Red Planet Labs.
And the thing that I think,
something that really inspires me is the,
like the Apollo program in the sixties,
the space program.
Cause it was like,
it's really incredible what they did.
Like that speech that JFK gave at Rice university.
Like I've listened to that probably 50 times.
Like,
I love that speech.
It's so great.
And it's so audacious. I mean, the US was behind in the space race at that point. They couldn't even launch a rocket reliably. I think like 25% of them were exploding at that point, something like that. I may have the number wrong, but a lot of them were exploding. I think that speech was 1962, if I'm not mistaken. But to say that before the decade was out, we're going to have men walking on the moon, when they couldn't even launch a rocket yet, and to say that on a national stage
like that, that is unbelievable. And then they did it. They did it, and the way they did it, the engineering, was incredible. It was brave, what they were doing. Those astronauts, man, at that time. The whole thing was incredible. And they really, I mean, they developed space travel. They figured out everything that goes into doing that, working in space, all that stuff, setting the stage for everything that came later, all the stuff we use space for, and how much that's improved society at large, and so on.
So I just find that super inspirational,
like pushing this frontier, not being afraid of it
and really advancing human potential.
And like back in 2013, when I started working on Rama,
that's what I saw with Rama,
a way to fundamentally advance human potential.
So more than anything else,
that's what kept me going, that thing.
As well as it just being a really interesting thing
to work on just as a programmer, right?
This new programming paradigm,
figuring out how to enable abstraction, automation, and reuse in this major aspect of software engineering, which has suffered in these respects forever.
So at no point was I ever going to stop.
I mean, I would have stopped if I determined it wasn't possible, obviously.
But at no point did I get to a point where I thought it was not going to be possible. I was making progress over those first three years. It was definitely slow at some points.
There were definitely some wrong directions,
some very wrong directions I went.
I remember when I was working on P-states, on the abstractions for P-states, especially for reactivity.
That's something I understood from the beginning,
just the importance of reactivity
and how reactivity should be fine-grained,
where, when something changes, you get very precise information about what changed,
which is actually very different
than how databases work,
which are coarse-grained.
Like, at best, you would know that, like,
oh, this row changed,
but it doesn't tell you what changed in that row.
Maybe this one value changed in this one way incrementally. This one value could be a set, so this one column in this one row, this one set, had this one element added to it. But actually, all you know is that this one row changed, right? So that's coarse-grained. Rama's fine-grained, right? You get precise information, so regardless of the complexity of your data structures, you can do these reactive queries where it actually tells you that this set inside this map, inside this list, had these two elements added to it and this one element removed. So that's fine-grained information, which is really powerful, and can power some really interesting stuff on top of it.
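The coarse-grained versus fine-grained distinction can be sketched in a few lines. This is purely an illustrative sketch in Python, not Rama's actual API: a recursive diff that reports exactly which element changed at exactly which path inside a nested structure, instead of only reporting that a top-level row changed.

```python
def diff(old, new, path=()):
    """Yield fine-grained change events between two nested structures.

    Each event is an (op, path, value) tuple, where `path` pinpoints the
    exact location of the change, e.g. one element added to a set that
    sits inside a map. A coarse-grained system would only report that
    the top-level key changed.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for k in new.keys() - old.keys():
            yield ("added", path + (k,), new[k])
        for k in old.keys() - new.keys():
            yield ("removed", path + (k,), old[k])
        for k in old.keys() & new.keys():
            yield from diff(old[k], new[k], path + (k,))
    elif isinstance(old, set) and isinstance(new, set):
        for v in new - old:
            yield ("added", path, v)
        for v in old - new:
            yield ("removed", path, v)
    elif old != new:
        yield ("changed", path, new)

old = {"followers": {"alice", "bob"}}
new = {"followers": {"alice", "carol"}}
events = sorted(diff(old, new))
# We learn exactly which elements entered and left the nested set,
# not merely that the "followers" entry changed.
```

A database exposing only row-level change events would collapse both of those events into a single "this row changed" notification, which is the coarse-grained behavior being contrasted here.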
It was working on the API for P-states, especially how reactivity could work, that really had me stumped, I think for like three or four months, and I went down the completely wrong path. Eventually I figured out that I had to take a step back and question my assumptions, and then I ended up figuring out the right way to do it, which was paths, this path model, which can do both non-reactive queries very efficiently and reactive queries. I remember the moment when that clicked in my head; it was just like an explosion in my head. It's so elegant. It so perfectly enables everything that you would want from indexing, which is obviously a very broad topic. And there were other moments, other things like that, maybe not as extreme as that,
of just this two steps forward, one step back kind of process.
But at no point did I think I wasn't going to succeed.
That is very inspiring, to be honest.
And kudos on going that long and actually building Rama.
It's really impressive.
By the way.
Yes. Let me say one more thing about that. I'd say the thing that enabled all this was the fact
that in 2013, I had achieved enough financial freedom that I knew I would be able to pursue
this. I knew it would be difficult. It could take a long time. I didn't necessarily think it would
take 10 years at that point, but I didn't really know how long it would take, because it's such a broad thing to be working on. So the fact that I had some financial freedom... I wouldn't say I was super rich, but I had made enough money from a startup called BackType, which was acquired by Twitter. That's why I know so much about Twitter. So I made enough money from that that I was in a position where I could pursue a crazy thing like this. And that was a lot more appealing to me than doing something like a Storm company, which is just about monetization, just about making more money, right? But we only have one life. So let me do the thing that would make me proud of my life, which is to expand human potential, as opposed to just making more money than I already had. So that was, I think, the foundation that enabled this process.
That is incredible.
I mean, it shows taking the path of higher resistance rather than the path of least resistance.
It would have been easy to be on a path, like either start a company with Storm
or even like, let's say, be a distinguished engineer or a technical fellow
at one of these big companies and still make a lot more money with that. I mean, you had the
credibility to back that up. Well, I would have made more money in the past 10 years going that route. I think Red Planet Labs is worth way more than that could possibly be. But again, resistance depends on how you define resistance for yourself, and really what you want to see from your life. For me, being so inspired by something like the Apollo program, it wasn't even a question which path I should take. I actually feel that the Red Planet Labs path was less resistance for me, in terms of internal resistance. If I had this idea for Red Planet Labs and decided to pursue a Storm company instead, well, that would be eating at me forever.
This is the regret minimization framework
that you refer to in the blog post, right?
Yeah, that's the thing that Jeff Bezos said, right?
Where he was in,
like his situation when he started Amazon
was similar, right?
He had a really high paying job,
very comfortable job.
I don't know where he was working before he started Amazon.
Then he had this crazy idea
for this online bookstore.
And then, yeah, he said that like,
so which way do I go?
Do I take this leap
or do I just stick with my comfortable
lifestyle and job
where I'm making a lot of money, right?
And then he framed it in terms of regret, right?
Where he said,
I would never regret trying the bookstore thing, but I'd regret it forever if I didn't try it.
And I'd say the same mental process went through me as well.
So a couple more questions before we close off.
You've been building Rama, which is a deep technical platform.
At this point of the company, as you mentioned, you started building the team in 2019, and now it's about scaling the team,
building the company culture.
Can you share more about how you are thinking
about the company building aspect of things,
which is slightly different from building deep tech?
Right, yeah.
Well, I've learned a lot.
That's something I've learned a lot
over the past four years.
So from the start, I did decide
to do a fully distributed team.
I just think that's a much better way to run a company. Obviously, a lot of people had to experience it during the pandemic, and unfortunately, a distributed team, first of all, requires people who actually want to be on a distributed team. So one of the reasons the forced distributed teams of the pandemic didn't really work that well was because those people wanted to be in an office, but they weren't, right? So that's a really big aspect of it, right? But I do think that productivity and collaboration and whatnot are better distributed, presuming that everyone wants to be distributed.
Do you all work in the same time zone, or is everything in writing?
No.
So, yeah, I do find it important to be in close enough time zones that you can still do video calls. So we don't really hire globally. Initially I was intending to, but I do think it's important to be close enough. It's still a pretty wide range, though. When I moved to Hawaii, I actually changed my schedule, so now I wake up at like 5am. But it turns out to be a great lifestyle to lead in Hawaii, because mornings in Hawaii are incredible. It's not too hot yet, and doing stuff outdoors is amazing in the morning. But basically, we work on an East Coast time zone. So whenever we talk about times internally, we just assume East Coast, right? So 2pm means 2pm Eastern, which would be 8am Hawaii time. But yeah,
obviously, across that range of time zones, it's still a huge portion of the globe that you can hire from, right? I do think that's one of the really major advantages of distributed over co-located teams: you're able to hire from a huge portion of the globe instead of just one city. Whereas co-located, you can only hire people who are already in that city, or who are willing to move there. And if you're looking for experienced engineers, well, chances are they don't want to move, because they have a family in the suburb where they are.
Right.
And I have found that the best engineers kind of come from the most random places and live in the most random places.
Yeah.
Yeah.
That's just something I observed over my whole career.
And so when you say I want to be co-located in, let's say, San Francisco, which is obviously a very common place to have a startup.
Well, you are severely limiting your talent pool. As a consequence, it means that the quality of engineers will be less than it would be if you were distributed, just because you have access to so many fewer engineers. So I think that's a major thing, although I do think distributed is still better even without the recruiting aspect being such a huge advantage. Some of the problems in a co-located team, if you're co-located in an office, it's just distractions, right? Programming requires focus, and an office environment is kind of inherently unfocused, with distractions.
There's office layouts which are better than others.
The most common office layout, of course, is the open office.
Well, it's been a long time since I've worked in an office, but I presume that's still the most common one. And yeah, an open office is what you would end up with if you decided, "I am going to engineer the worst possible environment for programming." People walking around, the bathroom door opening and closing, people chewing on chips next to you, people having a conversation. It's really hard to focus in an open office. You have that running joke, which everyone's heard: "Oh, I get all my work done after everyone's left the office." All right, well, then don't work in an office if that's the case. So anyway, that's one of the
core principles of Red Planet Labs: it's a fully distributed company. And of course, I was inspired by other companies that were doing it and having a lot of success with it, just to see that it was possible. And it's worked very, very well for us. We basically have a morning meeting where we sync up, as a stand-up.
We also do a fun thing every morning.
We do something called the question of the day.
So every day we rotate, and when it's your turn, you can either ask a question to the rest of the team, something personal, or you can do a "share of the day," where you just share something interesting about yourself or something you found.
And, you know, we've been doing question day now for four years.
So the questions have gotten like really weird and esoteric,
which is really fun. Can you share an example question?
Oh, man, I think the recent one was: what's a story from one of your parents' childhoods? I don't know if it tells you that much about the person, but it's an interesting thing, and we've heard some pretty wacky stories from people. One I asked a long time ago, I remember, and I've actually asked this twice: one time I asked, what's a mystery in your life that you haven't solved? And another time, what's a mystery in your life that you did solve recently? So, all sorts of wacky stuff on that one.
And I think the idea behind question of the day is that, like,
so when you're on a co-located team, you kind of naturally get, like, camaraderie
because you go out for drinks after work or you get lunch together or whatever.
So you naturally just have a lot of, like, socializing, right?
Whereas a distributed team, you have to be more intentional about it because it doesn't happen naturally.
So Question of the Day is a way for us to just be people to each other as opposed to just screen names, right?
And likewise, that's also why I think pair programming is very important in a distributed software team.
I don't think pair programming is really that important, at least not important as a regular everyday process on a co-located team it's like we repair every day and when we pair it's not it's not like
two people or it's not like you're working together as equals on the project it's one
person that's driving the work it's whatever so like one person will drive the other one will
follow and it's it's the driver's project right and as a follower you know you're there to see
what they're doing and then to maybe help out if you can,
maybe talk through some design issues.
But mostly it's just about having individual face-to-face time
with your teammate so that you build that camaraderie.
So I say that's the main goal of pairing.
We pair once a day, for 45 minutes.
And the second goal is just knowledge sharing, right?
So now you're learning about this aspect of the code by pairing with them.
Knowledge sharing is something I've thought about a lot ever since I started building a team, because that's one of the primary problems you have to solve in building a company. So I've learned a ton about that. There's a lot of stuff we do for knowledge sharing.
I think one of the best processes we use is something we call "reverse story time," where we rotate every month whose turn it is. We usually do this once every four to six weeks. When it's your turn, you have to give a presentation on some part of the codebase that you did not build, something someone else built. This accomplishes two things. First of all, the best way to learn something is by teaching it, so it just does that. And it gets that subject taught from a new perspective and, more importantly, from a beginner's perspective. And so we record every reverse story time. Now we have an archive of tons of them,
I don't know, probably more than 30 at this point.
And also that's a really good resource
for new employees to be able to learn, right?
They can watch these 30 minute reverse story times
and actually learn the different aspects of the code base.
That's super cool.
I mean, if I know my code is going to get read by someone tracing back through the git blames, it keeps me more on my toes, I think, to make sure I'm not doing anything too stupid.
Yeah, so those have been good processes.
And, you know, and we've adjusted over time.
Like, so for stand-ups, we used to just do it, like, live.
So we just go around.
Everyone has one minute, and you just give your update.
And actually, recently, we changed that.
So we still do the stand-up meeting, but you give your stand-up update in an email beforehand. So now, first of all, that shortens the length of the meeting, and it creates an organized place where you can have further discussions about whatever the stand-up update is, right? So someone can update about some part of the system,
like, oh, I'm taking this approach for data transfer,
and then maybe someone will respond to that email
and create a thread of like, oh, why are you doing it this way?
Have you thought about this? Whatever, right?
So that's been a great addition.
This has been a great, like, small change to our process,
but I think it makes us work a little more efficiently. So now the stand-up meeting is actually about doing the question of the day and then deciding what the pairing session is going to be for that day. So now it's like a 15-minute meeting or whatever, and then that's it.
That's pretty cool.
We tried that on our team a while back, during the pandemic, where we switched our stand-ups: two days a week we would do them over a call, and three days a week a Slack bot would ping you with what you got done, what you're doing today, and what you need help on. We found as a team that it was an extremely effective way for both sides, actually. For the people writing the update, because they got a chance to think about what they did and what they're planning to work on, and to specifically ask for help on certain parts, like, "Hey, I want to discuss XYZ," and then, exactly the thing you described, a bunch of threads after the stand-up. And it was very effective for the people consuming that information too, because when you're doing it live, some people are sometimes not paying attention, so they miss it anyway. Totally agree on that.
Yeah. Oh, this has been an awesome chat, Nathan. Thank you so much for being so generous with your time. Before we close, is there anything else you would like to share with our listeners? Man, nothing comes to mind. We talked about everything, all the way to Rama. We'll include the blog post in the show notes. Oh, for sure. So people can visit.
Yeah, sounds good.
Can we hit Nathan
with our question of,
you know,
what's your favorite
software misadventure?
Sorry, it's very cheesy,
but if you have any stories
to share.
Or a misadventure.
Another way to think about that
is what's a failure of yours
that you learned the most?
Yeah, I mean, in the development of Red Planet Labs, that thing I described about going the wrong direction on the P-state API was probably the biggest misstep, I think; basically that whole four months. Okay, so what made that a particularly big misstep is that I had already developed the whole path abstraction. So, Rama's written in Clojure, and in developing Rama, developing a paradigm, writing a compiler and whatnot, I needed to develop this path abstraction just to make it easier to do regular programming stuff, just to be able to work with data structures more easily.
And Clojure, it has immutable data structures.
So you're always working with immutable data.
And it's really cool how it works.
Like you take a map and you add a new element to it, it actually returns you a new map instance,
but it's very efficient.
Basically, it shares structure between the two instances, which is how it's efficient.
But it has this implementation for vectors and sets and other data structures as well.
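The semantics being described here can be sketched like this. A hedged toy version in Python: Clojure's real persistent structures are implemented as trees that share internal nodes between versions, while this sketch only shows the observable behavior (the old map is untouched) and shares just the unchanged values by reference.

```python
def assoc(m, key, value):
    """Return a new map with `key` set, leaving the original untouched.

    A shallow copy: every unchanged value is shared by reference with
    the old map, a crude stand-in for Clojure's structural sharing.
    """
    new = dict(m)
    new[key] = value
    return new

m1 = {"a": 1, "b": [1, 2, 3]}
m2 = assoc(m1, "c", 42)

assert m1 == {"a": 1, "b": [1, 2, 3]}  # the original map is unchanged
assert m2["c"] == 42
assert m2["b"] is m1["b"]              # unchanged values are shared, not copied
```

The real efficiency win in Clojure is that this "copy" is O(log n) tree-node sharing rather than a full shallow copy, but the programming model is the same: updates return new values, and old versions stay valid.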
And so a lot of stuff in developing Rama, I'd end up with a set inside of a map or whatever, right?
And it was very cumbersome to manipulate structures like that.
And actually in the compiler, it could get very complicated, where the code that you're writing
is actually a graph computation.
It's a data flow graph, right?
And sometimes some of those nodes actually have a data flow graph within them.
So very, very complex structures that I needed to be able to do compiler analysis on
where I have to do traversals and these very complex nested manipulation.
And everything is immutable, right?
I want everything to be immutable because there's so many advantages to having your data be immutable.
So I developed this library in Clojure called Specter, which was this path API for generically querying and manipulating arbitrarily compound structures.
And it's super fast as well.
So I actually already had Spectre.
I already had the path abstraction. And so I went down this path of the P-state design, where I was thinking in terms of, like, okay, I'm literally going to have a map P-state,
and literally have a set P-state, and literally have a list P-state, and you compose them together,
but then you manipulate it by calling get, or get the nth element, or whatever. And then I'm
going to have reactive versions of all these queries. So I have get and get-reactive, and so on. And then you try to compose it like that. And it just was not working
at all when I was trying to actually use this approach to actually express my million use
cases that I had. And then ultimately, I realized, well, P-states are nested data structures.
And Specter and paths are all about being the most expressive, powerful way to work with nested data structures.
Why don't I just use paths on P-states and bake the reactivity into paths themselves?
And that was the big moment where suddenly everything fit together.
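To make the path idea concrete: a path is a sequence of navigators that walks into a nested structure, and a transform rebuilds the structure immutably along that path. The following is a hypothetical toy sketch in Python of that concept only; it is not Specter's actual API (Specter is a Clojure library, `com.rpl.specter`), and all names here are invented for illustration:

```python
# Sentinel navigator meaning "every element of a sequence".
ALL = object()

def transform(path, fn, data):
    """Apply fn at every location reached by `path`, rebuilding
    the structure immutably on the way back up."""
    if not path:
        return fn(data)
    nav, rest = path[0], path[1:]
    if nav is ALL:
        # Navigate into every element; return a fresh list.
        return [transform(rest, fn, x) for x in data]
    # Otherwise treat the navigator as a dict key; copy the dict.
    return {**data, nav: transform(rest, fn, data[nav])}

data = {"users": [{"age": 30}, {"age": 41}]}
out = transform(["users", ALL, "age"], lambda a: a + 1, data)
# out == {"users": [{"age": 31}, {"age": 42}]}; data is unchanged
```

One composable path expression handles arbitrarily nested combinations of maps and sequences, which is the expressiveness win Nathan describes; Rama then additionally bakes reactivity into the path navigators themselves.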
And then I was able to scrap everything I did those three or four months and take this new route. So, yeah, in one sense it feels incredible to have this
breakthrough, where I just found a fundamentally better way to interact with
an index. It's generic,
and very concise and elegant,
regardless of the complexity of the index.
So not only for every data model that exists,
but for any sort of permutation
of data structures you could have,
it's super elegant.
And it has this new capability
of arbitrary fine-grained reactivity.
It's a major breakthrough in those two respects.
But in another sense, you feel really stupid, because I already had paths for a long time and still went down this road, this long road, right?
But yeah, I'd say that was a pretty big, that was definitely a misstep.
No, but to your point, every time when I get stuck on a really hard problem, I think about
how nice it will feel once I actually solve it.
I gotta say, when I had our Twitter-scale Mastodon instance running for the first time,
at scale, very, very high performance, that
felt good, because there was so much that
went into that.
That must have been your realization of all the work that went in, I imagine.
Yeah, it was kind of the culmination.
That was the culmination of the original vision, right?
To be able to build an application like that, which is so costly otherwise, at such low
cost and at such high performance was great.
That's pretty awesome.
Well, we'll add links to Red Planet Labs,
Rama, and to your blog posts
in our show notes too.
And we'll also link your Twitter or X profile
where people can follow you and learn more.
And for everything today, Nathan,
thank you so much for taking the time.
This was awesome.
We learned a lot about Rama, about you,
and I'm sure our listeners will too.
And we highly encourage them to go check it out.
Awesome.
Great talking to you.
Thanks so much.
Thank you.
Hey, thank you so much for listening to the show.
You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com.
You can also write to us at hello at softwaremisadventures.com. We would love to hear from you. Until next time, take care.