Software Misadventures - Nathan Marz - On changing the economics of building large-scale software with Rama - #23
Episode Date: September 22, 2023. What does it mean to change the economics of software development? Nathan Marz joins the show to share how they reduced the cost of building Mastodon at Twitter-scale by 100X, and the 10-year journey to build Rama, a new programming platform that made this feat possible. Nathan is the founder of Red Planet Labs. Prior to RPL, he led engineering for BackType, which was acquired by Twitter in 2011. Nathan created the Apache Storm project and wrote the book Big Data: Principles and best practices of scalable realtime data systems. Outside of work, Nathan is a private pilot, loves going to stand-up comedy shows, and is forever trying to teach his dog new tricks. Show Notes: Nathan’s Twitter: https://twitter.com/nathanmarz What is Rama? https://redplanetlabs.com/learn-rama Reducing the cost of building Mastodon at Twitter-scale by 100X: https://blog.redplanetlabs.com/2023/08/15/how-we-reduced-the-cost-of-building-twitter-at-twitter-scale-by-100x/ Stay in touch: ✉️ Subscribe to our newsletter: https://softwaremisadventures.substack.com 👋 Send feedback or say hi: softwaremisadventures@gmail.com Segments: [0:00] flying [0:07] inefficiencies of backend software development [0:17] suffering oriented programming [0:23] AI programming? [0:25] RAMA’s programming model [0:33] deployment & monitoring with RAMA [0:36] building a twitter clone at scale with RAMA [0:43] migrations with RAMA [0:54] driving adoption for RAMA [1:01] fundraising [1:15] building a fully remote team
Transcript
Discussion (0)
Welcome to the Software Misadventures podcast.
We are your hosts, Ronak and Guang.
As engineers, we are interested in not just the technologies,
but the people and the stories behind them.
So on this show, we try to scratch our own itch
by sitting down with engineers, founders, and investors
to chat about their path, lessons they have learned,
and of course,
the misadventures along the way.
All right, Nathan, welcome to the show.
Super excited to have you with us.
Great to be here.
So Nathan, we thought we would start with asking you about being a pilot.
Can you tell us more about how you got into flying?
So I think it was back in 2013, I had a friend who was taking flying lessons, and that just really piqued my interest: the skill of being a pilot, and then just the adventure. It just feels really adventurous to be cruising at 3,500 feet, and you can kind of just go anywhere you want.
When you were taking lessons, was there any oh-shit moment that kind of made you
like question everything?
I was trying to learn how to paraglide, and then I think after like three classes I was like, oh shit, the tolerance for failure is narrower than when I looked at it from the ground.
Were there any close calls or anything like that?
I wouldn't say close call. I had one moment, which was a little spooky. This was my second
time I ever soloed. So I was taking lessons in Palo Alto airport. I decided on this flight to
fly like to the coast. And there's this like tiny little island with this like one
building on it that's like 500 feet off the coast and i i just wanted to fly there and just go down
to a thousand feet and circle it and then fly back that was actually on the way back i was flying
north along the coast. Usually pilots fly at some multiple of 500 feet; that's the altitude you would keep. But I figured, I'll fly not at 3,500, I'll fly at 3,700 feet, because why fly at the same altitude everyone else is flying?
And it's actually pretty hard to see other planes from the air. They can blend into the sky or the clouds, and they're just really small. That's one of the skills you develop as a pilot: you get better at just seeing other planes so that you can avoid the traffic. But anyway, I was flying at 3,700 feet, and then all of a sudden, in front of me, about 200 feet down from me, there was another plane. And we literally were completely aligned. Like if I was flying at 3,500 feet, we would have hit each other nose to nose, unless we saw each other in time to, presumably, avoid each other. So that was a little bit spooky.
Collisions in flying are incredibly rare,
especially now.
I think in 2020, there's a new regulation
that all planes need to have a new sensor in them.
It's called ADS-B,
so you'll actually get notifications
if you have planes near you,
as long as you're near a control tower,
which most places are, unless you're really out in the boondocks.
But that was a weird moment.
But it wasn't really a close call exactly, but it was a little spooky.
It certainly didn't stop me from continuing to fly.
So you mentioned something called a spin training in one of your blog posts.
What is that in flying?
Well, spin training.
Okay, so you don't have to do spin training as a pilot, but it is a situation you could find yourself in.
So I did think it was important to at least experience it and know how to get out of it.
So a spin in a plane, it's basically more of a tumble. It's when the plane stops flying and it's just basically falling like a rock, tumbling end over end. It's actually pretty hard to get a plane into a spin, especially the Cessna 172 that I was flying. You really have to put a lot of effort in to do it. It's pretty easy to get out of one as long as you know what to do. So, you know, I asked my instructor to show it to me. And beforehand, I prepared myself.
I looked at like lots of videos on YouTube of people doing spins, but wow, nothing prepares
you for the real thing. So technically, it's a stall with rotation. A stall basically means
the wings are no longer flying.
And then rotation means you're just rotating every which way, including inverting.
And so you lose about, I think, 600 feet every couple seconds or something like that.
So I think we started the spin at 4,000 feet maybe.
And then by the time we got out of it, we were probably at like 2,000 feet.
It just feels weird. It's like nothing else you've ever felt. One of the expressions I heard someone use to describe it is a roller coaster without rails. I think that's a really good description.
But yeah, and then once you exit the spin, when you do the procedure, you're actually just vertically flying straight towards the ground. So then you have to pull up slowly to get out of the dive, and you pull about two Gs. So that means you feel double your body weight as you pull out, so it's pretty intense. I think on that lesson we did four spins, and, you know, I didn't throw up, which is good. Although in the future I do want to do actual aerobatic training, where you pull, I think, up to four positive Gs, and sometimes you pull, I think, negative one and a half Gs. I'll most likely throw up when I do that training.
That seems incredible.
Because I don't have a great stomach. But aerobatic training just seems like super fun.
is that where you can then like do shows or?
No, it would just be for fun.
I wouldn't be.
Impressed with it before dinner.
But that's where you do stuff like barrel rolls and hammerheads and,
you know, loops and all sorts of fun stuff like that.
Nice.
Well, I don't do very well on roller coasters,
so I'm not sure if I would be into trying that
necessarily. I definitely recommend you do just an intro flight lesson because the first time
you're not just in a small plane, but when the instructor tells you take the controls and you
can fly it, it's an incredible feeling when you take the yoke and you just move it a little bit
and the whole plane moves around you, like it's really incredible.
And I want to be clear, like a lot of people think small planes are really dangerous because the only time you hear about small planes is on the news when they crash.
But they're actually like very safe.
The main reason small planes crash is pilot error.
So, believe it or not, one of the most common reasons planes crash is because they run out of fuel.
Wow.
It's so stupid to me because what you're supposed to do as a pilot is you have a checklist.
You have a pre-flight checklist.
And you religiously follow that every single time, doing all the checks of the plane, all the components, including checking the fuel.
Like what you do when you fly a small plane is you literally open the fuel tank and you look inside and you measure it.
There's a special stick you use to measure.
If you do that, you're never going to run out of fuel on a plane.
So it's really, really like stupid error for a pilot to make to just not go through the pre-flight checklist.
And the other reason planes crash is a pilot flying into weather they shouldn't be flying into, which, again, very easily avoidable if you just check the weather before you fly.
There are great resources to check the weather.
Certainly that's not going to happen on an intro flight.
I love taking people up for their first time in small planes.
Yeah, Rane is now down, but I have a volunteer right here.
No, no, no. No spins.
Yeah, yeah, yeah. We won't do it.
I might show you a stall.
So coming back to the more technical side of discussions.
So Nathan, we actually met at Insight back in 2015 when you came to talk to us about Apache Storm, which you built and Lambda Architecture, which you wrote a book on.
How did you go from that to building Red Planet Labs?
Yeah, well, at that time, yeah, I mean, I was getting a lot of demand
from people wanting consulting and support services for Storm.
So I very easily could have started a company.
That was a proven model, you know, successful open source project
with a ton of traction.
It would have been very easy to raise money and pursue that route.
And Storm, like, Storm did a lot.
It really advanced the state of scalable real-time computation.
It was the first project to do that in a fault-tolerant way.
And also just in an easy way.
It was very easy to use and do that kind of stuff.
But I was always thinking deeper.
I think what has always really motivated me and really interested me about just programming in general, like it's the only field of engineering where you can completely automate what you were doing before.
And instinctively, it seems like the work you should be doing should be whatever is unique to that thing.
And you should otherwise be able to reuse every other piece of it.
Right.
And with backend development, it did not feel like that at all. Storm certainly helped with creating the capability of real-time computation. But when you looked at what it took to actually build the backend for a product end-to-end, Storm didn't move the needle there, really, when you look at the end-to-end cost. And that was the part that really bothered me. It was through working on my book, and developing the theories of the book beforehand, that I started to see that there's this different approach that could be taken, which really would move the needle and make it so that what it took to build a backend was closer to what it took to describe it.
And just to give an example of that, right, I like to use the Twitter example
a lot just because Twitter is a very well known product. And I used to work there. So I know what
went into building that product at scale. And so like the original Twitter consumer product,
they reached scale in 2011. It was started in 2006. That's a product you can describe
what it does. Timelines, social graph, follows, retweets, hashtags, search, et cetera, et cetera.
You can describe every feature of that product in a couple hours, max. It's not very complicated
to describe all the different user experiences and flows that you go through in that product.
But it literally took Twitter 200 person years to build that product at scale.
So again, we're in a field that's entirely about abstraction, automation, and reuse.
So how is it taking 200 person-years to build something you can describe in two hours?
And it's not just Twitter.
You look at any product, especially at scale.
There's just a huge disparity between how long it takes to describe it and how long it takes to build it at scale.
And so this new approach
that I saw the broad outlines of
seemed to me something
that could really change that,
make it so that that cost would be much less
and just fundamentally change
the economics of software development.
And so that really interested me and seemed much more important
than building a company around Storm,
which would have purely been about monetization at that point.
I don't think Storm as a project was going to change the economics
of software development to that extent, not nearly to that extent.
So that's why I decided to pursue Red Planet Labs
and then just with Storm,
donate it to Apache
and let it just be a full open source project.
How did that vision of Red Planet Labs,
like did it evolve over time?
Yeah.
I mean, basically what I started with
and what took me years to figure out was
what is a common set of abstractions
that can express any application end to end with just that one tool?
So it's like completely inclusive.
It handles everything in the back end that you need, data ingestion, processing, indexing, and querying.
So basically, what I started with was what I knew from writing my book and developing the Lambda architecture. I think the most important thing, which I explained in my book,
was how to look at building software
applications from first principles. When you look at how
back-end development has been done since the 80s,
the gold standard has been the relational database.
The relational database is not based on the true first principles of backend development. I know that's going to sound sacrilegious to a lot of people who consider it the gold standard, but it's really not. What exactly are the principles of it? The idea behind relational databases is that you have tables, you have keys, you have columns, you have foreign keys. So that's the model, right? Now, can you say from that how that
encapsulates all possible systems you want to build? Like it is very unclear. There's not a
direct connection from that. So the first principle, which I showed in my book, is so simple, and it so clearly encapsulates every possible system you'd ever want to build. And it's: query equals function of all data. A backend is all about answering questions, right? What is Alice's current bank account balance? What is Bob's location? What is the total number of page views for a URL over a range of time? And so on and so on, right? And the most general way to ask a question is to
literally run a function, an arbitrary function over all of the data that you've ever seen,
that your application has ever seen. So clearly that encapsulates everything you could possibly
ever want to do with any system, right? That's clear, right? That's a much better starting point than the relational database model, which is arbitrary. Now, obviously, you can't literally do that. You can't literally run a function over your 10 terabyte or 10 petabyte dataset, whatever size it may be, every time you want to ask a question. So in my book, and with the Lambda architecture, I showed: what is the smallest set of trade-offs you can make to actually have a general model?
And all you have to do is add the concept of the index.
So query equals function of all data becomes indexes equals function of all data and query
equals function of indexes.
And that's actually, that does capture
every single backend system that's ever been built.
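As a toy illustration of that model (all names here are invented for illustration; this is not Rama's API), "indexes equals function of all data" and "query equals function of indexes" might be sketched as:

```python
from collections import defaultdict

# "All data": an immutable, append-only log of every raw event ever seen.
data_log = [
    {"type": "pageview", "url": "/home"},
    {"type": "pageview", "url": "/about"},
    {"type": "pageview", "url": "/home"},
]

def build_index(log):
    """indexes = function(all data): fold every event into an index."""
    counts = defaultdict(int)
    for event in log:
        if event["type"] == "pageview":
            counts[event["url"]] += 1
    return counts

def query_pageviews(index, url):
    """query = function(indexes): a cheap lookup instead of rescanning all data."""
    return index[url]

index = build_index(data_log)
print(query_pageviews(index, "/home"))  # 2
```

The index is the only trade-off added to the pure "function over all data" ideal: it precomputes just enough so queries don't have to rescan the whole log.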
Now, with the way backends have been built since,
you know, basically for my entire lifetime,
you're using different tools for each one of those pieces,
data, functions, indexes, and queries, right?
And so what I was looking for
when I started working on Red Planet Labs is,
well, how can I generally meet that model, like build a general purpose system where you can have arbitrary indexes, arbitrary ways of computing those indexes, and arbitrary ways of doing those queries at scale with a simple set of common abstractions that compose together into any application you want to build, whether it's Twitter or Google Analytics
or a bank or what have you.
And it took me a long time to figure that out.
But that's basically what I started with.
So I had the general model, right?
Indexes equals function of data
and query equals function of indexes.
The other thing I had figured out by that point was specifically about indexes. When you look at how databases work, they're actually all narrow. There's no such thing as a general-purpose database. They all have what they call a data model, and that's all each one can do, right? So you can have relational, document, graph, column-oriented, and so on. Each of those indexes data in a very specific way, and then it has very specific ways in which you can go about querying it.
And what I realized back then was that a much better way to express indexing is as data
structures, not data models. And in fact, every data model is just a particular combination of
data structures. Key value is just a map. Document, a map of maps, column oriented is a map of sorted maps,
and so on and so on, right. And so I knew at that point that the right way to express indexes was as
data structures, so that, you know, to build an application, you'll need to build many indexes,
and each one of those can be shaped exactly as you need it to meet every
individual use case of your application, which is a big problem that you see with backends that are
using databases, which is every backend right now. And as soon as you choose to use a database, which you have to now, not that it's really a choice, you have created a lot of inherent complexity in your application, because you have to twist your application to fit that data model, and there is no data model that will fit your application perfectly. And this is the first, and possibly the biggest, impedance mismatch which you take on, right at the start, as soon as you start your application. And so that's a huge problem and a huge
contributor to complexity. The fact that you can't actually model your indexes exactly like
you need to for your application. So that was a starting point. So I knew the general model
indexes equals function of data and query equals function of indexes. And I knew that indexes should be expressed as data structures.
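The point above, that every data model is just a particular combination of ordinary data structures, can be sketched like this (illustrative Python, not Rama):

```python
# Key-value model: just a map.
kv_store = {"alice": 42}

# Document model: a map of maps.
doc_store = {"alice": {"age": 30, "city": "SF"}}

# Column-oriented model: a map of sorted maps (here an inner dict kept in
# sorted key order stands in for a sorted map, keyed by timestamp).
column_store = {"alice": {1000: "login", 2000: "logout"}}

assert kv_store["alice"] == 42
assert doc_store["alice"]["city"] == "SF"
assert list(column_store["alice"]) == sorted(column_store["alice"])
```

Expressing indexes directly as these data structures, rather than committing to one fixed model up front, is what lets each index be shaped for its use case.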
And I knew that, you know, to actually build a full application, you'd be materializing many
indexes of many different shapes. And it probably wasn't until, I'd say, 2016 that I was confident that it was possible, that I had figured out what the abstractions were.
So it was pretty difficult, to say the least.
So in some of your posts, you mentioned this idea of suffering-oriented programming.
Was that one of the key principles that you thought about, too, when you were starting
to build at Planet Labs?
How hard it is to build these back-end systems?
Yeah, well, the suffering phase was everything I did before Red Planet Labs. So everything I'd ever built, and also everything I'd ever helped people build through my open source work. The idea behind suffering-oriented programming is: don't build abstractions in the abstract. Don't build an abstraction until you have suffered through the pain of not having that abstraction, and you know what all the use cases are that such an abstraction would need to fill, and all its weird permutations. So certainly I had experienced that beforehand,
just building scalable systems. I call the current approach the a la carte model, which is that you pick different tools for different parts of your system and then you get them to fit together. And I had certainly experienced how fundamentally flawed the a la carte model is, according to some of the things I just described: the fact that a database is inherently flawed, how you have to twist your application model to fit the data model, and so on, and just other problems with getting things to integrate together. So yeah. So basically building Red Planet Labs and building Rama was suffering-oriented
programming on steroids, where my list of use cases was literally every application
that I'd ever worked on or ever helped working on. So it was a pretty expansive,
just like set of use cases I had in my mind trying to unify them together into a concept of abstractions.
So you recently announced Rama, which as you describe, it's the 100x programming platform.
And we want to talk more about Rama.
Before we get there, you started the company in 2013.
And you mentioned that around 2016 is when you kind of found the right abstraction to start building on top of.
What did that 2013 to 16 period look like?
Because if I remember correctly, you hadn't started a team around that.
Like you were the one doing all the research yourself.
Can you tell us more about what that phase looked like?
Well, I didn't fundraise and hire anyone until 2019.
So 2013 to 2016 was a lot of me
sitting at my computer staring into space,
thinking, and a lot of just,
I had like this one big text file
where I was exploring stream of consciousness,
just like, here's an idea, let me work through it,
let me test this idea,
let me test this idea for an abstraction
against all these different use cases
and see what happens.
And just like very slow process of kind of getting closer to what the right abstractions
were.
I think the main thing I had to figure out from 2013 to 2016 is that there's actually a new programming paradigm underneath Rama.
And actually, when you use Rama, Rama has a regular Java API, but that API is actually expressing a subset of that programming paradigm. Basically, it's a general-purpose programming paradigm, a new one. It's basically dataflow programming, but generalized into a general-purpose language. So all the things you can do with a regular language, like variables and conditionals and loops, you can do in dataflow, but it's expressed differently. And dataflow is a great abstraction for doing distributed programming, as we already knew at that point from dataflow tools built on top of things like Hadoop. And so what Rama is doing is greatly generalizing the idea of dataflow. And so a lot of that work from 2013 to 2016 was discovering that programming paradigm and seeing how things would fit together to be able to express arbitrary applications.
Was there
stuff that happened in that time period, all the way leading up to 2019, like new technologies that came along or were getting more adoption? Did that kind of impact your interests, as well as how you think about the problem? Because I feel like one of the reasons why the Lambda architecture caught on and all these applications became a thing is because storage got so cheap, right? And so did compute. Without those sorts of advances, it's very hard to kind of ditch the database, so to speak.
Yeah, yeah, for sure. And I think the big thing was storage becoming cheap, just enabling these other ways of building things.
And that was the case well before 10 years ago, right?
That was the case 20 years ago,
like when MapReduce became a thing,
when that was a thing, right?
There has not been any fundamental advance like that in the past 10 years.
Obviously, there's been a lot of innovation,
a lot of new tooling,
but everything is still doubling down on this a la carte model, where to build a backend you're going to have a dozen, or two dozen, or more different tools that you're actually using and fitting together in some way. Everything is still very narrow, very specific, and forces you to take on these impedance mismatches, like I described with data models for databases. So nothing in the past 10 years changed that. Ultimately, what I was doing with Red Planet Labs and Rama was a really fundamentally different approach to the way software has been designed for my entire lifetime: to actually have one tool, based on a simpler set of primitives, that can compose into all these other things that people are doing with specific tooling, and be able to build your backend end-to-end on a single platform instead of a dozen different platforms.
So Ronak asks the serious and the good questions here, and I ask like the really
trolly ones. So to stay on brand: maybe generative AI gets so good that now you have all this bloat in the software, but it still somehow works, right? Like you'd basically be able to write all this crappy code, but maybe you have some ways of performance testing it such that you can still kind of package it up, such that you don't necessarily... well, okay, I think I sounded smarter when I started.
So you're wondering what's like AI programming? How does that affect?
That's the smarter question to ask.
Yeah. Well, AI programming is still in its infant stages.
It's certainly not capable of building a back end end-to-end for you.
I mean, I think AI is ultimately going to be limited by the same things that limit human intelligence. I don't think it's magical. And I think if you have a much simpler set of primitives that you're building upon, AI will do a lot better. If you're using a dozen different tools, that creates all this complexity that's going to make it a lot more difficult for AI to reason about, and it's still going to be difficult to, you know, operate in production. One of the cool things about Rama, because it's such a cohesive, general-purpose platform, is that it's a much better target for AI for building backends than the hodgepodge of a million different tools that you have to use currently. So I'm actually really excited to explore that in the future.
After paying for the ChatGPT premium, one thing I noticed quite a bit is when you ask it to do stuff, it's a lot better at writing out the intermediate steps, which, to our point here, translates into those more generic sorts of models.
Before you continue on,
So one way I'm thinking about Rama is that you're saying instead of an engineer going
and saying, I want to use a relational database, or I want to use the specific messaging queue,
or whatever the technology may be and build around that, you're saying, let the engineer
describe with the abstraction that Rama provides, what it is that they're trying to do.
And the tool behind the choice,
whether that's a relational database
or a key value set or whatnot,
that's an implementation detail.
But what stays the same is the abstraction
or the use case that the user is describing.
Is that the right way to think about this?
Yeah, let me describe what Rama is.
Let's take a step back.
So you're not using a database when you use Rama.
Rama's doing all that stuff with a simpler set of abstractions.
So everything that a database does, Rama's doing for you.
But you're not using any of the tools.
Rama's doing everything.
So I'll describe Rama's programming model.
So again, I described the first principles of building backends.
Indexes equals function of data. and query equals function of indexes.
And that's basically the program model of Rama.
So you have four concepts, right?
Corresponding to each of those things and those first principles, right?
So you have the first thing in Rama is called a depot.
That's how data comes into it.
And a depot in Rama is a distributed log of data.
Think of it, it's actually exactly like
Apache Kafka, but built in and integrated into the system. Then you have ETL topologies. Again,
all this stuff is inherently distributed. So ETL topologies consume data from depots as a stream,
and then do computation on it, and then produce indexes, which are called partition states,
which is the next concept.
And we usually refer to partition states as P states.
And partition states are how you do indexing in Rama, which as I described before, it's
in terms of data structures.
So to build an application, if you look at our Twitter-scale Mastodon instance that we open sourced, in one of the modules that's doing the core stuff, so profiles, timelines, fan-out, and all that, I think there are 33 P states with a huge variety of data structure combinations between them.
So when you're building an application in Rama,
you'll materialize potentially many, many P states.
All again, completely fine tuned and shaped
precisely for your application.
And the last concept is querying.
So how do you actually query your P states?
So there's two ways to query in Rama.
So one is called point queries.
So that's when you just want to fetch information
from one partition of one P state.
And it uses what's called a path-based API.
So these P states are arbitrary combination data structures.
And so we have a mechanism
so it's very, very easy and concise
to reach into a P state, regardless of how complex
the structure is, to retrieve a value or some aggregation of values.
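The path idea can be sketched roughly like this (the function name and P state shape are invented for illustration; Rama's real path API is richer):

```python
def select_path(pstate, path):
    """Walk a nested combination of data structures one step at a time."""
    value = pstate
    for step in path:
        value = value[step]
    return value

# A P state shaped as: user id -> {"followers": set, "profile": map}.
pstate = {
    "alice": {"followers": {"bob", "carol"}, "profile": {"bio": "pilot"}},
}

print(select_path(pstate, ["alice", "profile", "bio"]))  # pilot
```

The point is that one concise navigation mechanism works regardless of how the data structures are combined, so point queries stay simple even for deeply nested P states.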
The second kind of query is called query topologies.
And these are predefined queries that can look at any or all of your P states and any
or all the partitions of your P states.
And it's basically a real-time, on-demand,
distributed computation looking at all that stuff.
So you can do some really powerful stuff with query topologies.
So query topologies would be analogous to a predefined query
in a SQL database, for example,
except it's defined using the exact same API
that you use to define ETLs, which is the regular Java API.
And it lets you reuse code between both contexts
as well as being generally a lot easier to just manage
because it's not using some bespoke system
or registration system like you would in a SQL database.
So those are the main concepts.
And you can see how it's literally just the first principles, right?
So indexes equals function of data.
So that would be depots, ETLs, and P states.
And then queries equals function of indexes,
which is just the two different ways of querying P states.
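Pulling the four concepts together, here is a toy end-to-end flow (all names are hypothetical, not Rama's actual API: a depot as an append-only log, an ETL materializing a P state, and a point query over it):

```python
class Depot:
    """Distributed log of incoming data (think: a built-in Kafka)."""
    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)

def etl(depot, pstate):
    """ETL topology: consume the depot as a stream, materialize a P state."""
    for record in depot.log:
        user = record["user"]
        pstate.setdefault(user, []).append(record["tweet"])

def point_query(pstate, user):
    """Point query: fetch from one partition of one P state."""
    return pstate.get(user, [])

depot = Depot()
depot.append({"user": "alice", "tweet": "hello"})
depot.append({"user": "alice", "tweet": "world"})

pstate = {}  # a P state shaped exactly for this use case: user -> list of tweets
etl(depot, pstate)
print(point_query(pstate, "alice"))  # ['hello', 'world']
```

A query topology would be the distributed analogue of `point_query`, able to look across many P states and partitions at once.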
What was your original question?
I was basically thinking,
trying to think about how, as an engineer,
like when we're building these backends,
we are so used to just thinking about,
okay, think about a data model,
because, well, that's what we have been doing.
Well, if you pick, let's say, a relational database, well, you've got to have primary keys,
foreign keys, rows, and columns. If you pick a key value store, then, okay, depending upon the kind of data store you choose, you will be limited by some of the features you can build, or you
might have to kind of reimplement some of that in your own code. So when someone's starting to now use
Rama, for example, in a way, should they just stop
thinking about databases, for example, and think about what do I want my applications to do? And
let me define that data structure and whether Rama stores it, how it stores it, what data
structures it uses under the hood. That's implementation detail for the engineer.
So that's what I was trying to figure out. Yes. So you're liberated as a
programmer using Rama because you're no longer
restricted by your data models, no longer
have to twist your application
to figure out how can I fit it
into this data model. So
again, Rama's P states, it's just
data structures, right? So if you want to
use those data models, okay, well,
that's fine. Use that data structure combination.
If you're going to do something else, if the way to meet that use case is a different data structure
combination, well, now you can do it, instead of having to twist your application. So, for
example, the way to approach developing a Rama application, there's actually a great walkthrough of this in the
Rama documentation, which is on our website, in the last part of the tutorial,
and it goes through this process. The last part of the tutorial is building a Facebook-style social network from scratch. So bi-directional relationships, a wall for every
user with posts and stuff like that. It only ends up being 180 lines of code at the end of it for a
fully scalable social network. And it goes through the process, right? So
The way you start is: well, what are the queries I have to do? What are the questions I need to be
able to ask, and what's the data coming in? So for something like a social network, it would be:
who are the followers of this user? And I need to be able to ask that in a way such that I can
paginate through them, right? Because someone might have a million followers.
Or likewise, who are the friends, if you're looking at a bidirectional thing?
What are the friend requests?
What is a page of posts on someone's wall?
What is someone's profile, right?
What is someone's age, whatever, right?
Or maybe you have an analytics query in there or something, right?
Like how many users signed up per day or something like that.
So you start with your questions, right? Because ultimately that's what an application is, right? What are the queries
I need to support? And then you think in terms of, okay, well, what's the data I have coming in?
So you have things like account registration, friend requests, accepting a friend request,
making a post and so on and so on, right? So then the next step is to actually figure out the P states.
So, okay, well, what are, what set of P states do I need?
What data structure combinations do I need to be able to answer these questions?
And then what does it look like to ask those questions on these P states?
So it might be that one P state you create can answer 10 of your questions, and another P state might exist only to answer one particular kind of question.
So to give you an example, if you look at who someone's followers are, and you also need to be able to say, like, how many followers does someone have?
Not just looking at someone's followers, but also asking: does user A follow user B?
Like these are the kinds of social graph questions.
So all that can be supported with a data structure,
which is a map to a linked set.
So a linked set is a set
that also remembers the order of insertion.
And so you can do this with Rama.
So like in our Mastodon implementation,
we have the followers
P state, which is a map to linked set. It's a map of linked sets. When you want to get
the number of followers, you just get the size of the inner set, and even if the inner set has 10
million elements in it, it's still a fast, like, you know, less-than-a-millisecond operation to get that.
If you want to paginate through it, then you're just querying that in order; you're doing range
queries on that inner set. If you want to ask if user A follows user B, well, that's just a
set membership query, right? And all this stuff you can do very, very easily with Java. Whereas
if you look at like a different part of the application, like personalized follow suggestions,
well, that's a completely different P state with totally different indexing, right?
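The map-to-linked-set shape described above can be sketched in plain Python, since a dict's keys already behave like an insertion-ordered set. Real Rama P states are partitioned and durable, so this only illustrates the access patterns, not the implementation:

```python
from itertools import islice

followers = {}  # user -> "linked set" of followers (dict keys keep insertion order)

def follow(followee, follower):
    followers.setdefault(followee, {})[follower] = None

def follower_count(user):
    # size of the inner set: cheap even with millions of elements
    return len(followers.get(user, {}))

def follows(a, b):
    # does user a follow user b? a set membership check
    return a in followers.get(b, {})

def followers_page(user, start, count):
    # paginate through followers in insertion order (a range query)
    return list(islice(followers.get(user, {}), start, start + count))

follow("bob", "alice")
follow("bob", "carol")
print(follower_count("bob"))        # -> 2
print(follows("alice", "bob"))      # -> True
print(followers_page("bob", 0, 1))  # -> ['alice']
```

The point is that one data structure combination answers the count, membership, and pagination questions at once, which is exactly the "one P state answers many questions" idea.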
And then once you figure out what your P states are, what the queries look like,
then you look to see, all right, how do I materialize and maintain those indexes from my data that's coming to my depots?
Like follow requests, accept follow requests, making posts, et cetera, right?
And that's where you write your ETLs to actually, right, that's you're making a function from data to indexes.
So that's the mindset, I guess, you use when using Rama.
And the great thing is that you're able to do all this stuff,
all this flexibility on top of a single platform.
And one of the really nice consequences of this
is how much it simplifies deployment.
Like when you look at companies building applications at scale,
like deployment engineering is no joke.
Like it is really costly.
Like you often have entire teams only doing that,
writing sometimes millions of lines of code and configuration.
And there's no business logic in any of that code.
It's pure complexity.
It's plumbing and putting bits on boxes.
It's crazy.
Like it's really wild.
Now with Rama being such an integrated platform,
you can deploy your whole thing on one system. And Rama understands how to deploy and update Rama applications.
That's like the core parts of the platform.
So it's able to do it in a general purpose way with all the best practices just built into the system so that you can take an existing application that you have running, which is called a module in Rama.
And you can say, I want to update it.
And we spend a lot of time on module update.
So it's completely fault tolerant.
It does the transition very, very smoothly
to transition responsibility between the two versions.
All the stuff people are doing manually
and in a very complex way currently.
So all that stuff is just free.
Because it's a general purpose platform,
it's able to implement it in a general purpose way.
And then boom.
Now as a developer,
you don't really have to worry about that anymore,
which is like,
I think one of the most brutal costs of the a la carte model is
the fact that you have to engineer deployment yourself.
Oh,
for sure.
Yeah,
the other really nice consequence is monitoring,
right?
So Rama being such a general purpose platform, it's able to implement monitoring.
Like monitoring is the same thing, right?
Monitoring is: you're collecting data, you're materializing views using that data, and you need to have a way to query that data.
So Rama actually implements monitoring using itself.
So there's a built-in Rama module, which collects data and then materializes telemetry on that data.
Sorry, it's collecting the data from all the other modules and then materializing views using that data.
And then it has a built-in cluster UI where you get very deep and detailed telemetry
on all aspects of your module or of all your modules, which I think it's really cool that
it's able to just be recursive like that. Rama is just using itself to implement telemetry.
It's not doing anything special.
The telemetry module is exactly the same like any other module.
So that's how much Rama helps on deployment and monitoring.
I was always thinking from the start in terms of end-to-end cost, and deployment and monitoring
are a very substantial part of the end cost.
So that's one of the things that really excites me about Rama,
that all that complexity is just gone when you're using Rama.
And it's so much simpler, so much easier.
I think it's super cool that you guys literally built a Twitter clone in order to just show how powerful it is.
And just to quote some numbers from the blog,
that it was only
10,000 lines of code compared to the 1 million that Twitter wrote to start with.
And then this is having 100 million bots posting 3,500 times per second at 400 average fan out,
which sounds like super impressive. The other aspect you measured is to quantify this 100x improvement
is how long it took to do it.
So nine months versus the 200, sorry,
nine person-months versus the 200 person-years.
Yeah, so I'm very curious to learn more about the trade-offs.
So in terms of the pros,
so in addition to deployment and monitoring, as you mentioned,
I imagine this is a bit harder to do, but I imagine bugs also become less frequent and
much easier to fix, right?
A lot of the production bugs happen in between kind of systems that sort of come out of this
a la carte menu that you described, right?
So if you actually start from scratch, if I may, I feel like maybe a bad example is, you know, going from a
dynamic language like Python, where you don't have to declare a lot up front, to a compiled, strongly typed one where you
have to specify all the things up front. You get that trade-off: now there are way fewer,
you know, random things that can happen, right? So, like, what do you think about the debugging
and then, you know, the outages, like that aspect of things?
Yeah, well, first of all, bugs become much less frequent.
When you have 100x less code, you're going to have a lot fewer bugs.
That's right. No code, no bugs.
But it's not just the lines of code.
It's the reduction in complexity.
And again, when your code doesn't have impedance mismatches,
when you're able to actually represent data in a way that actually is optimal
and makes sense as opposed to having to twist it like you do with databases,
that reduction in complexity just helps a ton, right?
But of course, you're still going to have bugs.
We had bugs in our Mastodon implementation that obviously we worked out before we deployed
it, but you work them out in the same ways that you would work out bugs in any system,
like through testing, right?
That's actually another aspect that really helps with Rama: just how much easier it is to test, because you don't
have to think about, oh, how do I start up these 10 different components, create them with mock data
and all this stuff, and get them to work in a test environment. Instead, it's all within a single
process. Rama provides something called in-process cluster, where you can simulate a Rama
cluster in process, and you can use that just like you would a regular cluster. Deploy modules
to it, add data to it, do
queries, whatever. So that's how we test it.
If you look at our Mastodon implementation,
we've written
a lot of test code using that exact same
approach. We deploy modules, we
append data, and then we do assertions on what
happens afterwards. So that helps a ton.
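That deploy-append-assert pattern can be sketched like this; note the class and method names below are hypothetical stand-ins for illustration, not Rama's actual Java API:

```python
class ToyInProcessCluster:
    """Hypothetical stand-in for an in-process cluster used in tests."""

    def __init__(self):
        self.depot = []   # append-only event log
        self.pstate = {}  # materialized index maintained by the ETL
        self.etl = None

    def launch_module(self, etl):
        # "deploying a module" here just registers the ETL function
        self.etl = etl

    def append(self, event):
        # appending to the depot synchronously runs the ETL on the event
        self.depot.append(event)
        self.etl(self.pstate, event)

# The module under test: maintains a follower count per user.
def follow_counts_etl(pstate, event):
    if event["type"] == "follow":
        pstate[event["followee"]] = pstate.get(event["followee"], 0) + 1

# Test: deploy the module, append data, assert on the resulting P state.
cluster = ToyInProcessCluster()
cluster.launch_module(follow_counts_etl)
cluster.append({"type": "follow", "followee": "bob"})
cluster.append({"type": "follow", "followee": "bob"})
assert cluster.pstate["bob"] == 2
print("ok")
```

The value of the real thing is that the same module code runs unchanged against a simulated cluster in one process and a production cluster on many machines.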
And of course, when it comes time, if you actually
have a bug in production, well, again, it has module update built in. So you just update your
module to fix the bug. So I'd say that all that stuff on multiple different fronts, reduction
in complexity, the ease of testing makes it much easier to build like quality software.
Actually, look at how backends are built today. Like, backend
programming, especially at scale, it's gotten so complex. It's beyond the realm of human
understanding. Like there's no one working at Twitter or Google or Facebook or any of these
services that really understands what's happening. Not to say they don't understand, but it's so complex
that you can only understand things empirically. So based on what I'm observing now, that's what
I understand. So it's not a surprise at all how buggy all these different platforms are.
Isn't it crazy that every one of these services you use, they have bugs on them all the time.
And they have billions of dollars and thousands of engineers. How come they can't fix them?
Sometimes the bugs are brain-dead, like, how is that even a bug, you know? And these
are the companies with good engineers, right? Not to mention the companies that have, you know,
weaker engineering teams. How are they so buggy, with such visible bugs in a consumer product? And it's just
because it's impossible to comprehend these systems, because they're so complex. You can
describe what these applications are doing in two hours, but they've invested hundreds of
person-years in developing them. So there is a big complexity problem in
software, and more than anything else, that's what Rama is tackling: this extreme
reduction in complexity. What Rama's doing is, you're able to do
things in a way that avoids the complexity that people have been taking on for decades.
And it's not like it's doing anything magical. It's really about what
you're not doing with Rama, which you're forced to do when you're doing stuff
the traditional way with the a la carte model.
Going a little bit meta,
so in the process of building out this Twitter clone,
did it change some, I don't know,
like code in Rama itself of basically
where you draw the boundaries between different concepts
and things like that?
Did it have any...
No. We basically just
used Rama as is to develop the proof.
Just because the model is... I mean, the model
makes sense. It is general purpose,
right? Query equals function of data,
right? Expanded to... sorry, index equals
function of data and query equals function
of indexes. Like, that's the model. It makes sense. It's general purpose.
It clearly encapsulates all systems. And that's the model you use to program
in Rama. And so building Twitter was one application of that
model. We could have built anything else. We specifically chose Twitter just
because we were familiar with the actual implementation of Twitter.
And so we could do a true comparison on cost. When you actually look at Twitter,
why was it so expensive to build? One of the main reasons was
just how much specialized infrastructure they needed to build over the years, because they
needed to represent their social graph, and there was no tool which could do that the way they needed it. So they had to build it from scratch. They had to build multiple
other databases and services from scratch, and in all of these things, a lot of stuff
is repeated, right? So when you build a database from scratch, well, you're repeating a lot of stuff.
You've got to build replication again, which is insanely expensive to build, by the way, and also
insanely difficult.
And you've got to figure out durability and distribution, how things network and talk to each other.
That's just being repeated over and over, right?
The one mantra everyone knows from programming, from really the start being a programmer,
is do not repeat yourself.
D-R-Y.
And it's completely non-existent in back-end programming.
As an industry, we are repeating ourselves constantly, by the fact that all these systems are actually doing the same
things, or a lot of their subsystems, such as replication, are having to do the same
things. Sometimes they're doing it in different ways, but they're trying to solve the same problem. So with Rama, with a true
general purpose system, we're able to implement replication once for Rama, which is something we spent
a lot of time on. And now it encapsulates all possible ways in which you might specify these
computations or indexes, if that makes sense. So one thing I'm thinking about is: when we
look at any of these apps today, their complexity grows over time because a majority of applications start with an API backed by some database
that is fine for the prototype.
If the application works out, you need to grow out of that single database.
Either you shard it, you build different views for the same set of data.
Sometimes, as you mentioned, you want to represent it as a graph, sometimes a key value store
and whatnot. A lot of the time, backend engineering teams spend doing migrations, because
either your app cannot handle the amount of scale you're hitting anymore, or you want to provide
a new feature for which migrating to, let's say, a new system is going to be much better.
What does that look like with Rama? Like if you wanted to
represent your data differently, would you just go create a new P state or like, does it also ease
the migration piece? So yeah, so if you need a new view of your data, then you can just build a new
P state. You can start constructing a P state by reading from the start of a depot, or just from some point in the past of a depot.
There are other cases where you might want to actually change your existing P states, because you want to change the format of something.
So you can do that manually now with Rama, like through a module update, although we are currently working on a first class migrations feature where you would be able to just take an existing P state and then just change the structure.
Right. So maybe like instead of using this data type for this value, you want to use this new data type with maybe more fields in it or less fields.
So that's coming soon. And then there's like another level of it where you might want to not just migrate each partition of a P state, but you might want to actually
include some repartitioning of that during
the migration. So actually change
where stuff is stored, not just how it's stored.
So all that stuff is coming.
Still doable right now? Yeah, it's still
doable right now.
It's more manual, but we'll have first-class
support for that soon. I see. And so double
clicking on deployments for a second, and this is just
me trying to understand this better.
So if I look at
an existing application,
it's like we went through
this whole microservices
and whatnot,
but eventually what it means is
you have your data store
running on a set of machines,
your web services
running on a set of machines,
your Kafka queues
or some other queues
running somewhere else.
And maybe you add other pieces
to your ecosystem
as the app grows. So if I think
about building a Rama application, if I just look at it from bits and boxes perspective,
what binaries get deployed where, and how do you scale this thing out once your app keeps
growing? Right. Yeah. So first of all, I think before we go further: Rama doesn't have to be
used in isolation. I think some people may get the wrong impression of that. Like when you're
using Rama, it doesn't mean that now everything has to be built on Rama. So Rama can very easily
integrate with other systems, just like you do with any other system using the a la carte
architecture. If you want to use a database from Rama, it's very, very easy to do. Or likewise, Rama can consume data from external queue
systems, right? So we actually have an open source project called Rama Kafka, where you can use Kafka
as a data source for your ETLs. And it works exactly like you'd be using a depot. But more
generally, in terms of deploying Rama itself, Rama runs as a cluster. So it has a central node
called the conductor. The conductor isn't involved in
data processing; it's just how you do module operations, like deploying a module, updating
a module, or scaling a module. And then there's a cluster of worker nodes that all have a daemon on
them called the supervisor, which just listens to the conductor for assignments: what workers from what modules
it should be running on the machine. The supervisor is responsible for starting and stopping
worker processes as dictated by the conductor. And so when you deploy a module, it's just a one-liner at
the terminal, where you tell the conductor: here's my jar with my code, here's the module I want to run,
here's the parallelism I want, go. And the conductor will figure out which supervisors
to run that on. And then likewise, when you want to update, it's the same thing: you tell
the conductor, I want to update this module, here's the jar, and it goes ahead and does that
process. And same thing with scaling, where this time you don't have to give it a jar, because
you're not changing the code. You tell the conductor, here are the new parallelism settings I want,
and then that launches the scaling process.
And when you're scaling, when you do an update,
it's going to deploy to the same set of nodes it's already deployed on.
So it's a co-located update.
When you scale, you actually need to move data across nodes, because now you're spreading across more nodes.
But again, all that stuff is behind the scenes and transparent.
So scaling will take longer because of the data transfer step,
but it's all very, very simple to do.
It's literally just, you say, here's how many more resources I want, go, and then it takes care of the rest.
And by the way, I'm just trying to figure out, is there a limit to how big a cluster could be?
Perhaps looking at the open source Mastodon example you have, is it possible to share how many nodes that is running on today?
Well, a full Twitter scale would take about 600 nodes.
That's it?
That's a lot less than I was thinking, at that scale.
Yeah, yeah, yeah.
And that full Twitter scale would be 7,000 tweets per second at 700 average fan out.
Again, with a very unbalanced social graph
But actually, the key thing for Twitter, in terms of scalability or just in terms of resource
usage, is the average fan out as well as the number of tweets per second. I mean, we went into this in
depth in our blog post about our Mastodon instance. So there's a lot of stuff you have to do in regards
to having an unbalanced social graph, in terms of achieving fairness and whatnot. But in terms of
resources needed, it's really just about
average fan out and number of tweets
per second. And yeah, it'd be about 600 nodes
to do that whole product. The consumer
product. We're just talking about the consumer product.
I'm not talking about all the other stuff
that Twitter does. So we're looking at Twitter
like 2015,
let's say, on the consumer side of things.
So the dominant side of things.
So the dominant cost of Twitter or that deployment would be storage because you'd be absorbing like, I think it would be like
five or six gigs per node per day of like new tweets.
And so, you know, you'll need some pretty big disks on those partitions.
So like when thinking about scaling, usually today, teams scale different parts of the application depending upon where the bottleneck is.
It's like, well, if your API isn't performing, let's look at the bottleneck.
Is it database or is it just you don't have enough instances and you're chewing through too much CPU?
But with a Rama application, what does that look like? Do you also look at the same factors you would in an a la carte model,
or do you just scale the entire cluster,
and Rama figures out where to put the storage and compute?
Oh, yeah.
Well, I mean, it's all about telemetry, right?
Actually finding where the bottleneck is.
So that's where Rama's built-in telemetry is really useful.
And that was very useful for us developing our Mastodon instance,
to actually find what the hot spots in performance were. So you look at things like,
for a P state, well, how many writes is it having per second, and what is the
average time of those writes, or what's the distribution of it? And that stuff helps a lot
to find where the bottlenecks are. You can also look at skew. The telemetry lets you look at not just the overall picture for a P state, but also partition by partition.
So one really common reason for a bottleneck would be skew.
So one partition has much more load than another one, and that's going to slow down the whole system, because you have some resources idle while another one is very hot.
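Detecting that kind of skew from per-partition write telemetry can be as simple as comparing the hottest partition to the average load. A small sketch, with made-up numbers:

```python
def skew_ratio(per_partition_writes):
    """Ratio of the hottest partition's load to the mean load.

    A value near 1.0 means balanced; much higher means one partition
    is a hot spot while the others sit comparatively idle.
    """
    mean = sum(per_partition_writes) / len(per_partition_writes)
    return max(per_partition_writes) / mean

balanced = [100, 110, 95, 105]   # writes/sec per partition
skewed = [100, 100, 100, 900]

print(round(skew_ratio(balanced), 2))  # -> 1.07
print(round(skew_ratio(skewed), 2))    # -> 3.0
```

A ratio of 3.0 means the throughput of the whole system is gated on one partition doing triple the average work, which is exactly the idle-resources-next-to-a-hot-spot situation described above.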
Right. And that is the main issue with the unbalanced social graph: it's inherently extremely skewed. So that's why we put a lot of effort into balancing
the processing, even when the social graph is so skewed. And by the way, Twitter
obviously went through the same thing; a lot of their implementation is also trying to
deal with that inherent skew. So a lot of things we did were similar to what Twitter did,
but obviously in a more integrated way,
in a much simpler way.
And yeah, you know, we did one optimization
to reduce variance between tasks. So, like,
let's say you have someone with 20 million followers.
If you're processing all their fan out
just from one partition, that's going to be very skewed. Because whenever that person posts a status, suddenly you have 20 million units of
work. Whereas normally per second, you have whatever 7,000 times 400 is, right? Which is a lot
less than 20 million. So that person creates a huge burst of load, right? And it's creating a
huge burst of load on one task, right? So if you do the naive thing of just processing all of someone's followers from one task,
you're going to be super skewed.
That's going to massively slow things down.
Because now that one task needs to work through the queue of 20 million things,
whereas everything else is much smaller than that, right?
And so one of the things we did in our implementation is we basically have a different view of the social graph
specifically for fanout.
So when someone has a lot of followers, their followers get spread around all the partitions
so that when you want to process a person's followers, you do it in parallel for some
partitions and you balance the processing. For fairness, such that that person posting
doesn't delay everyone else, we will only process up to a configurable limit of 64,000 followers per user per iteration of fan out, right? One iteration takes like 500
milliseconds on average, right? So processing all of someone's 20 million followers will take
a few minutes, right? But that's the trade-off you have to make, because the number
of resources you have is fixed at any given point in time. And that's, you know, again, that's not on the Rama side; that's on the
Mastodon implementation side. That's how we utilize Rama. We're able to do these things, like
materialize multiple views of the social graph for different purposes. And there were other things
we did to reduce variance. When you reduce variance, you increase
throughput, because you have more balanced processing, and so you have fewer situations where you have some resources idle while another one
is really, really busy.
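The arithmetic behind that bounded fan-out scheme is easy to check. A sketch using the figures quoted above (64,000 followers per user per iteration, roughly 500 ms per iteration):

```python
FANOUT_BATCH = 64_000    # max followers processed per user per iteration
ITERATION_SECONDS = 0.5  # average iteration time quoted above

def iterations_needed(follower_count):
    # ceiling division: fan-out iterations needed for one status
    return -(-follower_count // FANOUT_BATCH)

def fanout_seconds(follower_count):
    return iterations_needed(follower_count) * ITERATION_SECONDS

# A 20-million-follower account fans out over a few minutes instead of
# dumping 20 million units of work on one partition's queue at once.
print(iterations_needed(20_000_000))  # -> 313
print(fanout_seconds(20_000_000))     # -> 156.5 (about 2.6 minutes)
```

That matches the "few minutes" figure in the discussion: the per-iteration cap trades latency for the biggest accounts in exchange for fairness to everyone else.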
And there's actually more we could do on the Mastodon implementation to actually reduce
variance even further.
But obviously, we got the performance to a point where it was as good as Twitter, actually
better than Twitter in many respects, so that we didn't feel it was necessary to keep developing
it.
But I bet with a little bit more work,
we could probably squeeze another 5% to 10% more throughput
out of it.
But obviously, it's already a very, very high throughput.
So this is a different way of thinking
about how you build backends.
And as you mentioned,
this is very different from the a la carte model.
You have a platform that is truly generic.
So how are you thinking about adoption here?
Like for companies to adopt this to build massive scale systems,
usually the thing that many teams or engineers start with is like,
let me just hack on something, put a database in front of it, and I'll go.
So how are you thinking about Rama getting adopted?
Yeah, that's a great question.
And actually a lot of it I learned,
a lot of this, how I'm approaching this,
I learned from the open source work I did before,
especially with Storm.
So my general approach to adoption
is the bottom-up model,
which is as contrasted to the top-down model.
The top-down model, which would be where
you try to drive adoption through,
you talk to CTOs and
VPs of engineering, and you try to convince them to give it a shot. So it's outbound sales. It's
very expensive because you got to literally do it one by one, right? So what's much more efficient,
and I think more effective, and this is what I did with Storm, is the bottom up model, right?
So you make something that's really compelling, that gets engineers really interested in it,
such that they try it out themselves without you even knowing about it.
And they become your salespeople for you at the companies where they work.
That's what I did with Storm.
That's how I was able to get Storm to be such a big project just by myself.
And that's what I'm doing with Rama.
Now, there's a big difference between Rama and Storm.
Storm was like, it had like two concepts in it.
And it wasn't that different from what other people are doing.
It was just doing it in this, you know, fault tolerant way.
It was very, very easy for someone to pick up Storm and try it out just because there wasn't that much to learn to pick it up.
Rama is different.
Rama is a paradigm shift.
Rama is a major, major paradigm shift.
So it has a much, much higher learning curve than something like Storm or really anything
else that you would look at.
So the high learning curve makes it more difficult to get adoption just because there's this
upfront cost.
You actually have to learn it before you can really start using it.
And so that's why we launched the way we did.
We started with this Twitter clone running at scale and this ridiculously
small number of lines of code to basically create the motivation to put yourself through that
learning curve where, oh, here's this thing. It's doing something really unusual, like really
unusual, like literally reducing the cost of building this major service by a hundred X.
And so, you know, my theory is that that would compel a lot of people
to want to benefit from that.
And what I've been seeing in these first few weeks
since we've launched is that message has gotten through
to the early adopter type of people
who base their decisions on the technical merits of things
and what is the value it provides
as opposed to a later-adopter kind of crowd, where they base their decisions largely on social
proof. Someone like that is saying, oh, I want to do something
that's similar to what someone else is doing; I literally want to see my use case done in a
similar way already before I use that thing. Obviously, that's a much less technically savvy crowd.
It is a big portion of the crowd, and that stuff is important.
You do need to be able to show those things.
Right now, obviously, we're focused on early adopters.
And the main enthusiasm I've been seeing from early users is from two kinds of people.
So, people who have systems that they need to scale, and maybe
they've been through it before, so they have a lot of anxiety about what they're going to go through
to scale the existing systems. They know how painful it's going to be to do the a la carte model,
to use a dozen different systems and so on. And the other thing I'm seeing is people who
are sick of these impedance mismatches. For their whole careers, for
20, 30 years, they've been having to twist their model into these data models. And they understand
how much complexity they're taking on from the get-go. And so the idea of P states, of being
able to tune and shape your indexes to what you need, as opposed to the other way around, is very
compelling for them. And so that's the general approach I'm taking. Right now,
as a demonstration, we have this one example, right? This Twitter-scale Mastodon
instance, which actually has like 20 examples inside of it, because there are like 20 different
features. You know, I mean, there are so many features in Mastodon, right?
And they all work completely differently, right?
But it's all just one product, right?
So over time, as we work with early adopters and help them achieve massive success, we're going to have more examples to show.
And I expect that'll help Rama break through
to kind of later adopters who need that social proof
or need to see like ramen used in a way
that's similar to their needs before they can give it a shot. So that's the way I see adoption. I don't anticipate it being that fast, just because it takes time to build that social proof. But, especially with how much early enthusiasm I've gotten, I do expect us to get there.
Taking a step back, for you personally,
how do you see this transition going from like,
you're somewhere by yourself going through that big text file that you have, right?
And then really thinking really deeply about these like really challenging problems
to now more kind of day-to-day running a company, having to manage people, and then having to think about marketing, right? Talking to developers. How has that transition been?
Yeah, well, there's basically two transitions, right?
I went from by myself in 2013,
and in 2019, I fundraised and built a team.
So that was a big transition,
learning to, like, manage a team and do all that stuff, right?
And I learned a lot.
In the past four years, I've learned a lot about that subject.
Can you share, like, the aspects that you learned there?
About management and just building a team.
I mean, there are a few parts involved, right?
Like you went from having a problem you wanted to solve.
You spent some years researching, trying to better frame the problem and figure out a direction you wanted to take.
Then you're at a place where you're able to describe that problem and fundraise, which is not an easy thing to do.
And you're trying to solve something which wasn't done before.
This is something new, a new paradigm in how you build software applications.
So part of it is also selling in a way.
You're trying to show what you're trying to build, fundraise,
and then build a team.
So there are many pieces there.
Well, learning to fundraise was its own thing. And it's kind of a weird thing because it's kind
of fake. Fundraising is you're selling a product that doesn't exist yet to people who are not in
your target market. That's what fundraising is, especially for something like this, deep tech
intended for serious software engineers. I mean, some of my investors have a software engineering background, but they're certainly not doing that anymore. I don't think any investors were that hardcore of engineers, except for maybe Max Levchin. Max Levchin is pretty hardcore, and yeah, he was a very difficult one to fundraise from. But he probably grilled me harder than anyone else in terms of the technical details
of Rama.
He actually was very interested in the underlying language behind Rama, which no one else took
an interest in.
But anyway, still, the principle remains, right?
Investors you talk to are not in your target market.
And also, they just don't really understand what you're doing.
They don't really understand.
I can describe it at a high level: 100x cost reduction, build
Twitter or any other application on one
platform instead of a dozen. They understand that,
but they don't really understand.
Fundraising is its own art,
its own...
My fundraising went really well, but I will say
until it started going well, I thought
I was going to fail.
That sounds like fundraising.
Yeah.
I was as successful as you can possibly
be. I raised more money than I
wanted at a much better valuation than I was
initially seeking.
And I got every investor I wanted.
Why was that? Do you know what aspects played a role
in fundraising being
as successful as it was for you?
I know every aspect that went to being
successful, but again, I thought it was going to fail
until it started to go well.
And also, it's not like I thought
it was going to fail for a long time.
I got the whole round done in basically one month.
But man, it was not looking good for a while.
Because it's like, yeah, I mean,
this is a whole topic, right?
But especially with venture capitalists
who are not investing their own money,
they're investing the money of their investors, the LPs.
So the motivations of a venture capitalist are,
they have other motivations besides what you're doing, right?
So like you'd think with fundraising,
all it comes down to is tell a story
about how you're going to build this product,
which is going to have this multi-billion dollar market for it.
And then the second thing is
be credible in that story, right? So my story was pretty simple: a general-purpose platform reduces the cost by 100x or more, and it unifies these things which are currently done with a dozen different tools. And I think I was pretty credible, with Storm and my book and just being a fairly well-known and respected person in this field for which I'm building a product, right? So it turns out those two things are important, but they might be the least important things.
Look at why investors actually invest. And what's funny is that I had already read all this stuff before about why investors actually invest, but I didn't really understand until I went through the process and went through these meetings. It was actually specifically Paul Graham that wrote about this stuff. He was writing about this in like 2008 or something, or maybe even before that, right?
But the issue with investors, so there's other motivations with a VC. First of all, they have their own investors, and you don't see a return on investment for a long time, maybe 10 years. But you need to make sure that your investors, your LPs, think you're doing a good job in the meantime, before you get a return on that investment. So you need to explain to them, well, here's the investments I made and why I made them. They have to be able to tell a good story, and what they're looking for is traction of some sort. Either traction in the market, which is obviously something I did not have in 2019 because I was still building the product, or traction in another way, such as, "Oh, this other big, well-known investor who's very respected invested, so I co-invested with them," right? So that's the whole momentum thing that you hear about, and it's something that Paul Graham wrote about a lot. That's the main reason why investors invest, especially VCs: they want to see that traction so they can tell that story to their LPs. Now, as a founder, where I'm just trying to build something cool and change the economics of software development, it's very frustrating to have to play this weird game, where I need to generate this momentum and social proof so that I can build momentum and finish my
round. So, yeah, it drives you crazy, because ultimately, in a lot of these early conversations, I would pitch them, and they'd be like, "Oh, this is great. Can you talk to these other people to see what they think?" And they'd be really slow, doing all this due diligence, which also seemed completely unnecessary, because I told the story, I'm credible, what else do you need? But really, they're just delaying things because they want to wait to see someone else invest, so they can tell that story to their LPs. So the first yes, that's the most important. Once you get the first yes, the ones after that become relatively easier.
Yeah, so the day I closed my lead,
which was initialized,
Gary Tan was the investor.
Gary Tan is now the president of Y Combinator.
But yeah, the day I closed my lead,
everything changed.
So now I was able to go back
to all the other investors who were being slow.
And I just wrote them a very polite email
where I said,
it's great that you guys are interested,
but I'm looking to close the round now.
Initialized is leading. Here's the price. Let me know how much you want to invest. You can invest anywhere
between this amount and this amount. And let me know by this date. If it doesn't work for you,
that's fine. No worries. And basically what I was telling them is: I'm not doing any more due diligence for you. I'm not going to do any due diligence for you at all. So either you invest or you get out. It was a polite way of saying "stop wasting my time," right? And man, it feels good to be in that position. Having the leverage in fundraising, where you're no longer playing that game, feels great. So, yeah, that was good. And what's interesting is that it was four days into my fundraising that I hit that point. So again, I wasn't bogged down for six months like a lot of founders are, right? So it went very well, but yeah, it was not looking good up until that four-day mark.
I'll say a lot of that credit goes to Gary Tan for not being like that as an investor, for really being able to understand the merits of something on its own and not needing all this silly momentum stuff first, right?
So, yeah.
Congratulations on the successful fundraise,
even though back in 2019, but congratulations.
It's a huge deal for a company.
So you mentioned, like, just during the fundraising period, you were thinking you
might fail. Keeping fundraising aside, it took 10 years in a way to build Rama, starting from
2013 to now. Was there any point in this 10-year period where you just wanted to stop and do
something else? Yeah. Well, I wasn't confident it would be possible until 2016.
I think what kept me going is,
the main thing is just the opportunity
to have this big of an impact on the world.
I wrote a blog post about this,
about why I started Red Planet Labs.
And the thing that I think,
something that really inspires me is the,
like the Apollo program in the sixties,
the space program.
Cause it was like,
it's really incredible what they did.
Like that speech that JFK gave at Rice university.
Like I've listened to that probably 50 times.
Like,
I love that speech.
It's so great.
And it's so audacious. I mean, the US was behind in the space race at that point. They couldn't even launch a rocket reliably. I think like 25% of them were exploding at that point, something like that. I may have the number wrong, but a lot of them were exploding. I think that speech was 1962, if I'm not mistaken. But to say that before the decade was out, we're going to have men walking on the moon, when they couldn't even launch a rocket yet, and to say that on a national stage
like that, that is unbelievable. And then they did it. They did it, and the way they did it, the engineering, was incredible. It was brave, what they were doing. Those astronauts, man, at that time. The whole thing was incredible. And they really, I mean, they developed space travel. They figured out everything that goes into doing that, working in space, all that stuff, setting the stage for everything that came later, all the stuff we use space for, and how much that's improved society at large, and so on.
So I just find that super inspirational,
like pushing this frontier, not being afraid of it
and really advancing human potential.
And like back in 2013, when I started working on Rama,
that's what I saw with Rama,
a way to fundamentally advance human potential.
So more than anything else,
that's what kept me going, that thing.
As well as it just being a really interesting thing
to work on just as a programmer, right?
This new programming paradigm,
figuring out how to enable abstraction, automation, and reuse in this major aspect of software engineering, which has suffered in these respects forever.
So at no point was I ever going to stop.
I mean, I would have stopped if I determined it wasn't possible, obviously.
But at no point did I get to a point where I thought it was not going to be possible. I was making progress over those first three years. It was definitely slow at some points.
There were definitely some wrong directions,
some very wrong directions I went.
I remember when I was working on P-states, on the abstractions for P-states, especially for reactivity.
That's something I understood from the beginning,
just the importance of reactivity
and how reactivity should be fine-grained,
where, when something changes, you get very precise information about what changed,
which is actually very different
than how databases work,
which are coarse-grained.
Like, at best, you would know that, like,
oh, this row changed,
but it doesn't tell you what changed in that row.
Maybe this one value changed in this one way incrementally. This one value could be a set, so this one column in this one row, this one set, had this one element added to it. But actually, all you know is that this one row changed, right? So that's coarse-grained. Rama's fine-grained, right? You get precise information, so regardless of the complexity of your data structures, you can do these reactive queries where it actually tells you that this set inside this map, inside this list, had these two elements added to it and this one element removed. So that's fine-grained information, which is really powerful, and can power some really interesting stuff on top of it.
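The coarse-grained versus fine-grained distinction can be sketched in a few lines. This is purely an illustrative sketch in Python, not Rama's actual API: a recursive diff that reports exactly which element changed at exactly which path inside a nested structure, instead of only reporting that a top-level row changed.

```python
def diff(old, new, path=()):
    """Yield fine-grained change events between two nested structures.

    Each event is an (op, path, value) tuple, where `path` pinpoints the
    exact location of the change, e.g. one element added to a set that
    sits inside a map. A coarse-grained system would only report that
    the top-level key changed.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for k in new.keys() - old.keys():
            yield ("added", path + (k,), new[k])
        for k in old.keys() - new.keys():
            yield ("removed", path + (k,), old[k])
        for k in old.keys() & new.keys():
            yield from diff(old[k], new[k], path + (k,))
    elif isinstance(old, set) and isinstance(new, set):
        for v in new - old:
            yield ("added", path, v)
        for v in old - new:
            yield ("removed", path, v)
    elif old != new:
        yield ("changed", path, new)

old = {"followers": {"alice", "bob"}}
new = {"followers": {"alice", "carol"}}
events = sorted(diff(old, new))
# We learn exactly which elements entered and left the nested set,
# not merely that the "followers" entry changed.
```

A database exposing only row-level change events would collapse both of those events into a single "this row changed" notification, which is the coarse-grained behavior being contrasted here.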
It was working on the API for P-states, especially how reactivity could work, that really had me stumped, I think for like three or four months, and I went down the completely wrong path. Eventually I figured out that I had to take a step back and question my assumptions, and then I ended up figuring out the right way to do it, which was paths, this path model, which can do both non-reactive queries very efficiently and reactive queries. I remember the moment when that clicked in my head; it was just like an explosion in my head. It's so elegant. It so perfectly enables everything that you would want from indexing, which is obviously a very broad topic. And there were other moments, other things like that, maybe not as extreme as that,
of just this two steps forward, one step back kind of process.
But at no point did I think I wasn't going to succeed.
That is very inspiring, to be honest.
And kudos on going that long and actually building Rama.
It's really impressive.
By the way.
Yes. Let me say one more thing about that. I'd say the thing that enabled all this was the fact
that in 2013, I had achieved enough financial freedom that I knew I would be able to pursue
this. I knew it would be difficult. It could take a long time. I didn't necessarily think it would
take 10 years at that point, but I didn't really know how long it would take, because it's such a broad thing to be working on. So the fact that I had some financial freedom... I wouldn't say I was super rich, but I had made enough money from a startup called BackType, which was acquired by Twitter. That's why I know so much about Twitter. So I made enough money from that that I was in a position where I could pursue a crazy thing like this. And that was a lot more appealing to me than doing something like a Storm company, which is just about monetization, just about making more money, right? But we only have one life. So let me do the thing that would make me proud of my life, which is to expand human potential, as opposed to just making more money than I already had. So that was, I think, the foundation that enabled this process.
That is incredible.
I mean, it shows taking the path of higher resistance rather than the path of least resistance.
It would have been easy to be on a path, like either start a company with Storm
or even like, let's say, be a distinguished engineer or a technical fellow
at one of these big companies and still make a lot more money with that. I mean, you had the
credibility to back that up. Well, I would have made more money in the past 10 years going that route. I think Red Planet Labs is worth way more than that could possibly be. But again, resistance depends on how you define resistance for yourself, and really what you want to see from your life. For me, being so inspired by something like the Apollo program, it wasn't even a question which path I should take. I actually feel that the Red Planet Labs path was less resistance for me, in terms of internal resistance. If I had this idea for Red Planet Labs and decided to pursue a Storm company instead, well, that would be eating at me forever.
This is the regret minimization framework
that you refer to in the blog post, right?
Yeah, that's the thing that Jeff Bezos said, right?
Where he was in,
like his situation when he started Amazon
was similar, right?
He had a really high paying job,
very comfortable job.
I don't know where he was working before he started Amazon.
Then he had this crazy idea
for this online bookstore.
And then, yeah, he said that like,
so which way do I go?
Do I take this leap
or do I just stick with my comfortable
lifestyle and job
where I'm making a lot of money, right?
And then he framed it in terms of regret, right?
Where he said,
I would never regret trying the bookstore thing, but I'd regret it forever if I didn't try it.
And I'd say the same mental process went through me as well.
So a couple more questions before we close off.
You've been building Rama, which is a deep technical platform.
At this point of the company, as you mentioned, you started building the team in 2019, and now it's about scaling the team,
building the company culture.
Can you share more about how you are thinking
about the company building aspect of things,
which is slightly different from building deep tech?
Right, yeah.
Well, I've learned a lot.
That's something I've learned a lot
over the past four years.
So from the start, I did decide
to do a fully distributed team.
I just think that's a much better way to run a company. Obviously, a lot of people had to experience it during the pandemic, and unfortunately, a distributed team, first of all, requires people who actually want to be on a distributed team. So one of the reasons the forced distributed teams of the pandemic didn't really work that well was because those people wanted to be in an office, but they weren't, right? So that's a really big aspect of it, right? But I do think that productivity and collaboration and whatnot are better distributed, presuming that everyone wants to be distributed.
Do you all work in the same time zone, or is everything in writing?
No.
So, yeah, I do find it important to be in close enough time zones that you can still do video calls. So we don't really hire globally. Initially I was intending to, but I do think it's important to be close enough. It's still a pretty wide range, though. When I moved to Hawaii, I actually changed my schedule, so now I wake up at like 5am. But it turns out to be a great lifestyle to lead in Hawaii, because mornings in Hawaii are incredible. It's not too hot yet, and doing stuff outdoors is amazing in the morning. But basically, we work on an East Coast time zone. So whenever we talk about times internally, we just assume East Coast, right? So 2pm means 2pm Eastern, which would be 8am Hawaii time. But yeah,
obviously, across that range of time zones, it's still a huge portion of the globe that you can hire from, right? I do think that's one of the really major advantages of distributed over co-located teams: you're able to hire from a huge portion of the globe instead of just one city. Whereas co-located, you can only hire people who are already in that city, or who are willing to move there. And if you're looking for experienced engineers, well, chances are they don't want to move, because they have a family in the suburb where they are.
Right.
And I have found that the best engineers kind of come from the most random places and live in the most random places.
Yeah.
Yeah.
That's just something I observed over my whole career.
And so when you say I want to be co-located in, let's say, San Francisco, which is obviously a very common place to have a startup.
Well, you are severely limiting your talent pool. As a consequence, it means that the quality of engineers will be less than it would be if you were distributed, just because you have access to so many fewer engineers. So I think that's a major thing, although I do think distributed is still better even without the recruiting aspect being such a huge advantage. Some of the problems in a co-located team, if you're co-located in an office, it's just distractions, right? Programming requires focus, and an office environment is kind of inherently unfocused, with distractions.
There's office layouts which are better than others.
The most common office layout, of course, is the open office.
Well, it's been a long time since I've worked in an office, but I presume that's still the most common one. And yeah, an open office is what you would end up with if you decided, "I am going to engineer the worst possible environment for programming." People walking around, the bathroom door opening and closing, people chewing on chips next to you, people having a conversation. It's really hard to focus in an open office. You have that running joke, which everyone's heard: "Oh, I get all my work done after everyone's left the office." All right, well, then don't work in an office if that's the case. So anyway, that's one of the
core principles of Red Planet Labs: it's a fully distributed company. And of course, I was inspired by other companies that were doing it and having a lot of success with it, just to see that it was possible. And it's worked very, very well for us. We basically have a morning meeting where we sync up, as a stand-up.
We also do a fun thing every morning.
We do something called the question of the day.
So every day we rotate, and when it's your turn, you can either ask a question to the rest of the team, something personal, or you can do a "share of the day," where you just share something interesting about yourself or something you found.
And, you know, we've been doing question day now for four years.
So the questions have gotten like really weird and esoteric,
which is really fun. Can you share an example question?
Oh, man, I think the recent one was: what's a story from one of your parents' childhoods? I don't know if it tells you that much about the person, but it's an interesting thing, and we've heard some pretty wacky stories from people. One I asked a long time ago, I remember, and I've actually asked this twice: one time I asked, what's a mystery in your life that you haven't solved? And another time, what's a mystery in your life that you did solve recently? So, all sorts of wacky stuff on that one.
And I think the idea behind question of the day is that, like,
so when you're on a co-located team, you kind of naturally get, like, camaraderie
because you go out for drinks after work or you get lunch together or whatever.
So you naturally just have a lot of, like, socializing, right?
Whereas a distributed team, you have to be more intentional about it because it doesn't happen naturally.
So Question of the Day is a way for us to just be people to each other as opposed to just screen names, right?
And likewise, that's also why I think pair programming is very important in a distributed software team.
I don't think pair programming is really that important, at least not important as a regular everyday process on a co-located team it's like we repair every day and when we pair it's not it's not like
two people or it's not like you're working together as equals on the project it's one
person that's driving the work it's whatever so like one person will drive the other one will
follow and it's it's the driver's project right and as a follower you know you're there to see
what they're doing and then to maybe help out if you can,
maybe talk through some design issues.
But mostly it's just about having individual face-to-face time
with your teammate so that you build that camaraderie.
So I say that's the main goal of pairing.
We pair once a day, for 45 minutes.
And the second goal is just knowledge sharing, right?
So now you're learning about this aspect of the code by pairing with them.
Knowledge sharing is something I've thought about a lot ever since I started building a team, because that's one of the primary problems you have to solve in building a company. So I've learned a ton about that. There's a lot of stuff we do for knowledge sharing.
I think one of the best processes we use is something we call "reverse story time," where we rotate every month whose turn it is. We usually do this once every four to six weeks. When it's your turn, you have to give a presentation on some part of the codebase that you did not build, something someone else built. This accomplishes two things. First of all, the best way to learn something is by teaching it, so it just does that. And it gets that subject taught from a new perspective and, more importantly, from a beginner's perspective. And so we record every reverse story time. Now we have an archive of tons of them,
I don't know, probably more than 30 at this point.
And also that's a really good resource
for new employees to be able to learn, right?
They can watch these 30 minute reverse story times
and actually learn the different aspects of the code base.
That's super cool.
I mean, if I know my code is going to get read by someone tracing back through the git blames, it keeps me more on my toes, I think, to make sure I'm not doing anything too stupid.
Yeah, so those have been good processes.
And, you know, and we've adjusted over time.
Like, so for stand-ups, we used to just do it, like, live.
So we just go around.
Everyone has one minute, and you just give your update.
And actually, recently, we changed that.
So we still do the stand-up meeting, but you give your stand-up update in an email beforehand. So now, first of all, that shortens the length of the meeting, and it creates an organized place where you can have further discussions about whatever the stand-up update is, right? So someone can update about some part of the system,
like, oh, I'm taking this approach for data transfer,
and then maybe someone will respond to that email
and create a thread of like, oh, why are you doing it this way?
Have you thought about this? Whatever, right?
So that's been a great addition.
This has been a great, like, small change to our process,
but I think it makes us work a little more efficiently. So now the stand-up meeting is actually about doing the question of the day and then deciding what the pairing session is going to be for that day. So now it's like a 15-minute meeting or whatever, and then that's it.
That's pretty cool.
We tried that on our team a while back, during the pandemic, where we switched our stand-ups: two days a week we would do them over a call, and three days a week a Slack bot would ping you with what you got done, what you're doing today, and what you need help on. We found as a team that it was an extremely effective way for both sides, actually. For the people writing the update, because they got a chance to think about what they did and what they're planning to work on, and to specifically ask for help on certain parts, like, "Hey, I want to discuss XYZ," and then, exactly the thing you described, a bunch of threads after the stand-up. And it was very effective for the people consuming that information too, because when you're doing it live, some people are sometimes not paying attention, so they miss it anyway. Totally agree on that.
Yeah. Oh, this has been an awesome chat, Nathan. Thank you so much for being so generous with your time. Before we close, is there anything else you would like to share with our listeners? Man, nothing comes to mind. We talked about everything, all the way to Rama. We'll include the blog post in the show notes. Oh, for sure. So people can visit.
Yeah, sounds good.
Can we hit Nathan
with our question of,
you know,
what's your favorite
software misadventure?
Sorry, it's very cheesy,
but if you have any stories
to share.
Or a misadventure.
Another way to think about that
is what's a failure of yours
that you learned the most?
Yeah, I mean, in the development of Red Planet Labs, that thing I described about going the wrong direction on the P-state API was probably the biggest misstep, I think; basically that whole four months. Okay, so what made that a particularly big misstep is that I had already developed the whole path abstraction. So, Rama's written in Clojure, and in developing Rama, developing a paradigm, writing a compiler and whatnot, I needed to develop this path abstraction just to make it easier to do regular programming stuff, just to be able to work with data structures more easily.
And Clojure, it has immutable data structures.
So you're always working with immutable data.
And it's really cool how it works.
Like you take a map and you add a new element to it, it actually returns you a new map instance,
but it's very efficient.
Basically, it shares structure between the two instances, which is how it's efficient.
But it has this implementation for vectors and sets and other data structures as well.
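The semantics being described here can be sketched like this. A hedged toy version in Python: Clojure's real persistent structures are implemented as trees that share internal nodes between versions, while this sketch only shows the observable behavior (the old map is untouched) and shares just the unchanged values by reference.

```python
def assoc(m, key, value):
    """Return a new map with `key` set, leaving the original untouched.

    A shallow copy: every unchanged value is shared by reference with
    the old map, a crude stand-in for Clojure's structural sharing.
    """
    new = dict(m)
    new[key] = value
    return new

m1 = {"a": 1, "b": [1, 2, 3]}
m2 = assoc(m1, "c", 42)

assert m1 == {"a": 1, "b": [1, 2, 3]}  # the original map is unchanged
assert m2["c"] == 42
assert m2["b"] is m1["b"]              # unchanged values are shared, not copied
```

The real efficiency win in Clojure is that this "copy" is O(log n) tree-node sharing rather than a full shallow copy, but the programming model is the same: updates return new values, and old versions stay valid.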
And so a lot of stuff in developing Rama, I'd end up with a set inside of a map or whatever, right?
And it was very cumbersome to manipulate structures like that.
And actually in the compiler, it could get very complicated, where the code that you're writing
is actually a graph computation.
It's a data flow graph, right?
And sometimes some of those nodes actually have a data flow graph within them.
So very, very complex structures that I needed to be able to do compiler analysis on
where I have to do traversals and these very complex nested manipulation.
And everything is immutable, right?
I want everything to be immutable because there's so many advantages to having your data be immutable.
So I developed this library in Clojure called Specter, which was this path API for generically querying and manipulating arbitrarily compound structures.
And it's super fast as well.
So I actually already had Spectre.
I already had the path abstraction. And so I went down this path of the P-state design, where I was thinking in terms of, like, okay, I'm literally going to have a map P-state,
and literally have a set P-state, and literally have a list P-state, and you compose them together,
but then you manipulate it by calling get, or get the nth element, or whatever. And then I'm
going to have reactive versions of all these queries. So I have get and get-reactive, and so on. And then you try to compose it like that. And it just was not working
at all when I was trying to actually use this approach to actually express my million use
cases that I had. And then ultimately, I realized, well, P-states are nested data structures.
And Specter and paths are all about being the most expressive, powerful way to work with nested data structures.
Why don't I just use paths on P-states and bake the reactivity into paths themselves?
And that was the big moment where suddenly everything fit together.
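To make the path idea concrete: a path is a sequence of navigators that walks into a nested structure, and a transform rebuilds the structure immutably along that path. The following is a hypothetical toy sketch in Python of that concept only; it is not Specter's actual API (Specter is a Clojure library, `com.rpl.specter`), and all names here are invented for illustration:

```python
# Sentinel navigator meaning "every element of a sequence".
ALL = object()

def transform(path, fn, data):
    """Apply fn at every location reached by `path`, rebuilding
    the structure immutably on the way back up."""
    if not path:
        return fn(data)
    nav, rest = path[0], path[1:]
    if nav is ALL:
        # Navigate into every element; return a fresh list.
        return [transform(rest, fn, x) for x in data]
    # Otherwise treat the navigator as a dict key; copy the dict.
    return {**data, nav: transform(rest, fn, data[nav])}

data = {"users": [{"age": 30}, {"age": 41}]}
out = transform(["users", ALL, "age"], lambda a: a + 1, data)
# out == {"users": [{"age": 31}, {"age": 42}]}; data is unchanged
```

One composable path expression handles arbitrarily nested combinations of maps and sequences, which is the expressiveness win Nathan describes; Rama then additionally bakes reactivity into the path navigators themselves.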
And then I was able to scrap everything I did those three or four months and take this new route. So, yeah, in one sense it feels incredible to have this
breakthrough, where I just found a fundamentally better way to interact with
an index. It's generic,
and very concise and elegant,
regardless of the complexity of the index.
So not only for every data model that exists,
but for any sort of permutation
of data structures you could have,
it's super elegant.
And it has this new capability
of arbitrary fine-grained reactivity.
It's a major breakthrough in those two respects.
But in another sense, you feel really stupid, because I already had paths for a long time and still went down this road, this long road, right?
But yeah, I'd say that was a pretty big, that was definitely a misstep.
No, but to your point, every time when I get stuck on a really hard problem, I think about
how nice it will feel once I actually solve it.
I gotta say, when I had our Twitter-scale Mastodon instance running for the first time,
at scale, very, very high performance, that
felt good, because there was so much that
went into that.
That must have been your realization of all the work that went in, I imagine.
Yeah, it was kind of the culmination.
That was the culmination of the original vision, right?
To be able to build an application like that, which is so costly otherwise, at such low
cost and at such high performance was great.
That's pretty awesome.
Well, we'll add links to Red Planet Labs,
Rama, and to your blog posts
in our show notes too.
And we'll also link your Twitter or X profile
where people can follow you and learn more.
And for everything today, Nathan,
thank you so much for taking the time.
This was awesome.
We learned a lot about Rama, about you,
and I'm sure our listeners will too.
And we highly encourage them to go check it out.
Awesome.
Great talking to you.
Thanks so much.
Thank you.
Hey, thank you so much for listening to the show.
You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com.
You can also write to us at hello at softwaremisadventures.com. We would love to hear from you. Until next time, take care.