The Infra Pod - Owning both software and hardware to run the best cloud for deployment - Chat with Jake from Railway
Episode Date: May 19, 2025

In this episode of the Infra Pod, Tim and Ian sit down with Jake, CEO of Railway, to explore how Railway is reinventing cloud deployment. Jake shares the origin story of Railway, their unique approach to simplifying cloud deployments, and the innovations that separate Railway from traditional cloud providers. With a focus on developer experience, cost efficiency, and leveraging hard-to-build technologies, Jake outlines Railway's journey from supporting small projects to attracting large-scale enterprises. The discussion delves into organizational strategies, technical challenges, and the potential future impact of AI on cloud infrastructure management.

00:22 Railway's Mission and Approach
22:45 Deep Dive into Orchestration and Kernel-Level Work
30:07 Leveraging AI and In-House Tools
31:55 The Spicy Future: Hot Takes and Predictions
Transcript
Welcome to the InfraPod.
This is Tim from SNSVC and Ian, let's go.
Hey, this is Ian, lover of bare metal compute apparently.
Jake, I'm so excited to have you on the podcast.
CEO of Railway. Why don't you introduce yourself, tell us how Railway got started, and what in fact Railway actually is?
Railway is a really, really trivially easy to use cloud platform. So as you're building and deploying,
your complexity on traditional cloud providers
becomes exponential.
You've got to pull in Terraform.
You've got to pull in Helm.
You've got to figure out how all of these pieces
work together.
We give you a really, really easy
to use canvas where you just basically say,
give me temporal.
Give me Postgres.
Whatever.
It'll spin that thing up, provided it's open source.
And then we'll operate that and make sure
that that thing scales for you. So that's the kind of like larger end goal of it.
It started about five years ago, because we were just, I guess,
a little bit disappointed about the cloud landscape, like, you
know, a lot of power kind of like locked behind a lot of kind
of pain to actually kind of access these things. So we kind
of like, as much as I hate the term, like democratizing,
whatever, right? It's like, you want to make that way easier
for people to go in and use.
So that's like probably the most applicable term, you know?
So, you know, one of the things that you focus on with,
with Railway is it's just like the simplicity.
Like you go and look at the website
and the first thing that pops in your face is like this demo
of how you can drag and drop infrastructure and just create it.
Like, what was it about the way the infra was
that you were like, no, this is what I want on my front page?
How did you get to that conclusion
that this is how you wanted to build the Railway interface?
Yeah.
So I guess from a go-to-market perspective,
I don't know if you've heard the saying,
if you're everything to everybody,
you're nobody to anybody.
And so for us, we wanted to focus
on a segment that was disappointed like us
with these cloud primitives.
So for us, we put simplicity front and center.
We put support front and center.
And so things, again, that AWS or GCP or whatever just
fundamentally isn't.
And so that's why we've gone with the branding
that we have currently.
It's also a future that we believe in.
We want to say that these things need
to be significantly easier so that everybody can access them
and then spend less time doing infrastructure.
Because that's the whole end goal,
is you spend less time gluing and more time doing.
And so I think most developers
have probably already seen or heard what you guys do.
But I'm sure there's probably still
a number of folks that don't know what Railway truly is.
And this sort of back end as a service,
for a lot of developers like us,
sometimes we don't really know what the difference is
between most of these things.
So maybe you can give us at least an introduction
of Railway.
Like hey, this is what we do
and this is one of the most special things
people always talk about, really.
Why is it so effing cool to use?
Maybe some of that color would be great, man.
Yeah.
So if you look at like traditional kind of back end
as a service providers, they're pretty vertical in terms
of what they allow you to go in and do, right?
Like Heroku, you know, they'll
outsource a lot of their state to kind of their marketplace.
Vercel kind of does the same thing generally.
You know, even you mentioning it's
like a back end as a service, it's, you know,
we do front end, we do anything, right?
So we want to be basically a next gen hyperscaler
where you can do anything, right?
But it is significantly more trivial
than the larger cloud providers.
So it's not like kind of locked in.
You're not going to get to a point where you're saying,
oh, no, you just, you can't do that on railway, right?
You can do anything on railway because we've built a lot
of the kind of like primitives to solve a lot
of these hard problems, right?
Like we will manage the stateful storage for you.
We will go and manage the containers for you, right?
Like you don't have to change anything about your application.
You just hand it to us and then we'll go and figure out how to go in and deploy it.
Right. So I would say that's the kind of like main difference with Railway.
Like we try and really, really meet you where you are exactly.
You're not going to have to go and make changes or anything else like that.
You can kind of just deploy anything you want.
I've been following you guys' products and I'm reading a lot of like the latest announcements.
And I feel like everybody building a backend, or whatever we call this, as a cloud,
everyone has a different take on what the most important parts of the stack are. And
there's a lot of emphasis, I realize, for you on really getting developer experience,
right?
But also like taking away the abstractions of infrastructure required.
And every platform sometimes trying to claim this, but they do it so differently, right?
There's some different taste here that I think is really hard to nuancedly describe.
Can you talk about, like, the sort of infra-less approach? I saw your blog post talk about infra-less, right?
Like, I don't want to feel like there is actually infra involved.
But you know, for a lot of people that build
backends, they actually want to know what the infra is, because that's maybe how they make the right choices. And so there's like this weird blend.
So we talked about your taste.
What has been the driving thing that makes
the design really what it is so far? And what are some of the maybe unique things you guys
do or choices?
I think we sort of did this in the early days where we like hid a bunch of things from you
where we basically said, oh, you don't need that, right? You don't need like persistent
volume claims. You don't need whatever.
Like all of these other things, you don't
need like Bazel-style rollouts or anything else like that.
And our ethos has kind of changed over the years
to basically say, well, you do actually
need some fundamental way to go in and do that.
But how do we compress that primitive?
So for things like rollouts, we'll say, oh,
if you have a dependency on any other service,
like if you're consuming the URL or the IP address
or anything else like that, we can actually just
construct a dependency graph for you.
And so instead of you actually having
to go and define any of these things
and kind of manage that complexity yourself,
what we're doing is saying, that complexity still exists,
here's how you wrangle it.
And here's how you wrangle it in a way that's
you go from exponential complexity
to linear complexity.
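The dependency-graph idea described here, inferring which services depend on which from the references they consume and deriving an order from that, can be sketched roughly like this. This is a hypothetical illustration, not Railway's actual engine; the `${{service.VAR}}` reference syntax and the topological sort are assumptions for the sketch.

```python
import re
from graphlib import TopologicalSorter

# Each service's env may reference other services' variables; we infer
# dependencies from those references instead of asking the user to declare them.
services = {
    "postgres": {},
    "api": {"DATABASE_URL": "${{postgres.DATABASE_URL}}"},
    "worker": {"API_URL": "${{api.RAILWAY_PRIVATE_DOMAIN}}"},
}

def dependency_graph(services):
    graph = {}
    for name, env in services.items():
        refs = set()
        for value in env.values():
            # "${{postgres.DATABASE_URL}}" -> depends on "postgres"
            refs.update(re.findall(r"\$\{\{(\w+)\.", value))
        graph[name] = refs
    return graph

# Dependencies come first, so rollouts happen in a safe order automatically.
order = list(TopologicalSorter(dependency_graph(services)).static_order())
print(order)  # ['postgres', 'api', 'worker']
```

The point of the sketch is the shape of the compression: the user writes N references they needed anyway, and the graph (and rollout order) falls out for free.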
And so that's what we mean when we're
talking about the canvas.
For people who haven't used the Railway UI in general,
it kind of looks like a Figma UI in Canvas.
It's like an infinitely sprawling thing.
It's a bit different than traditional infrastructure
code.
But the reason we go and do that is actually twofold,
because it allows you to drill into your service
and only think about your service
as that level of abstraction.
And then you can zoom out to say, oh, I want more and more
of this context.
But then you've linearized the complexity of your system
because you don't have to think about any of those
spanning microservices across my flat network or address
space.
And then what we also do is the Canvas will actually
version these changes for you.
So if you go and make those changes,
it will keep track of them.
And then if you go and merge them in,
then it will actually just replay those changes for you.
So the kind of complexity of like,
I have N microservices, I have to think about all of them,
I have to digest them as part of this Terraform
or break them up into modules, we manage that for you.
And that versioning of, oh, let me try and make a change.
Every DevOps engineer will tell you,
you go and you make the change in staging,
you test it, and then you try and figure out
how to make it actually either importable or reproducible
in Ansible or anything else like that.
We just skip all of that.
And we say, you make the changes in the canvas,
we record those changes for you,
and then we'll go and apply.
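A minimal sketch of the record-and-replay model just described, assuming a simple change-log representation (the actual canvas versioning is internal to Railway):

```python
# Each staged edit in the canvas is recorded as an event; merging an
# environment replays its event log onto the target state.

def apply_change(state, change):
    op, service, key, value = change
    if op == "set":
        state.setdefault(service, {})[key] = value
    elif op == "delete":
        state.get(service, {}).pop(key, None)
    return state

def replay(base_state, change_log):
    # Replaying staged changes reproduces the staging edits on prod,
    # without hand-porting them into Terraform/Ansible.
    state = {svc: dict(env) for svc, env in base_state.items()}
    for change in change_log:
        apply_change(state, change)
    return state

prod = {"api": {"REPLICAS": "2"}}
staged = [
    ("set", "api", "REPLICAS", "4"),
    ("set", "cache", "IMAGE", "redis:7"),
]
print(replay(prod, staged))
# {'api': {'REPLICAS': '4'}, 'cache': {'IMAGE': 'redis:7'}}
```

This is the same trick as event sourcing: the log of edits, not the final state, is the unit you version and apply.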
So that's another way that we kind of compress that complexity.
And we don't try and destroy it, right?
There's a whole, like, complexity cannot be destroyed.
But I do think complexity can be compressed, right?
And so we try and take the approach of,
how do we compress all of that complexity
to give people primitives that feel like, once you learn it,
you can kind of compose it together
with all of these other things and move really, really
quickly.
So you kind of feel like you're leveling up
every single time you learn something
new about the platform.
So I think this would be a good time.
If you think about the standard platform engineer person at like your mid market company, you know,
the tools in their tool belt are like Terraform, Helm,
Kubernetes, and some cloud or series of clouds.
And like the workflow there is I package up some services
into Helm charts.
I publish some Docker containers and Helm charts.
I use Terraform to build up the cloud,
the base bottom level cloud infrastructure.
I get the RDS instance, get the EKS cluster, get all the IP address space, all the other stuff.
And I use that to set up my staging, set up my production. And then I use Helm and maybe I'm
using Kustomize or something else on top of Helm to configure the Kube cluster.
What's the workflow with Railway like?
Like what's the Delta there?
Because you're talking about this canvas,
but I'm curious as an uneducated user of Railway,
like how you talk about the complexity compression,
but like what's the workflow of this canvas
and how do you have multiple environments with a canvas?
Like is there one canvas into one environment
or is it like broadcast out?
The path to production is an incredibly complex beast today.
It was an incredibly complex beast 20 years ago.
So how do you compress that path-to-production complexity?
Yeah, so I think Git does a lot of things right here.
And so there's a lot of things to learn in general.
So allowing you to start with one singular environment
and then make changes to that environment,
and then go and say, actually, I want a staging environment.
Let me take that config and let me just pull it
into a new parallel environment that I can go and make changes
for.
You know, I start with an application, an API server,
a worker, a Redis cache, and a Postgres instance.
I'm hacking along in prod, and at some point, I say, OK, well,
we've got to get ops in here because we're just
committing directly to main.
And the same thing happens with infrastructure.
The problem is that you have to go back and refactor
all of that stuff.
Everything has to become a module.
And then you have to figure out how to version those modules
so that you can push the staging and go
and do all those things.
And so you have to set up a ton of different infrastructure
when you want to move faster as a team, right?
Versus with Railway, we just build it in right out
of the gates
and we've kind of built that versioning into the canvas
so you don't have to go and kind of like,
almost replay all of those things, right?
And so the workflow for users is basically just saying,
I want this thing, I want this thing, et cetera,
and we will keep track of all of those things for you, right?
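In miniature, that environment-forking workflow could look something like this (names and config shape are illustrative, not Railway's API):

```python
import copy

# "Fork prod into staging": the new environment starts as a copy of the
# source config, then diverges independently, no module refactor required.

def fork_environment(environments, source, target):
    environments[target] = copy.deepcopy(environments[source])
    return environments[target]

environments = {
    "production": {
        "api": {"image": "api:v12", "replicas": 3},
        "postgres": {"image": "postgres:16"},
    }
}

staging = fork_environment(environments, "production", "staging")
staging["api"]["replicas"] = 1  # scale staging down; prod is untouched
print(environments["production"]["api"]["replicas"])  # 3
```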
Got it.
And so I think I want to really kind of figure out
what is the type of audience you usually focus on
when it comes to developers?
Because I feel like there is a lot of cool stuff
you're announcing latest launch week,
a lot of performance in metal, right?
Performance on network, cost, trying to bring it down.
I think that focuses on folks that really want to run
at scale in production, because those folks really care
when it comes to the margins
and how much they really want to, like, bet on this.
But like I said, there's also like a huge amount of like the templates,
the canvases.
I'm sure a lot of developers when they're just getting started,
they don't know backend that well.
They have just like, I don't even know what to do, what to start, right?
There's a lot of this kind of like developers.
So I'm sure you probably don't focus on like,
I've just learned backend type of folks, right?
It's more like, okay,
I have something that needs to run in production and scale from there, if I'm right. Like, maybe talk about a typical
persona and the type of folks that you think are the developers you care the most about.
Yeah
We have what we call the go-to-market master plan,
which is an internal document at Railway that we've kind of built out. And it's essentially about almost stretching your ICP.
And so instead of shifting a market,
you stretch a market and you go and build for those.
So our current ICP is 50 to 250 person teams.
And so our previous ICP was actually
15 to 50 person teams.
And that's kind of interesting because it's
the scale at which you're like, OK, well,
maybe we need to get a DevOps person involved,
et cetera, right?
And so we've kind of like shifted that ICP from,
you know, five years ago when we started the company,
you know, we were saying like,
hey, it's literally anybody with a pulse.
Like you have a, you know, tiny Discord bot
that you want to go in and run.
Like, like you're our ICP, like get in, get in here.
We'll help you go in and do that, right?
And then we've kind of slowly shifted to people who want to move fast to startups,
and then Series A, et cetera,
kind of like companies who are not interested
in kind of like trying to go and find
and hire these DevOps people,
like go and build out this kind of like machine
on like a larger kind of like time horizon
to now kind of like these Fortune 500 companies
who are like, hey, we need like RBAC
and like a way to do like granular permission structuring
and stuff like that.
So that's, again, kind of a good example of like, hey,
the complexity is going to come at some point.
How do you like make it really, really trivial for people
to like go in and add people to your teams, right?
So you're not like waiting, you know,
three minutes for Terraform to like go in and get
its projected state.
And then you apply it and it says, congrats,
you don't have any permissions.
Like, you know, go and ask your manager, right?
So we've kind of like taken that approach in terms of like
building out and layering it.
So I would say that's kind of like where we're at right now.
It's a very interesting spot
because you get to these conversations, right?
And this is kind of why we rolled out the metal stuff,
right?
Because even if our workflows are really, really good,
you know, the conversations on the other side
of the table ends up being, oh yeah, it's great
and we'll move faster,
but it's still a bet on our eng team
and it's more expensive, right?
And so we were like, okay, well,
we have to be better and cheaper, right?
The burden of proof is on us, right?
Because we have these massive cloud providers
and they have a really large track record.
Look at the big three and then CloudFlare, right?
People are like, oh yeah, I wouldn't host whatever,
my important stuff, on Cloudflare.
And Cloudflare has been around forever.
It powers half the internet.
Anytime that there's anything,
it's like an engineer snow day for Cloudflare or whatever.
So again, the burden of proof is really, really high.
So we have to be both faster and cheaper
and have these better workflows.
So those are the things that we've built out over time.
It's interesting, right?
It's like that movement that you just mentioned
from the ICP of like, you know,
15 to 50, to the 50 to 250, I can't remember the exact ranges, something like
that.
I assume that's like engineers, right?
I'm curious at the 50 plus engineers, they're probably deployed someplace.
So is this like you're actively going in like winning compute from like the hyper scale
clouds?
I'm kind of curious, how does that work?
I'm sure there's some secrets, but broadly speaking,
is it specific industries or these people,
they got all in on the mono cloud
and they turn into this complexity explosion
and they're wasting all this money on stuff
and they don't care.
What's the process look like
to try and win these mid-market customers?
I get the value prop you're selling,
which is stop spending so much money
on trying to build your own platform,
build on top of our platform,
and just focus on the thing that matters to you.
Get off these clouds and give me the right primitives,
some version of that, I'm sure.
But it's interesting,
because the long-term thought here has always been like,
well, the clouds will ultimately win.
The hyperscaler clouds will ultimately win,
like GCP, AWS, Azure,
and primarily because they have data gravity.
Once the data is in there, it's so hard to get out, right?
So I'm just, it's just an interesting thought process
because it is a narrative violation to say,
I'm going to go win spend away from like the big three.
So I'm curious to learn more.
And I'm sure our listeners are as well.
Yeah, for sure.
It's a very interesting and I guess like a delicate dance
as you like kind of get up into the larger spends when you're talking like 250k, 500k, like million dollar spend on cloud.
Because you're across the table from a director of engineering and the director of engineering's main concern is usually it's like engineering velocity.
But then it's also just like reliability, right? Like I just got to make sure that like we move fast and then the system stays up,
we deliver value for our customers
and we're able to move quickly, right?
So what gets us usually in the door
is that kind of like promise of like being able
to move quickly in general, right?
And then what allows us to kind of like move
through those sections is just incremental adoption,
honestly, right?
Like there are workloads where, you know,
some use cases they're not so critical.
People will start with, like, tier-zero services.
They'll start with non-database services.
They'll start with really, really kind
of like, trivial use cases.
Staging environments is a really, really good one
where we can just like, we can manage staging,
we can manage PR environments, et cetera, for you.
And then you can adopt kind of the production features
as you kind of want to go and cut over.
We have a bunch of people who will like,
they'll start on that for like three to six months
and then they'll move kind of
like the production environment over.
So you have these kind of like POCs where you're incrementally cutting things over.
And then in terms of like how you find these customers, right, there's various
different like channel partners that I won't reveal like where we're getting a
lot of our business from in general.
But like there are industries for which, you know, they're not the Netflixes, the
Coinbases, et cetera, of the world that
have these platform teams.
They basically say, hey, listen, we
have initiatives to either cut cost or increase velocity,
and we really, really want to do both.
How do we go and make that happen in general?
And so they're heavily incentivized
to actually go in and say, let's rip and replace
a lot of this infrastructure.
And then we can work with them from our solutions
or support engineering team to say,
OK, here's the lay of your land.
Let's start with this.
Let's move to this.
And then we'll get the database in general.
And the database, as you mentioned,
is the last bastion of the thing that matters because
of the data gravity.
I think the clouds have done a really, really good job.
Even looking at why egress
is so expensive on regular clouds:
it's because they don't want you pulling
that data out, right?
Because they understand the thing that we understand,
which is the only thing that matters for compute
is proximity to data.
And so you're talking about all these edge compute workloads
or anything else like that.
And then you look at a real world application,
and it's firing auth requests to a Redis instance
in a different region.
It's firing four different chain database calls back and forth
all the way here.
And it creates this massive latency overhead.
So getting to that database and that proximity to that data,
whether you're deploying into somebody else's cloud,
we can deploy into AWS, GCP, or our own bare metal now.
So long as it's sitting near that database,
we have a kind of eyes on way to say, hey listen, we can save you money,
we can move faster, all of these other things,
and just do it incrementally.
So maybe on this point,
so much of your latest tweets
is talking about the cost.
Customers spending 500k on Google,
coming to Railway,
unfortunately just 50k a year, right?
I find it so fascinating.
So for all the infra nerds like us, right,
we want to know, like, what are the things you worked on
to really make this sort of cost reduction possible?
I'm sure the multi-tenancy is part of the story,
because I think you mentioned that as well,
but I'm sure just multi-tenants alone isn't,
like the simplest form of it,
is not going to let you do that cost reduction so easily, right? If you're pushing the cost even further down now, what
has been the type of things you have to do to make the cost actually much lower than
just anybody doing a very typical, you know, putting pods together?
Yeah. So this maybe verges on my hot take. And my hot take, and what we've kind of
like almost skipped to the last step on in the interest of growing our compute under management as quickly as possible, is that all compute will be metered by the minute, right?
Like compute becomes a utility. You are not going to be purchasing fixed-size boxes from AWS. You are going to be deploying something and running it. And you are going to be paying for the minimum, minimum possible amount that you could possibly pay for.
I've often joked with our marketing lead
that our best pricing page would not have any numbers on it.
It would simply say, we are going
to make the cheapest compute possible,
and we are going to charge you 10% to 15% on top of that.
That's it.
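The math behind that pricing model is simple to sketch; all the numbers below are made up for illustration, and the 12.5% markup is just the midpoint of the 10-15% range mentioned above.

```python
# Metered-by-the-minute billing vs. paying for a fixed box 24/7.

def metered_cost(minutes_used, cost_per_minute, markup=0.125):
    # Charge the underlying compute cost plus a 10-15% margin (12.5% here).
    return minutes_used * cost_per_minute * (1 + markup)

per_minute = 0.0002          # assumed underlying cost of the compute slice
active = 6 * 60 * 30         # a workload that runs 6 hours/day for a month
reserved = 24 * 60 * 30      # minutes a fixed box would bill regardless

print(round(metered_cost(active, per_minute), 2))  # 2.43
print(round(reserved * per_minute, 2))             # 8.64 for the mostly-idle box
```

Even with the markup, the metered bill undercuts the reserved box whenever utilization is low, which is the usual case.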
And so when you look at and compare that to larger cloud
providers, like AWS, where they're selling these boxes
that people ultimately end up not using.
And then on top of that, what they're doing is they're, you know,
overcommitting workloads, using spot instances,
all these other things to really kind of drive up the margins they've got there, right?
Like our goal is actually to go and say like,
what we're building is a mechanism for compute management
that will disrupt the traditional cloud providers
because they sell these boxes and because they sell this over subscription.
And so they're incentivized actually
to not disrupt themselves, right?
As much as like Lambda is very, very cool
and they're kind of pushing it in general,
it's also really fucking expensive, right?
So it's still expensive just in a different way, right?
But the goal is to get that kind of Lambda-style
compute with the ability
to do stateful at the smallest possible kind of price
point, which is paid by the minute, right? And so via us doing metal, which like drops
the cost significantly, and us writing our own orchestration engine to build like pack as many
of these things as possible onto instances, that's where most of the savings actually comes from for
us, right? And then being able to kind of like sprinkle these like hobbies, serverless PR, etc.
environments all across these instances. Like if you've ever looked or like listened to any of like the early Borg talks, you know,
they're talking about like their packing mechanism, right? I forget who the author is,
right? But he goes into the basic like, so yeah, this is like Gen 9 of our like packing system. And
you know, we were able to pack these like batch workloads here and then we made changes here
and it got significantly worse. And here's why it got worse, right? And so like a lot of the
cloud providers have kind of like done this on the bin-packing level,
and we're trying to do it on almost the real-time operating
systems principle level, where we just say,
your compute workload is going to scale.
We're going to move either other things around it or itself
and live migrate it to different instances
so you don't get any interruptions or anything else
like that, and then charge you, again, that premium
that if we do our job correctly,
right, like it'll be that premium on, you know,
$1 billion in revenue, $10 billion in revenue,
$100 billion in revenue, right?
Like data center markets are like a half trillion dollar,
like a year market as of right now.
So it's just massively growing in terms of like
what the size is.
And so it is kind of like an economies of scale business
where we're just kind of like barreling directly towards what the most possible efficient way of doing this thing is and
saying, let's just get really, really fucking good at doing this. While the other client people are
not incentivized to go in and do this. And that's where we'll make our margins.
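The packing idea can be illustrated with a classic first-fit-decreasing heuristic, the family of approach the Borg-style "packing" discussion refers to. Railway's actual placement logic is proprietary; this only shows why tight packing cuts the machine count (and hence the cost).

```python
def pack(workloads, capacity):
    """Place workloads (CPU demands) onto the first machine with room,
    trying the largest workloads first (first-fit decreasing)."""
    machines = []   # each entry is the remaining capacity of one machine
    placement = {}  # workload name -> machine index
    for name, demand in sorted(workloads.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(machines):
            if demand <= free:
                machines[i] -= demand
                placement[name] = i
                break
        else:
            # No existing machine fits: provision a new one.
            machines.append(capacity - demand)
            placement[name] = len(machines) - 1
    return placement, len(machines)

workloads = {"api": 4, "worker": 3, "cron": 1, "redis": 2, "preview": 1, "bot": 5}
placement, n = pack(workloads, capacity=8)
print(n)  # 2 machines instead of one box per service
```

Six services land on two 8-unit machines instead of six dedicated boxes; live migration (as described above) is what lets a real system keep re-packing as workloads grow and shrink.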
A couple of interesting things you said there, right? Which is one, something that no one
actually does say outside of Google, which is like, I mean, I'm sure Tim could talk a lot about how Mesos enabled this sort of packing problem. Because if I remember correctly,
the Mesos scheduler was like built to solve this specific problem. But it's like, interesting,
like I'm curious to hear, this is the thing that starts to differentiate you versus like say a
Render versus say like a Fly.io, like these are like, we have all these, what I call, what I start
to call like these Neo clouds, right?
And you're part of this group of Neo clouds
and there's GPU specific ones and right.
And they all have these sort of different parallels
of focus and it sounds like one of yours
is simplicity and cost.
And like the amount of engineering that you can do
to drive costs down is actually very impressive.
So I'm curious, like, you know,
one of the things that we're talking about
is how you get instances packed together closely, tightly.
Like what are other places that you've made real investments
that you wouldn't, a normal platform team at like Random Co
and the mid market would never ever think about doing
and the clouds don't give it to you out of the box.
Yeah, I think there's a couple of things there in general.
I think writing our own orchestration engine
is the one that I reach for the most in general,
because most people will just use Kube or Nomad
or something else like that.
Because it's sufficient for their use cases.
They don't need to pack these things as tightly as possible.
But I would say that kind of lends,
it's almost like lends credence to the actual real thing
that we do, which is we try to really go
deep on a lot of these things.
We are writing eBPF, we're working with the kernel, we're working with kernel modules.
We're going really, really deep on things that I think are almost lost arts, if that makes sense,
or deeply descentivized in current gen of, I would say, infrastructure builders.
So we're trying to really pop the hood on a lot of things that Kube would really abstract for you,
like block storage through the, what is it,
the CSI or whatever that interface is.
We're trying to really, really pop the hood
and go deep on a lot of those things.
And kind of one of the core ethos of the company
is just try and do hard things.
And if you're faced with two unique paths
and you can build some sort of competitive advantage
by simply doing a hard thing, there's really only two motes
in anything.
It's hard decisions and hard work.
And so if you do that and you go deeper on a lot of these things,
then you're kind of in a situation
where you keep popping the hood and you keep uncovering these,
wait, why are we doing it this way, kind of things.
And then you can kind of make those changes.
And I would say probably like 60% to 75% of them
are positive and work out well.
And then the other 20% to 25%, you figure out,
and you're like, oh, that's exactly why.
They were doing it that way.
That makes a ton of sense.
But every layer of kind of indirection that you pop,
you end up actually getting that almost unlock,
whether it's a performance unlock, or whether it's a cost
unlock, or whether it's something else like that.
And so when other companies choose
to go a little bit wider, we choose
to go deeper so that we can try and solve these problems
at a generalizable level.
We're trying to solve generalizable block storage
across any compute that you have.
The goal is to curl a binary onto a box.
It runs anywhere.
You create a massive mesh of your compute.
We will give you keys for this compute.
And then that's your ring of device trust, essentially.
And you can land those workloads on that.
And that whole system is essentially
what powers the Railway Orchestration
Engine as of right now.
We will use WireGuard to go and peer these things together.
And then we can also allow you to do like private kind of offshoots in general.
Right?
So that's kind of the goal. But I would say that's where we find most of
our interesting stuff, right?
It's going to be on the kernel level.
So like that eBPF stuff, like having to look at things like io_uring,
stuff like that, right?
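The "curl a binary onto a box, get keys, peer everything over WireGuard" idea can be sketched as rendering a standard wg-quick style config for each node from a peer list. The keys and addresses below are placeholders, and the real agent, key exchange, and trust ring are Railway internals.

```python
# Render a WireGuard config for one node in a mesh. Every node gets an
# [Interface] with its own key and mesh address, plus one [Peer] per
# other node it should tunnel to.

def wg_config(me, peers):
    lines = [
        "[Interface]",
        f"PrivateKey = {me['private_key']}",
        f"Address = {me['mesh_ip']}/24",
    ]
    for peer in peers:
        lines += [
            "",
            "[Peer]",
            f"PublicKey = {peer['public_key']}",
            f"Endpoint = {peer['endpoint']}",
            # Route only that peer's mesh address through the tunnel.
            f"AllowedIPs = {peer['mesh_ip']}/32",
        ]
    return "\n".join(lines)

me = {"private_key": "<priv>", "mesh_ip": "10.0.0.1"}
peers = [{"public_key": "<pub-b>", "endpoint": "203.0.113.7:51820",
          "mesh_ip": "10.0.0.2"}]
print(wg_config(me, peers))
```

In a full mesh, each node's peer list is simply every other node, which is what gives you the flat address space the canvas discussion assumes.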
Yeah.
I'm really fascinated because I think a lot of companies, usually when they get to some certain
scale or certain size, now you specialize teams into very, very specific places, right?
You know, you have a team just working on, like, a Borg team, right?
There's a certain number of people just working on certain things.
And there's actually another set of people actually trying to do the max mixing, the
batch workloads and interactive workloads, because it takes different type of skill sets
to even learn how to navigate the system level work
and the sort of AI or statistical work
to do this sort of like multiplexing of workloads.
And I worked on a lot of this kind of stuff.
And I was just curious how you think about it.
The first thing that comes to mind is,
how do you run a team?
How do you hire and run a team that works across
the whole gamut of these things?
Because you have everything from the developer packaging system
and developer experience stuff, now down into the metal stuff.
Just doing metal alone is a little daunting in my mind
for a small team.
And so do you have very tiny teams doing certain things,
or do you have certain people working across different areas? How do you think about even getting the right
people to do what? Because I'm sure not all developers have experience working at
all layers.
Yeah, I think like org design is a very, very, very, very real engineering problem.
Like you are designing a distributed system.
You are designing ways for like humans to communicate with each other.
Right? And humans are not computers.
They're not going to be reliable.
It's like you've got 0.9 availability on a human.
And so it's like you need to build out
these systems such that people can actually
work and develop this context across multiple different zones
so they can get this kind of almost like,
I don't know if you've ever read Range by,
what's his name, I forget, something Epstein,
but not that Epstein, you know?
It's like, how do you accumulate this knowledge
across this range of topics
so that you can almost compress it and say,
again, why the fuck are we doing it this way
when we could actually just be doing it this way, right?
And the only way that works is by hiring people
who either want to develop that range from being specialists,
or have that range and want to go and really, really
develop the focus, acumen, et cetera, to go and specialize in those topics
to create T-shaped individuals.
So we're like 25 people right now.
We manage a million users and a lot of computing in general.
And we're all across the globe.
So there's a nice bonus there in terms of I work with this person
in Thailand.
Or I was literally handing off a PR to a guy
in Spain this morning, cause we're like, okay, like
how do we get this thing done like for this week, right?
And so just having to design the org so that that thing works
and that you get reliability at scale while simultaneously
allowing people to like cycle out and not be like
bus factor zero, it's like, that's a lot of kind of like
the challenge in general, right?
And the only real solution to that is like basically just hiring really,
really great people. Right.
And so we try and focus on like,
how do we actually get the leverage per individual to be really, really high?
Like one of our core like KPIs is like revenue per employee, right?
Like how do we generate as much revenue per employee as possible so that we can
go and like shovel that back to people and say, listen,
if we're going to hire really, really excellent people, right.
Like we can do it on equity
initially, right. But at a certain point, like you need to
be able to like actually back it up with cash. So because people
have like general kind of like life commitments, all those
other things, right. So that's kind of what we focus on in
general, right, from that perspective. But yeah, that's
a very hard part of it, right, especially as you start
going and getting really, really deep, right, in terms of like,
you know, Linux fundamentals, it almost
seems like a lost art in terms of people who have range,
but also they're committing patches to the kernel,
and they're writing modules, and all of these other things
that you would traditionally associate with.
And I mean this with all the kindness in the world:
the gray beards of the world, right?
The wizards who are just like, they're seasoned,
and they're sages, and all those other things.
And they've got all this knowledge kind of, I don't want to say locked up, right?
But just from, like, 30, 40 years of working on the kernel and building out these systems.
Right. Or, you know, I think Meta does a really, really good job of hiring these people, like the Katran load balancer team.
Right. They've written all the eBPF stuff, et cetera.
And, you know, if you try and go and poach any of these people, it's like you got $2 million a year comp packages, right?
And so trying to find people who want to go
and accumulate this very, very sparse knowledge
ends up being an interesting recruiting problem, but yeah.
I mean, I'm really curious,
one of the things I've talked to a lot of founders
about recently is sort of this discussion.
The future company, it's a lot on Twitter,
the future company is 30 people
with a billion in ARR or something.
The idea being that what fills in the gap is AI.
I'm really quite curious to hear your perspective
with your focus on revenue per employee
and your focus on doing all these really hard things.
Where do you find, where are you using AI or LLMs
or what forms automation are you applying at Railway
to help you get that leverage, right?
Like if there's a moment of anything right now,
you know, we're all talking about it.
We're either selling, we're helping people do it,
but like how are you at Railway taking advantage
of like your tools, like what workflows are you bringing on?
Like where are you looking for places where you get
a lot of leverage out of this stuff for internal stuff
to build on sort of Tim's question around like small team,
how do you get all this done?
Yeah, I would say that like a lot of the stuff we do
for better or worse in terms of like creating this leverage,
sometimes it's for worse and you have to buy something
off the shelf, but usually it's for the better.
That's kind of the like 80, 20 that I was talking
about prior.
We build a lot of shit in-house.
Our support tooling is entirely built in-house.
We use some AI primitives for almost, like,
currying context in a thread, pulling in information from docs
or anything else like that.
And then we build the system in such a way
where we have three support engineers,
and we try and just almost maximize, in borderline
StarCraft gamer terms, APM per ticket,
APM of just moving through these things
and attaching them to tickets in general.
So you can use AI there, but a lot of it
just ends up being like, if you end up building technology
from scratch, it's a lot more malleable.
You can change how this thing works from an org level.
And so if you end up buying it off of the shelf,
what you end up doing is you end up almost pulling
that kind of ossified structure inside of your organization. So the support tool that you buy and purchase actually ends up incentivizing
how you're going to go and build out your support organization because it only has specific
features or it works this way or this is how escalation works or anything else like that.
So for us, we've almost found really, really solid arbitrage in terms of saying, like, listen,
let's use Claude for AI coding and stuff like that.
Let's just use those things to accelerate
the pace at which we can move from a development
perspective.
Let's build a tool that allows us to move really, really
quickly from an infrastructure perspective.
And let's build the tooling that we
will need to go and essentially scale that out with leverage
at scale using both of those technologies
that we have in general.
So it's kind of this nice blending
where we're kind of creating leverage at every layer.
And you also have that kind of flexibility to say,
oh, actually, here's how we're going to do escalations
because we can just build it ourselves from scratch.
All right, sir.
We'll move into our favorite section called
the spicy future.
Spicy future.
Well, you already kind of knew what we were going to talk about,
but I'm just curious, maybe tell us what you believe is a spicy hot take
that most people don't believe in yet.
I think like, and maybe this will sound kind of like lame,
but there's this whole like, we're going to do a billion dollars in revenue
with like 30 people in general,
and then there's the kind of like trad VC kind of way of like,
hey listen, if you don't build the most accelerated pathway to get to like IPO,
somebody else will. Right.
And they will hire a shitload of people and they will kind of just like scale
that up. Right. And so my hot take is almost like somewhere in between.
Right. Like any company that is going to be able to go and build this,
like leverage to get to like a billion dollars in annual revenue is actually
going to get out accelerated by some other company that's going to like go in and scale.
Right. But that will be a short term kind of like solution.
Right. Because they'll have like scaled really, really quickly.
They'll be comp loaded, etc. All of those other things.
Right. And so for me, I think the best almost like arbitrage of this
is kind of like sitting in the middle.
Right. So like, how do you grow headcount by 50 percent year over year
and grow revenue by five X. Right.
And I think if you're growing revenue by over 5x,
I think you're fucking something up.
Because you just don't have the capacity internally
to go and facilitate that.
And I joined Uber late, in 2018,
post Travis, post hypergrowth, post whatever.
And when I talked with any of the old guard,
all of their fondest memories were
pre the period of, like, 4x year-over-year growth.
They were just like, we hired way too fast.
We like sheared all of our culture, all of those other things.
Right.
And so I think the pendulum, if you check on Twitter or X,
right, is swinging in that general direction of, we're going to be high leverage and all
of the other things.
Right.
And then you have the kind of like traditional VC stuff.
I think the reality is somewhere in between.
And that's what you're going to really,
really want to aim for,
because that means that you can scale your company,
you can scale your org and you can scale your revenue
in a competitive way without becoming bloated
if your target is to do a billion dollars
or $10 billion in revenue, right?
And that's going to give you the kind of like
almost maximum value for long-term of basically saying,
I don't actually have to be first in this leg of the race.
I can actually totally be second if that competitor
is gonna raise a shitload of money,
try and scale really, really quickly, become bloated,
slow their organization down,
and not be able to kind of like innovate at the same level
because you've just hired way more people in general.
So that's probably one of my hot takes from like,
I guess, VC side of things.
On tech side, I think people should just build more shit themselves.
And I think that people really just don't, they just don't lean into it.
And yes, it's hard.
It's hard to build data centers and stuff like that.
But you should do harder things.
I think the age old adage of risk is not as risky as you think
also applies to building hard things.
Hard things are hard, but they are not as hard as you would think.
And so there's good arbitrage in just simply building
hard things.
That said, you can't build everything yourself
from scratch.
So you do have to, you know, Tailscale
has this concept of innovation tokens.
You do have to spend those innovation tokens.
You can't just basically be like, oh, we're
going to do everything, et cetera.
So being deliberate about like how you're
going to go and build that out is also part of like building,
I would say, maximum leverage.
Where do you think these tools start to fall over?
Like, where does this AI stuff stop working for you?
And you're back to like, oh, it's just about having smart humans
that are capable of doing stuff.
Like, there's like a boundary box here, right?
Where you're like, no, this is a whole person.
Kind of from your experience, both, like, I looked at your product,
you use a lot of autocomplete, a lot of LLM
natural language interfaces in the product, it's beautiful. And in the way that you build your support stuff, your team's using,
I'm sure, Cursor or Windsurf. Like, where's the level of, oh, it's really
good for these things, and then you get here and it's just terrible and it's never going to work?
And I'm kind of curious where that boundary is from your experience building Railway
and using it, and how that impacts the way you envision the future of compute,
and how that also helps you think about, like, what's the future of where we really need to be?
So this is interesting.
As an aside, my roommates worked at OpenAI
for the past six years.
And so we have probably a monthly sauna session
where we'll go to the sauna.
It ends up basically just trending towards AI.
And I'm like, hell, where do you think
this stops in general?
And then we talk about theory of how this kind of all works.
And my theory is that there's no limit to where it goes.
So if you think of the AGI as compression,
you can get really asymptotically good compression.
So in terms of me stating a thing and saying, hey,
agentic whatever, go in and do this thing,
you'll be able to continue to go and do that,
and that will continue to get refined
to one nine of accuracy, two nines of accuracy, et cetera.
So bigger and bigger targets.
But as you get bigger and bigger things,
you're going to get bigger and bigger loss.
And you're already seeing this in general.
I think YC is trying to do this thing right now
where they're astroturfing Windsurf, which is interesting,
where they're just saying the context model doesn't
whatever with Cursor, but Windsurf indexes the thing.
So how do you get essentially that context and enough space
for it to work in general?
I would say where it goes, it's almost a big lever
that will continue to pay dividends in general.
But the accuracy of that thing is
going to be almost all of the battle.
It's going to be like, how do you basically
say tune the AI to say, just do this thing?
Be very, very specific in terms of where you're going on this.
Because again, accuracy matters.
And the more scope you give it, the more loss, essentially,
you have in the system.
So I don't know actually where it stops in general.
I'm actually extremely bullish on it now,
when I wasn't, call it, a few years ago.
So I think that that goes there.
But I think in terms of people talk about AGI,
I think you will always need input.
You're not going to have this almost agentic thing that just
goes and becomes like Roko's Basilisk or whatever.
You're not going to have this thing that just kind of rolls
out of control because it doesn't make sense
from a mental model of AI as compression,
because it won't do that expansion.
So I think that that's where it stops in general.
It will always be input driven.
And so at a certain point, because those tokens
end up being expensive, there's almost
a bounding box of what's almost economically efficient.
Because you have to reprompt this thing,
you have to pull in the whole context window.
You're essentially going to burn tokens like there's
no tomorrow, the bigger this bounding box is.
So it's all about accuracy of those things.
And one thing I'll also say is I feel very bad for junior developers
in general, because it's really, really hard to get a lot of that context.
And you almost have this AI that you can lean on a ton
and be like, build me this game.
And then it's like, here's this game.
And you're like, I have no idea what this is, right?
And unless I know the scope, like, OK, actually, I
want you to write this very, very specific eBPF program
to mux these bytes like this, right,
and give it very, very concrete prompts,
it's just going to go out of control, right?
I've had some prompts that work reasonably well,
and then I look into them and go, oh, actually, you totally forgot this thing.
You're not deduping any of these keys,
and we're going to totally destroy production if this stuff makes it in there, right? So you still have to be accurate
about it, right? So I think it just becomes a lever for 10x engineers to become 100x, 1,000x, et cetera, engineers.
Right, I think that's kind of where that goes.
Awesome. Well, we have so much we could ask you, but just out of
respect for time, where can people find you? And also, if people want to actually
try Railway, if they're so intrigued by all the amazing stuff you do,
where's the place to get started?
Yeah. So you can find me on Twitter or X or whatever you want to call it nowadays. I'm just Jake, so J-U-S-T and then Jake.
And then if you want to try Railway, you can go to railway.com
or you can go to dev.new and just like point us
towards your GitHub repository.
We've built open source build engines from scratch
at this point.
So the goal is you don't need a Dockerfile.
You don't need anything else like that.
We will go in and parse anything that you've got
that will help us kind of infer the deployment,
and then we'll go and get it up and running for you.
And if it doesn't work for you
in like a singular one-shot way,
message me on Twitter or X
and we will figure out how to like get that thing
up and running automatically so that we can version that
so that that helps other people in the future.
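The zero-config build inference described above, look at what's in the repo and guess how to deploy it, can be sketched in a few lines of Python. The `DETECTORS` table and `infer_builder` function are invented for illustration; Railway's actual open-source builder covers far more ecosystems and produces a full build plan, not just a label.

```python
# Toy sketch of zero-config build detection: inspect the repo root and
# guess a builder from well-known marker files.
import os

# Ordered (marker file, inferred builder) pairs; first match wins,
# so an explicit Dockerfile beats language-level inference.
DETECTORS = [
    ("Dockerfile", "docker"),
    ("package.json", "node"),
    ("requirements.txt", "python"),
    ("go.mod", "go"),
    ("Cargo.toml", "rust"),
]

def infer_builder(repo_dir: str) -> str:
    """Guess how to build a repo from the files at its root."""
    files = set(os.listdir(repo_dir))
    for marker, builder in DETECTORS:
        if marker in files:
            return builder
    return "unknown"
```

A repo with a `package.json` would be routed to a Node build, one with `go.mod` to a Go build, and so on; anything unrecognized would fall back to asking the user.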
Cool.
Thanks for being on, Jake.
We had a ton of fun.
Thanks for having me.
Yeah, fun as well.
Yeah, this was a ton of fun. Awesome. Thank you so much.