The Infra Pod - Owning both software and hardware to run the best cloud for deployment - Chat with Jake from Railway

Episode Date: May 19, 2025

In this episode of the Infra Pod, Tim and Ian sit down with Jake, CEO of Railway, to explore how Railway is reinventing cloud deployment. Jake shares the origin story of Railway, their unique approach to simplifying cloud deployments, and the innovations that separate Railway from traditional cloud providers. With a focus on developer experience, cost efficiency, and leveraging hard-to-build technologies, Jake outlines Railway's journey from supporting small projects to attracting large-scale enterprises. The discussion delves into organizational strategies, technical challenges, and the potential future impact of AI on cloud infrastructure management.

Chapters:
00:22 Railway's Mission and Approach
22:45 Deep Dive into Orchestration and Kernel-Level Work
30:07 Leveraging AI and In-House Tools
31:55 The Spicy Future: Hot Takes and Predictions

Transcript
Starting point is 00:00:00 Welcome to the Infra Pod. This is Tim from Essence VC, and Ian, let's go. Hey, this is Ian, lover of bare metal compute, apparently. Jake, I'm so excited to have you on the podcast, CEO of Railway. Why don't you introduce yourself, tell us how Railway got started, and what in fact Railway actually is. ...Railway is, like, a really, really trivially easy to use cloud platform. So as you're building and deploying, your complexity on traditional cloud providers becomes exponential. You've got to parse in Terraform.
Starting point is 00:00:51 You've got to parse in Helm. You've got to figure out how all of these pieces work together. We give you a really, really easy to use canvas where you just basically say, give me Temporal. Give me Postgres. Whatever.
Starting point is 00:00:59 It'll spin that thing up, provided it's open source. And then we'll manage that and make sure that that thing scales for you. So that's the kind of larger end goal of it. It started about five years ago, because we were just, I guess, a little bit disappointed about the cloud landscape. Like, you know, a lot of power kind of locked behind a lot of pain to actually access these things.
Starting point is 00:01:24 So we kind of, as much as I hate the term, like, democratizing, whatever, right? It's like, you want to make that way easier for people to go in and use. So that's probably the most applicable term, you know? So, you know, one of the things that you focus on with Railway is just the simplicity. Like, you go and look at the website, and the first thing that pops in your face is this demo
Starting point is 00:01:39 of how you can drag and drop infrastructure and just create it. Like, what was it about the way the infra was? We were like, no, this is what I want on my front page. How did you get to the conclusion that this is how you wanted to build the Railway interface? Yeah. So I guess from a go-to-market perspective,
Starting point is 00:01:56 I don't know if you've heard the saying: if you're everything to everybody, you're nobody to anybody. And so for us, we wanted to focus on a segment that was kind of disappointed like us with these cloud primitives. So for us, we put simplicity front and center. We put support front and center.
Starting point is 00:02:11 And those are things, again, that AWS or GCP or whatever just fundamentally isn't. And so that's why we've gone with the branding that we have currently. It's also a future that we believe in. We want to say that these things need to be significantly easier, so that everybody can access them and then spend less time doing infrastructure.
Starting point is 00:02:29 Because that's the whole end goal: you spend less time gluing and more time doing. And so I think most developers have probably already seen or heard what you guys do. But I'm sure there's probably still a number of folks that don't know what Railway truly is. And this sort of backend as a service, for a lot of developers like us,
Starting point is 00:02:51 sometimes we don't really know what the difference is between most of these things. So maybe you can give us at least an introduction of Real Way. Like hey, this is what we do and this is one of the most special things people always talk about really. Why is this effing cool to use it?
Starting point is 00:03:03 Maybe some of that color would be great, man. Yeah. So if you look at traditional kind of backend as a service providers, they're pretty vertical in terms of what they allow you to go in and do, right? Like Heroku, you know, they'll outsource a lot of their state to kind of their marketplace. Vercel kind of does the same thing generally.
Starting point is 00:03:19 You know, even you mentioning it's like a back end as a service, it's, you know, we do front end, we do anything, right? So we want to be basically a next gen hyperscaler where you can do anything, right? But it is significantly more trivial than the larger cloud providers. So it's not like kind of locked in.
Starting point is 00:03:33 You're not going to get to a point where you're saying, oh no, you just can't do that on Railway, right? You can do anything on Railway, because we've built a lot of the primitives to solve a lot of these hard problems, right? Like, we will manage the stateful storage for you. We will go and manage the containers for you, right? You don't have to change anything about your application.
Starting point is 00:03:49 You just hand it to us and then we'll go and figure out how to go in and deploy it. Right. So I would say that's the kind of like main difference with Railways. Like we try and really, really meet you where you are exactly. You're not going to have to go and make changes or anything else like that. You can kind of just deploy anything you want. I've been following you guys' products and I'm reading a lot of like the latest announcements. And I feel like everybody's backend or whatever we call this as a cloud, everyone have a different take. What is the most important parts of the stack is. And
Starting point is 00:04:20 there's a lot of emphasis, I realize, for you on really getting developer experience right, but also on taking away the abstractions of infrastructure required. And every platform sometimes tries to claim this, but they do it so differently, right? There's some different taste here that I think is really hard to describe with nuance. Can you talk about the sort of infra-less approach? I saw your blog post talk about infra-less, right? Like, I don't want to feel like there is actually infra involved. But you know, for a lot of people that build
Starting point is 00:04:54 Backends they actually want to know what infra is because these are maybe they make the right choices and so there's like this weird blend So we talked about your taste What has been the driving thing that makes the design really what it is so far? And what are some of the maybe unique things you guys do or choices? I think we sort of did this in the early days where we like hid a bunch of things from you where we basically said, oh, you don't need that, right? You don't need like persistent volume claims. You don't need whatever. Like all of these, you don't need that, right? You don't need persistent volume claims. You don't need whatever.
Starting point is 00:05:26 Like all of these other things: you don't need Bazel-style rollouts or anything else like that. And our ethos has kind of changed over the years to basically say, well, you do actually need some fundamental way to go in and do that. But how do we compress that primitive? So for things like rollouts, we'll say, oh, if you have a dependency on any other service,
Starting point is 00:05:44 like if you're consuming the URL or the IP address or anything else like that, we can actually just construct a dependency graph for you. And so instead of you actually being to go and define any of these things and kind of manage that complexity yourself, what we're doing is saying, that complexity still exists, here's how you wrangle it.
Starting point is 00:05:59 And here's how you wrangle it in a way that takes you from exponential complexity to linear complexity. And so that's what we mean when we're talking about the canvas. For people who haven't used the Railway UI, it kind of looks like a Figma-style canvas. It's like an infinitely sprawling thing.
Starting point is 00:06:13 It's a bit different than traditional infrastructure code. But the reason we go and do that is actually twofold, because it allows you to drill into your service and only think about your service as that level of abstraction. And then you can zoom out to say, oh, I want more and more of this context.
Starting point is 00:06:29 But then you've linearized the complexity of your system, because you don't have to think about any of those spanning microservices across my flat network or address space. And then what we also do is the canvas will actually version these changes for you. So if you go and make those changes, it will keep track of them.
Starting point is 00:06:45 And then if you go and merge them in, then it will actually just replay those changes for you. So the kind of complexity of like, I have N microservices, I have to think about all of them, I have to digest them as part of this Terraform or break them up into modules, we manage that for you. And that versioning of, oh, let me try and make a change. Every DevOps engineer will tell you,
Starting point is 00:07:02 you go and you make the change in staging, you test it, and then you try and figure out how to make it actually either importable or reproducible in Ansible or anything else like that. We just skip all of that. And we say: you make the changes in the canvas, we record those changes for you, and then we'll go and apply them.
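As a rough sketch of that record-and-replay workflow: one plausible way to model it, not Railway's implementation. The `Change` shape and every name below are invented. Each canvas edit appends to an ordered log, and merging an environment just replays that log onto the target.

```typescript
// Hypothetical model: every canvas edit is recorded as a change,
// and merging an environment replays its recorded changes in order.
type Change =
  | { op: "create"; service: string; image: string }
  | { op: "update"; service: string; patch: Record<string, string> }
  | { op: "delete"; service: string };

class Environment {
  services = new Map<string, { image: string; env: Record<string, string> }>();
  log: Change[] = [];

  apply(change: Change) {
    this.log.push(change);
    switch (change.op) {
      case "create":
        this.services.set(change.service, { image: change.image, env: {} });
        break;
      case "update": {
        const svc = this.services.get(change.service);
        if (svc) Object.assign(svc.env, change.patch);
        break;
      }
      case "delete":
        this.services.delete(change.service);
    }
  }

  // Merge: replay everything this environment recorded onto the target.
  mergeInto(target: Environment) {
    for (const change of this.log) target.apply(change);
  }
}

// Try changes in staging, then replay them into production.
const prod = new Environment();
const staging = new Environment();
staging.apply({ op: "create", service: "api", image: "api:v2" });
staging.apply({ op: "update", service: "api", patch: { LOG_LEVEL: "debug" } });
staging.mergeInto(prod); // prod now has api:v2 with the new env var
```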
Starting point is 00:07:15 So that's another way that we kind of take that complexity. And we don't try and destroy it, right? There's a whole, like, complexity cannot be destroyed. But I do think complexity can be compressed, right? And so we try and take the approach of, how do we compress all of that complexity to give people primitives that feel like, once you learn it, you can kind of compose it together
Starting point is 00:07:32 with all of these other things and move really, really quickly. So you kind of feel like you're leveling up every single time you learn something new about the platform. So I think this would be a good time. If you think about the standard platform engineer person at like your mid market company, you know, the tools in their tool belt are like Terraform, Helm,
Starting point is 00:07:51 Kubernetes, and some cloud or series of clouds. And like the workflow there is I package up some services into Helm charts. I publish some Docker containers and Helm charts. I use Terraform to build up the cloud, the base bottom level cloud infrastructure. I get the RDS instance, get the EKS cluster, get all the IP address space, all the other stuff. And I use that to set up my staging, set up my production. And then I use Helm and maybe I'm
Starting point is 00:08:16 using Kustomize or something else on top of Helm to configure the Kube cluster. What's the workflow with Railway like? What's the delta there? Because you're talking about this canvas, but I'm curious, as an uneducated user of Railway: you talk about the complexity compression, but what's the workflow of this canvas, and how do you have multiple environments with a canvas?
Starting point is 00:08:40 Like, is there one canvas to one environment, or is it broadcast out? The path to production is an incredibly complex beast today. It was an incredibly complex beast 20 years ago. So how do you compress that path-to-production complexity? Yeah, so I think Git does a lot of things right here. And so there's a lot of things to learn there in general. So allowing you to start with one singular environment
Starting point is 00:08:59 and then make changes to that environment, and then go and say, actually, I want a staging environment. Let me take that config and let me just pull it into a new parallel environment that I can go and make changes to. You know, I start with an application: an API server, a worker, a Redis cache, and a Postgres instance. I'm hacking along in prod, and at some point, I say, OK, well,
Starting point is 00:09:22 we've got to get ops in here because we're just committing directly to main. And the same thing happens with infrastructure. The problem is that you have to go back and refactor all of that stuff. Everything has to become a module. And then you have to figure out how to version those modules so that you can push the staging and go
Starting point is 00:09:36 and do all those things. And so you have to set up a ton of different infrastructure when you want to move faster as a team, right? Versus with Railway, we just build it in right out of the gates and we've kind of built that versioning into the canvas so you don't have to go and kind of like, almost replay all of those things, right?
Starting point is 00:09:51 And so the workflow for users is basically just saying, I want this thing, I want this thing, et cetera, and we will keep track of all of those things for you, right? Got it. And so I think I want to really kind of figure out what is the type of audience you usually focus on when it comes to developers? Because I feel like there is a lot of cool stuff
Starting point is 00:10:10 you're announcing latest launch week, a lot of performance in metal, right? Performance on network, cause, trying to bring it down. I think focuses on folks that really want to run at scale production, because folks really care when it comes to the margins and how well I really want to like bet on this. But like I said, there's also like a huge amount of like the templates,
Starting point is 00:10:29 the canvases. I'm sure a lot of developers when they're just getting started, they don't know backend that well. They have just like, I don't even know what to do, what to start, right? There's a lot of this kind of like developers. So I'm sure you probably don't focus on like, I've just learned backend type of folks, right? I'm like, okay
Starting point is 00:10:45 I have something that needs to run in production and scale from there, right? Like, maybe talk about a typical persona, the type of folks that you think are the developers you care about the most. Yeah. We have what we call the go-to-market master plan, which is an internal document at Railway that we've built out. And it's essentially about almost stretching your ICP. And so instead of shifting a market, you stretch a market and you go and build for those. So our current ICP is 50 to 250 person teams.
Starting point is 00:11:17 And so our previous ICP was actually 15 to 50 person teams. And that's kind of interesting, because it's the scale at which you're like, okay, well, maybe we need to get a DevOps person involved, et cetera, right? And so we've shifted that ICP from, you know, five years ago when we started the company,
Starting point is 00:11:32 you know, we were saying like, hey, it's literally anybody with a pulse. Like you have a, you know, tiny Discord bot that you want to go in and run. Like, like you're our ICP, like get in, get in here. We'll help you go in and do that, right? And then we've kind of slowly shifted to people who want to move fast to startups, and then Series A, et cetera,
Starting point is 00:11:48 kind of like companies who are not interested in kind of like trying to go and find and hire these DevOps people, like go and build out this kind of like machine on like a larger kind of like time horizon to now kind of like these Fortune 500 companies who are like, hey, we need like RBAC and like a way to do like granular permission structuring
Starting point is 00:12:03 and stuff like that. So that's again, like kind of a good example of like, hey, the complexity is RBAC and like a way to do like granular permission structuring and stuff like that, right? So that's, again, like kind of a good example of like, hey, the complexity is going to come at some point. How do you like make it really, really trivial for people to like go in and add people to your teams, right? So you're not like waiting, you know, three minutes for Terraform to like go in and get its projected state.
Starting point is 00:12:17 And then you apply it and it says, congrats, you don't have any permissions. Like, you know, go and ask your manager, right? So we've kind of like taken that approach in terms of like building out and layering it. So I would say that've kind of like taken that approach in terms of like building out and layering it. So I would say that's kind of like where we're at right now. It's a very interesting spot because you get to these conversations, right?
Starting point is 00:12:31 And this is kind of why we rolled out the metal stuff, right? Because even if our workflows are really, really good, you know, the conversations on the other side of the table ends up being, oh yeah, it's great and we'll move faster, but it's still a bet on our Ench team and it's more expensive, right?
Starting point is 00:12:45 And so we were like, okay, well, we have to be better and cheaper, right? The burden of proof is on us, right? Because we have these massive cloud providers and they have a really large track record. Look at the big three and then Cloudflare, right? People are like, oh yeah, I'd host whatever, my important stuff, on Cloudflare.
Starting point is 00:13:02 CloudFlare has been around for ever. It powers half the internet. Anytime that there's anything, it's like an engineer snow day for Cloudflare or whatever. So again, the burden of proof is really, really high. So we have to be both faster and cheaper and have these better workflows. So those are the things that we've built out over time.
Starting point is 00:13:20 It's interesting, right? It's like that movement that you just mentioned from the ICP of like a, you know, 15 to 50, the 50 to 150 of that camera with the exact range set at 250, something like that. I assume that's like engineers, right? I'm curious at the 50 plus engineers, they're probably deployed someplace. So is this like you're actively going in like winning compute from like the hyper scale
Starting point is 00:13:43 clouds? I'm kind of curious, curious, how does that work? I'm sure there's some secrets, but broadly speaking, is it specific industries or these people, they got all in on the mono cloud and they turn into this complexity explosion and they're wasting all this money on stuff and they don't care.
Starting point is 00:13:58 What's the process look like to try and win these mid-market customers? I get the value prop you're selling, which is stop spending so much money on trying to build your own platform, build on top of our platform, and just focus on the thing that matters to you. Get off these clouds and give me the right primitives,
Starting point is 00:14:12 some version of that, I'm sure. But it's interesting, because the long-term thought here has always been like, well, the clouds will ultimately win. The Habano clouds will ultimately win, like GCP, AWS, Azure, and primarily because they have data gravity. Once the data is in there, it's so hard to get out, right?
Starting point is 00:14:27 So I'm just, it's just an interesting thought process because it is a narrative violation to say, I'm going to go win spend away from like the big three. So I'm curious to learn more. And I'm sure our listeners are as well. Yeah, for sure. It's a very interesting and I guess like a delicate dance as you like kind of get up into the larger spends when you're talking like 250k, 500k, like million dollar spend on cloud.
Starting point is 00:14:50 Because you're across the table from a director of engineering and the director of engineering's main concern is usually it's like engineering velocity. But then it's also just like reliability, right? Like I just got to make sure that like we move fast and then the system stays up, we deliver value for our customers and we're able to move quickly, right? So what gets us usually in the door is that kind of like promise of like being able to move quickly in general, right? And then what allows us to kind of like move
Starting point is 00:15:15 through those sections is just incremental adoption, honestly, right? Like, there are workloads where, you know, some use cases are not so critical. People will start with, like, tier-zero services. They'll start with non-database services. They'll start with really, really trivial use cases.
Starting point is 00:15:30 Staging environments is a really, really good one where we can just like, we can manage staging, we can manage PR environments, et cetera, for you. And then you can adopt kind of the production features as you kind of want to go and cut over. We have a bunch of people who will like, they'll start on that for like three to six months and then they'll move kind of
Starting point is 00:15:45 like the production environment. So you have these kind of like POCs where you're incrementally cutting things over. And then in terms of like how you find these customers, right, there's various different like channel partners that I won't reveal like where we're getting a lot of our business from in general. But like there are industries for which, you know, they're not the Netflix is the coin bases that kind of et cetera of the world that have these platform themes.
Starting point is 00:16:07 They basically say, hey, listen, we have initiatives to either cut cost or increase velocity, and we really, really want to do both. How do we go and make that happen in general? And so they're heavily incentivized to actually go in and say, let's rip and replace a lot of this infrastructure. And then we can work with them from our solutions
Starting point is 00:16:24 or support engineering team to say, OK, here's the lay of your land. Let's start with this. Let's move to this. And then we'll get the database in general. And the database, as you mentioned, is the last bastion of the thing that matters because of the data gravity.
Starting point is 00:16:38 I think the clouds have done a really, really good job. Even looking at this is why egress is so expensive on regular clouds is because they don't want you pulling that data out, right? Because they understand the thing that we understand, which is the only thing that matters for compute is proximity to data.
Starting point is 00:16:53 And so you're talking about all these edge compute workloads or anything else like that. And then you look at a real-world application, and it's firing auth requests to a Redis instance in a different region. It's firing four different chained database calls back and forth all the way. And it creates this massive latency overhead.
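Back-of-the-envelope numbers make the point. The latencies here are assumptions for illustration, not measurements from the episode:

```typescript
// Illustrative arithmetic only; the latencies are assumed, not measured.
const crossRegionRttMs = 80; // e.g. us-east <-> eu-west round trip
const sameZoneRttMs = 1; // compute sitting right next to the database
const chainedCalls = 4; // auth + three dependent queries, made sequentially

console.log(chainedCalls * crossRegionRttMs); // 320 ms of pure network wait
console.log(chainedCalls * sameZoneRttMs); // 4 ms when the data is local
```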
Starting point is 00:17:11 So getting to that database and that proximity to that data, whether you're deploying into somebody else's cloud, we can deploy into AWS GCP or our own bare metal now. So long as it's sitting near that database, we have a kind of eyes on way to say, hey listen, we can save you money, we can move faster, all of these other things, and just do it incrementally. So maybe on this point,
Starting point is 00:17:32 so much of your latest tweets are talking about the cost. Customers spending 500k on Google, coming to Railway, unfortunately, just 50k a year. I find it so fascinating. So for all the infra nerds like us, right,
Starting point is 00:17:48 we want to know, like, what are the things you worked on to really make this sort of cost reduction possible? I'm sure the multi-tenancy is part of the story, because I think you mentioned that as well, but I'm sure multi-tenancy alone, like the simplest form of it, is not going to let you do cost reduction that easily. Right? If you're pushing the cost even further down now, what have been the types of things you had to do to make the cost actually much lower than
Starting point is 00:18:15 just anybody doing a very typical, you know, putting a PaaS together. Yeah. So this maybe verges on my hot take, and what we've almost skipped to the last step on, in the interest of growing our compute under management as quickly as possible, is that all compute will be metered by the minute, right? Compute becomes a utility. You are not going to be purchasing fixed-size boxes from AWS. You are going to be deploying something and running it. And you are going to be paying the minimum, minimum possible amount that you could possibly pay. I've often joked with our marketing lead that our best pricing page would not have any numbers on it. It would simply say: we are going to make the cheapest compute possible,
Starting point is 00:18:56 and we are going to charge you 10% to 15% on top of that. That's it. And so compare that to larger cloud providers, like AWS, where they're selling these boxes that customers ultimately end up not fully using. And then on top of that, what they're doing is, you know, overcommitting workloads, using spot instances, all these other things to really drive up the margins they've got there, right?
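As a toy illustration of that metered-by-the-minute model: the unit rates below are invented, and the only number taken from Jake's description is the 10 to 15 percent margin.

```typescript
// Bill for exactly the minutes used, plus a flat platform margin.
// Unit rates here are made up for illustration.
const RATE_PER_VCPU_MINUTE = 0.0004; // USD, hypothetical
const RATE_PER_GIB_MINUTE = 0.0002; // USD, hypothetical
const PLATFORM_MARGIN = 0.15; // "10% to 15% on top"

function monthlyBill(vcpuMinutes: number, gibMinutes: number): number {
  const compute =
    vcpuMinutes * RATE_PER_VCPU_MINUTE + gibMinutes * RATE_PER_GIB_MINUTE;
  return compute * (1 + PLATFORM_MARGIN);
}

// A service averaging 2 vCPU and 4 GiB of memory for a 30-day month:
const minutes = 30 * 24 * 60; // 43,200 minutes
console.log(monthlyBill(2 * minutes, 4 * minutes).toFixed(2)); // "79.49"
```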
Starting point is 00:19:15 Like, our goal is actually to go and say: what we're building is a mechanism for compute management that will disrupt the traditional cloud providers, because they sell these boxes and because they sell this oversubscription. And so they're actually incentivized to not disrupt themselves, right? As much as Lambda is very, very cool and they're kind of pushing it in general,
Starting point is 00:19:33 it's also really fucking expensive, right? So it's still expensive, just in a different way, right? But the goal is to get that Lambda-style compute, with the ability to do stateful, at the smallest possible price point, which is paid by the minute, right? And so via us doing metal, which drops the cost significantly, and us writing our own orchestration engine to pack as many of these things as possible onto instances, that's where most of the savings actually comes from for us, right? And then being able to sprinkle these hobby, serverless, PR, etc. environments all across these instances. Like, if you've ever looked at or listened to any of the early Borg talks, you know, they're talking about their packing mechanism, right? I forget who the author is, right? But he goes into the basics: so yeah, this is Gen 9 of our packing system, and we were able to pack these batch workloads here, and then we made changes here and it got significantly worse, and here's why it got worse, right? And so a lot of the cloud providers have done this at the bin-packing level, and we're trying to do it at almost the real-time operating
Starting point is 00:20:28 systems principle level, where we just say: your compute workload is going to scale. We're going to move either other things around it, or the workload itself, and live migrate it to different instances so you don't get any interruptions or anything else like that, and then charge you, again, that premium that, if we do our job correctly, will be that premium on, you know, $1 billion in revenue, $10 billion in revenue, $100 billion in revenue, right? Like, data center markets are like a half-trillion-dollar-a-year market as of right now. So it's just massively growing in terms of size. And so it is kind of an economies-of-scale business, where we're just barreling directly towards the most efficient possible way of doing this thing and
Starting point is 00:21:08 saying, let's just get really, really fucking good at doing this. While the other client people are not incentivized to go in and do this. And that's where we'll make our margins. A couple of interesting things you said there, right? Which is one, something that no one actually does say outside of Google, which is like, I mean, I'm sure Tim could talk a lot about how Mezos enabled this sort of like packing problem. Because if I remember correctly, the Mezos schedule was like built to solve this specific problem. But it's like, interesting, like I'm curious to hear, this is the thing that starts to differentiate you versus like say a render versus say like a flat IO, like these are like, we have all these, what I call, what I start to call like these Neo clouds, right?
Starting point is 00:21:45 And you're part of this group of Neo clouds and there's GPU specific ones and right. And they all have these sort of different parallels of focus and it sounds like one of yours is simplicity and cost. And like the amount of engineering that you can do to drive costs down is actually very impressive. So I'm curious, like, you know,
Starting point is 00:22:03 one of the things that we're talking about is how you get instances packed together closely, tightly. Like, what are other places where you've made real investments, that a normal platform team at, like, RandomCo in the mid-market would never ever think about doing, and the clouds don't give to you out of the box? Yeah, I think there's a couple of things there in general. I think writing our own orchestration engine
Starting point is 00:22:25 is the one that I reach for the most in general, because most people will just use cube or Nomad or something else like that. Because it's sufficient for their use cases. They don't need to pack these things as tightly as possible. But I would say that kind of leads, it's almost like leads credence to the actual real thing that we do, which is we try to really go
Starting point is 00:22:44 deep on a lot of these things. We are writing eBPF, we're working with the kernel, we're working with kernel modules. We're going really, really deep on things that I think are almost lost arts, if that makes sense, or deeply disincentivized in the current gen of, I would say, infrastructure builders. So we're trying to really pop the hood on a lot of things that Kube would abstract for you, like block storage through the, what is it, the CSI or whatever that object thing is. We're trying to really, really pop the hood
Starting point is 00:23:14 and go deep on a lot of those things. And kind of one of the core ethos of the company is: just try and do hard things. And if you're faced with two unique paths and you can build some sort of competitive advantage by simply doing a hard thing, there's really only two moats in anything: it's hard decisions and hard work.
Starting point is 00:23:33 And so if you do that and you go deeper on a lot of these things, then you're kind of in a situation where you keep popping the hood and you keep uncovering these, wait, why are we doing it this way, kind of things. And then you can kind of make those changes. And I would say probably like 60% to 75% of them are positive and work out well. And then the other 20% to 25%, you figure out,
Starting point is 00:23:55 and you're like, oh, that's exactly why. They were doing it that way. That makes a ton of sense. But every layer of kind of indirection that you pop, you end up actually kind actually getting that almost unlock, whether it's a performance unlock, or whether it's a cost unlock, or whether it's something else like that. And so when other companies choose
Starting point is 00:24:12 to go a little bit wider, we choose to go deeper so that we can try and solve these problems at a generalizable level. We're trying to solve generalizable block storage across any compute that you have. The goal is to curl a binary onto a box. It runs anywhere. You create a massive mesh of your compute.
Starting point is 00:24:28 We will give you keys for this compute. And then that's your ring of device trust, essentially. And you can land those workloads on that. And that whole system is essentially what powers the Railway Orchestration Engine as of right now. We will use WireGuard to go and peer these things together. And then we can also allow you to do like private kind of offshoots in general.
Starting point is 00:24:47 Right? So that's kind of the goal. But I would say that's where we find most of our interesting stuff, right? It's going to be at the kernel level. So, like, that eBPF stuff, having to look at things like io_uring, stuff like that. Right. Yeah.
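A minimal sketch of that "mesh of compute" idea, assuming the rough shape Jake describes: each node gets a keypair, and the control plane hands every node the public keys and endpoints of its peers, which is roughly what a WireGuard-style full mesh needs. None of these names or shapes are Railway's actual API; the keys are placeholders.

```typescript
// Hypothetical full-mesh peering: each node learns every other node's
// public key and endpoint, the minimum a WireGuard-style mesh needs.
type Node = { name: string; publicKey: string; endpoint: string };

function meshPeers(nodes: Node[]): Map<string, Node[]> {
  const peers = new Map<string, Node[]>();
  for (const node of nodes) {
    // The "ring of device trust": a node trusts exactly these keys.
    peers.set(node.name, nodes.filter((n) => n.name !== node.name));
  }
  return peers;
}

const mesh = meshPeers([
  { name: "metal-1", publicKey: "pk1-placeholder", endpoint: "10.0.0.1:51820" },
  { name: "aws-1", publicKey: "pk2-placeholder", endpoint: "10.0.1.1:51820" },
  { name: "gcp-1", publicKey: "pk3-placeholder", endpoint: "10.0.2.1:51820" },
]);
console.log(mesh.get("metal-1")?.map((p) => p.name)); // ["aws-1", "gcp-1"]
```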
Starting point is 00:25:01 I'm really fascinated because I think a lot of companies, usually when it gets some certain skill or certain size, now you specialize teams into very, very specific places, right? You know, have a team just working like an in a board team, right? There's a certain number of people just working on certain things. And there's actually another set of people actually trying to do the max mixing, the batch workloads and interactive workloads, because it takes different type of skill sets to even learn how to navigate the system level work and the sort of AI or statistical work
Starting point is 00:25:34 to do this sort of like multiplexing of workloads. And I worked on a lot of this kind of stuff. And I just was curious how you think about you, the first thing that comes to mind is like, how do you run a team? How do you hire and run a team that works across the whole gambits of these things? Because you have all the way to developer packing system,
Starting point is 00:25:53 developer experience stuff, now into the metal stuff. Just doing metal alone is a little daunting in my mind as a small team. And so do you have very tiny teams doing certain things or do you have certain people doing across different? how do you think about even like getting the right people to do what because I'm sure not all developers have all experience working at all layers. Yeah, I think like org design is a very, very, very, very real engineering problem.
Starting point is 00:26:19 Like you are designing a distributed system. You are designing ways for like humans to communicate with each other. Right. And you know And humans are not computers. They're not going to be reliable. It's like you've 0.9 availability on a human. And so it's like you need to build out these systems such that people can actually work and develop this context across multiple different zones
Starting point is 00:26:39 so they can get this kind of almost like, I don't know if you've ever read Range by what's, I forget his name, it's something Epstein, but not that Epstein, you know? It's like, how do you accumulate this knowledge across this range of topics so that you can almost compress it and say, again, why the fuck are we doing it this way
Starting point is 00:26:53 when we could actually just be doing it this way, right? And the only way that works is by hiring people who either want to develop that range from specialists or have that range and want to go and really, really develop the focus acumen, et cetera want to go and really, really develop the focus acumen, et cetera, to go and specialize in those topics to create T-shaped individuals. So we're like 25 people right now.
Starting point is 00:27:12 We manage a million users and a lot of computing in general. And we're all across the globe. So there's a nice bonus there in terms of I work with this person in Thailand. Or I was literally handing off a PR to a guy in Spain this morning, cause we're like, okay, like how do we get this thing done like for this week, right? And so just having to design the org so that that thing works
Starting point is 00:27:33 and that you get reliability at scale while simultaneously allowing people to like cycle out and not be like bus factor zero, it's like, that's a lot of kind of like the challenge in general, right? And the only real solution to that is like basically just hiring really, really great people. Right. And so we try and focus on like, how do we actually get the leverage per individual to be really, really high?
Starting point is 00:27:52 Like one of our core like KPIs is like revenue per employee, right? Like how do we generate as much revenue per employee as possible so that we can go and like shovel that back to people and say, listen, if we're going to hire really, really excellent people, right. Like we can do it on equity initially, right. But at a certain point, like you need to be able to like actually back it up with cash. So because people have like general kind of like life commitments, all those
Starting point is 00:28:12 other things, right. So that's kind of what we focus on in general, right, from that perspective. But yeah, that's that's a very hard part of it, right, especially as you start going and getting really, really deep, right, in terms of like, you know, Linux fundamentals, it almost seems like a lost art in terms of people who have range, but also they're committing patches to the kernel, and they're writing modules, and all of these other things
Starting point is 00:28:34 that you would traditionally associate with. And I mean this with all the kindness of the world, the gray beards of the world, right? The wizards who are just like, they're seasoned, and they're sages, and all those other things. And they've got all this knowledge kind of like, I don't say like locked up, right? Like the wizards who are just like, you know, they're they're seasoned and they're they're sages and all those other things. Right. And they've got all this like knowledge kind of like, I don't say like a locked up. Right. But just like from like 30, 40 years of like working on the kernel and like building out these systems. Right. Or, you know, I think Metta does a really, really good job of like hiring these people like the Katran load balancer team.
Starting point is 00:28:59 Right. They're written all the CBP efforts, et cetera. And, you know, if you try and go and poach any of these people, it's like you got $2 million a year comp packages, right? And so trying to find people who want to go and accumulate this very, very sparse knowledge ends up being an interesting recruiting problem, but yeah. I mean, I'm really curious, one of the things I've talked to a lot of founders about recently is sort of this discussion.
Starting point is 00:29:22 The future company, it's a lot on Twitter, the future company is 30 people with a billion in ARR or something. The idea being that what fills in the gap is AI. I'm really quite curious to hear your perspective with your focus on revenue per employee and your focus on doing all these really hard things. Where do you find, where are you using AI or LLMs
Starting point is 00:29:42 or what forms automation are you applying at Railway to help you get that leverage, right? Like if there's a moment of anything right now, you know, we're all talking about it. We're either selling, we're helping people do it, but like how are you at Railway taking advantage of like your tools, like what workflows are you bringing on? Like where are you looking for places where you get
Starting point is 00:29:59 a lot of leverage out of this stuff for internal stuff to build on sort of Tim's question around like small team, how do you get all this done? Yeah, I would say that like a lot of the stuff we do for better or worse in terms of like creating this leverage, sometimes it's for worse and you have to buy something off the shelf, but usually it's for the better. That's kind of the like 80, 20 that I was talking
Starting point is 00:30:17 about prior. We build a lot of shit in-house. Our support tooling is entirely built in-house. We use some AI primitives for like almost like currying context in a thread, pulling in some AI primitives for almost like currying context in a thread, pulling in information from docs or anything else like that. And then we build the system in such a way
Starting point is 00:30:30 where we have three support engineers, and we try and just almost maximize for borderline Starcraft gamer terms, like APM per ticket, APM of just moving through these things and attaching them to tickets in general. So you can use AI there, but a lot of it just ends up being like, if you end up building technology from scratch, it's a lot more malleable.
Starting point is 00:30:53 You can change how this thing works from an org level. And so if you end up buying it off of the shelf, what you end up doing is you end up almost pulling that kind of ossified structure inside of your organization. So the support tool that you buy and purchase actually ends up incentivizing how you're going to go and build out your support organization because it only has specific features or it works this way or this is how escalation works or anything else like that. So for us, we've almost found really, really solid arbitrage in terms of saying like, listen, let's use cloud for AI coding and stuff like that.
Starting point is 00:31:25 Let's just use those things to accelerate the pace at which we can move from a development perspective. Let's build a tool that allows us to move really, really quickly from an infrastructure perspective. And let's build the tooling that we will need to go and essentially scale that out with leverage at scale using both of those technologies
Starting point is 00:31:41 that we have in general. So it's kind of this nice blending where we're kind of creating leverage at every layer. And you also have that kind of flexibility to say, oh, actually, here's how we're going to do escalations because we can just build it ourselves from scratch. All right, sir. We'll move into our favorite section called
Starting point is 00:31:58 the spicy future. Spicy future. Well, you already kind of knew what we were going to talk about, but I'm just curious, maybe tell us what you believe is a spicy hot take that most people don't believe in yet. I think like, and maybe this will sound kind of like lame, but there's this whole like, we're going to do a billion dollars in revenue with like 30 people in general,
Starting point is 00:32:21 and then there's the kind of like trad VC kind of way of like, hey listen, if you don't build the most accelerated pathway to get to like IPO, somebody else will. Right. And they will hire a shitload of people and they will kind of just like scale that up. Right. And so my hot take is almost like somewhere in between. Right. Like any company that is going to be able to go and build this, like leverage to get to like a billion dollars in annual revenue is actually going to get out accelerated by some other company that's going to like go in and scale.
Starting point is 00:32:46 Right. But that will be a short term kind of like solution. Right. Because they'll have like scaled really, really quickly. They'll be comp loaded, etc. All of those other things. Right. And so for me, I think the best almost like arbitrage of this is kind of like sitting in the middle. Right. So like, how do you grow headcount by 50 percent year over year and grow revenue by five X. Right. And I think if you're growing revenue by over 5x,
Starting point is 00:33:06 I think you're fucking something up. Because you just don't have the capacity internally to go and facilitate that. And I joined Uber late in 2018 or anything else like that. Post Travis, post hypergrowth, post whatever. And when I talked with any of the old guard, all of their fondest memories were pre-period of them like for X year over year.
Starting point is 00:33:26 They were just like, we hired way too fast. We like sheared all of our culture, all of those other things. Right. And so I think the pendulum is like, you know, it's on, if you check on like Twitter or X, right, like it's swinging that general direction of like, we're going to high leverage all of the other things. Right. And then you have the kind of like traditional VC stuff.
Starting point is 00:33:43 I think the reality is somewhere in between. And that's what you're going to kind of like really, really want to get on aim for, because that means that you can scale your company, you can scale your org and you can scale your revenue in a competitive way without becoming bloated if your target is to do a billion dollars or $10 billion in revenue, right?
Starting point is 00:33:58 And that's going to give you the kind of like almost maximum value for long-term of basically saying, I don't actually have to be first in this leg of the race. I can actually totally be second if that competitor is gonna raise a shitload of money, try and scale really, really quickly, become bloated, slow their organization down, and not be able to kind of like innovate at the same level
Starting point is 00:34:15 because you've just hired way more people in general. So that's probably one of my hot takes from like, I guess, VC side of things. On tech side, I think people should just build more shit themselves. And I think that people really just don't, they just don't lean into it. And yes, it's hard. It's hard to build data centers and stuff like that. But you should do harder things.
Starting point is 00:34:34 I think the age old adage of risk is not as risky as you think also applies to building hard things. Hard things are hard, but they are not as hard as you would think. And so there's good arbitrage in just simply building hard things. That said, you can't build everything yourself from scratch. So you do have to like, I know, Tailscale
Starting point is 00:34:52 has this concept of like innovation tokens. You do have to like spend those innovation tokens. You can't just basically be like, oh, we're going to do everything, et cetera. So being deliberate about like how you're going to go and build that out is also part of like building, I would say, maximum leverage. Where do you think these like tools start to fall over? Like where does this AI stuff stop working for you? how you're going to go and build that out is also part of building, I would say, maximum leverage.
Starting point is 00:35:05 Where do you think these tools start to fall over? Like, where does this AI stuff stop working for you, and you're back to: oh, it's just about having smart humans that are capable of doing stuff? Like, there's a bounding box here, right, where you're like, no, this is a whole-person job. Kind of from your experience: I looked at your product, and you use a lot of autocomplete, you use a lot of LLM,
Starting point is 00:35:23 natural language interfaces in the product, it's beautiful. You know, in the way that you build your support stuff, your teams using I'm sure Cursor or Windsurf, like where's the like level of like, oh, this is it's really good for these things. You get here and it's just terrible and it's never going to work. And I'm kind of curious where that boundary is from your experience building a railway and using it and how that impacts the way you envision sort of like the future of compute and how that also helps you think about like,'s the future of where we really need to be? So this is interesting. As an aside, my roommates worked at OpenAI
Starting point is 00:35:52 for the past six years. And so we have probably a monthly sauna session where we'll go to the sauna. It ends up basically just trending towards AI. And I'm like, hell, where do you think this stops in general? And then we talk about theory of how this kind of all works. And my theory is that there's no limit to where it goes.
Starting point is 00:36:10 So if you think of the AGI as compression, you can get really asymptotically good compression. So in terms of me stating a thing and saying, hey, agentic whatever, go in and do this thing, you'll be able to continue to go and do that, and that will continue to get refined to one nine of accuracy, two nines of accuracy, et cetera. So bigger and bigger targets.
Starting point is 00:36:34 But as you get bigger and bigger things, you're going to get bigger and bigger loss. And you're already seeing this in general. I think YC is trying to do this thing right now where they're astroturfing Windsurf, which is interesting, where they're just saying the context model doesn't do whatever with Cursor, but Windsurf indexes the thing. So how do you get essentially that context and enough space
Starting point is 00:36:55 for it to work in general? I would say where it goes, it's almost a big lever that will continue to pay dividends in general. But the accuracy of that thing is going to be almost all of the battle. It's going to be like, how do you basically say tune the AI to say, just do this thing? Be very, very specific in terms of where you're going on this.
Starting point is 00:37:13 Because again, accuracy matters. And the more scope you give it, the more loss, essentially, you have in the system. So I don't actually know where it stops in general. I'm actually extremely bullish on it, when I wasn't, I'd say, a few years ago. So I think that's where that goes. But I think in terms of when people talk about AGI,
Starting point is 00:37:32 I think you will always need input. You're not going to have this almost agentic thing that just goes and becomes like Rocco's Basilisk or whatever. You're not going to have this thing that just kind of rolls out of control because it doesn't make sense from a mental model of AI as compression, because it won't do that expansion. So I think that that's where it stops in general.
Starting point is 00:37:51 It will always be input driven. And so at a certain point, because those tokens end up being expensive, there's almost a bounding box of what's almost economically efficient. Because you have to reprompt this thing, you have to pull in the whole context window. You're essentially going to sheer tokens like, no, tomorrow, the bigger this bounding box is.
Starting point is 00:38:08 So it's all about accuracy of those things. And one thing I'll also say is I feel very bad for junior developers in general, because it's really, really hard to get a lot of that context. And you almost have this AI that you can lean on a ton and be like, build me this game. And then it's like, here's this game. And you're like, I have no idea what this is, right? And unless I know the scope of like, OK, actually, I
Starting point is 00:38:31 want you to write this very, very specific EBPF program to mux these bytes like this, right, and giving it very, very concrete prompts, it's just going to add a control, right? I've had some prompts that work reasonably well, and then I look into them and like, them like oh actually you totally forgot like this thing You're not dedupe any of these keys And we're gonna totally like destroy production like if we if this like stuff makes it in there right so you still have to be accurate
Starting point is 00:38:53 About it right so I think it just becomes a lever for 10x engineers to become 100x 1000x etc engineers Right I think that that's kind of where that goes Awesome well we have so much we could ask you, but you know, just because of respect of time, where can people find you? And also if people want to actually try a railway, they're so intrigued by all the amazing stuff you do. Like, where's the place to get started? Yeah. So you can find me on Twitter or X or whatever you want to call it. Nowadays, I'm just Jake. So J U S T and then Jake. And then whatever you want to call it nowadays. I'm just Jake, so J-U-S-T and then Jake.
Starting point is 00:39:26 And then if you want to try Railway, you can go to railway.com or you can go to dev.new and just point us towards your GitHub repository. We've built open-source build engines from scratch at this point. So the goal is you don't need a Dockerfile. You don't need anything else like that. We will go in and parse anything that you've got
Starting point is 00:39:42 that will help us kind of like infer the deployment and then we'll go and get it up and running for you. And if it doesn't work for you in like a singular one-shot way, message me on Twitter or X and we will figure out how to like get that thing up and running automatically so that we can version that so that that helps other people in the future.
Starting point is 00:39:59 Cool. Thanks for being on Jake. We have a ton of fun. Thanks for having me. Yeah, fun as well. Yeah, this was a ton of fun. Awesome. Thank you so much.
