Software at Scale - Software at Scale 21 - Colin Chartier: CEO, LayerCI

Starting point is 00:00:00 Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications. I'm your host, Utsav Shah, and thank you for listening. Hey Colin, welcome to another episode of the Software at Scale podcast. Quick intro for listeners, Colin is the CEO and co-founder of LayerCI, which is a really unique and interesting CI company. And I'm not going to try to butcher the details so much. So maybe you can talk a little bit about it and welcome to the show. Yeah, thanks for having me. I mean, yeah, LayerCI is a different CI company. It's primarily focused on people making websites. And it doesn't really focus so much on running tests as getting more subjective feedback. So

Starting point is 00:00:57 the experience that led to making LayerCI is you're reviewing another developer's change, and they edit a CSS file. And you want to know what the ramifications of that are, because no unit tests will fail, obviously. And I guess that eventually led to LayerCI. So at a very high level, that's what we do. I've seen companies like Facebook and maybe GitHub, they provide functionality similar to this to their internal developers on pull requests. And you probably know more about this than I do. Have you seen bigger companies generally do this? Yeah. I mean, there's a concept of like a PR box,

Starting point is 00:01:31 which is like, usually it's in your cloud environment. So if you're doing like AWS production environments, you'd set up like a Slack bot, you know, you'd get like 100 engineer hours. And you'd make a bot that you could send a Slack message to and it would create a production environment. So that's not really what we do. I guess there's a bunch of downsides to that. One is it takes a long time to provision an environment. Two is it's expensive.

Starting point is 00:01:58 You need to micromanage when to turn them on and off. If each of those environments is 1% of production and you have 100 of them, then you're doubling your production cost. So we're more of an approximation. We primarily focus on just running front-end, back-end database, something along those lines. So a reviewer can see at the application level how things are going, but you don't necessarily have all of your lambdas, all of your data stores, everything. It's all just basically the MVP of what you need to review. Okay.
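The cost argument above is simple arithmetic, and a minimal sketch makes it concrete. The function names here are hypothetical, chosen only for illustration; the 1% and 100 figures are the ones quoted in the conversation.

```python
def review_env_cost_fraction(env_count: int, env_fraction_of_prod: float) -> float:
    """Cost of all review environments, as a fraction of production cost."""
    return env_count * env_fraction_of_prod


def total_cost_multiplier(env_count: int, env_fraction_of_prod: float) -> float:
    """Production (1.0) plus the review environments running next to it."""
    return 1.0 + review_env_cost_fraction(env_count, env_fraction_of_prod)


# 100 environments, each costing 1% of production, roughly doubles the bill.
multiplier = total_cost_multiplier(100, 0.01)
```

This assumes every environment stays on all the time, which is why the discussion also mentions having to micromanage when to turn them on and off.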

Starting point is 00:02:28 That makes sense to me. So how do you make it more efficient? Do you just run, you run less stuff, that's the first thing. But I would imagine that it's still pretty costly to spin stuff up on every commit. So what are some interesting tricks that you do? Yeah, well, I mean, the big idea

Starting point is 00:02:47 for LayerCI is that, like, when you set up some environments, you're doing a lot of repetitive work. It's always, you know, set up a database, put some fake data in the database, you know, run the database migrations, start some microservices, you know, like start your GraphQL provider, like all of these minutiae that you always have to do on every pull request. And like, it's not really different per pull request, like you don't really change your infrastructure configuration very often. So the idea for LayerCI is you do all of that, it kind of automatically gets snapshotted. So we take a memory snapshot, as if we were hibernating the machine doing all of

Starting point is 00:03:25 the setup. And then we just make a bunch of copies of it. So the next time you make a change, instead of running all of the setup again, if you don't edit the files that do the environment setup, it just loads the snapshot. So it's like five seconds to get a new copy of everything for your new change. This reminds me of Android Zygotes. I don't know if you're familiar with that. My brother actually worked on that team. Oh, interesting. Yeah, for listeners,

Starting point is 00:03:51 since spinning up a new JVM is really slow for Android, what the Android OS does is it keeps a hot JVM plus some stuff running in memory. And this is super outdated knowledge of mine from maybe seven, eight years ago of Android development. But basically, the Android OS just clones the Zygote process so you don't have to spin up a JVM. You basically get a warm JVM for free for every app and that's how it keeps things snappy. That's kind of a similar idea, right? Yeah. I mean, we also get a lot of other advantages. So I guess we can maybe talk

Starting point is 00:04:29 about architecture later. But since we're focused specifically on developer environments, you know, we're not promising reliability, we're not promising, like, your customers will have a good experience if they use these environments. That means we can do things like really aggressive disk caching. We can basically make all of the I/O act as fast as if it were a RAM disk, and we can have really good internet connections and caching for everything. We basically made our own AWS clone that is specifically tuned to developer production needs. Interesting.

Starting point is 00:05:06 So the concept of caching these intermediate layers sounds remarkably similar to Docker, right? So can you maybe talk about the difference? Like one thing that stands out to me is that Docker is not doing any of this stuff in memory. And plus I'm sure they're not trying to be super efficient because it has to be more correct because you use it in production. But what else is there? Yeah. Yeah. So I guess the original idea for

Starting point is 00:05:30 Layer was based on, I mean, my experience before Layer was at another tech company. And we used Docker CI. So we used GitLab CI, the Docker base images or whatever. And that experience is okay, but it's just annoying in very subtle ways. So like you need a base image because, you know, Docker, like you have your configuration file for the stuff that needs to be in your CI. And you also have your configuration for what the pipeline actually is. But they duplicate a lot of the same things. It's like if you add a step in your CI pipeline that needs a library, you'd edit your base image to add the library, rebuild your base image, push it. There's basically duplicate work between the two places.

Starting point is 00:06:13 So that's one thing layer files solve. And yeah, the memory snapshotting is also really annoying because Docker can't keep processes running between build steps. So you can have all of the files you need, but files are a small part of a CI environment. You often want a microservice running or a web server, or if you want to use something like Bazel, these build agents that keep running in the background. You can't have a Dockerfile that's like, start Bazel, and then in the next directive down, using the running Bazel, build something, because Bazel would shut down between the build layers in Docker. So I guess we just took the idea of using Docker and CI and then

Starting point is 00:06:51 cut out the extra configuration. We just extended Dockerfiles with what you'd usually use in CI, like if statements and stuff like that. And we just made them automatically be detected. So the same way you have Dockerfiles that are built every time you push,

Starting point is 00:07:07 it's just like you push to CI, all the layer files run. There's no extra configuration needed for like the meta, how to glue things together, which services run in which order. So that makes me think that, you know, base images are basically just like a performance hack for Docker. That's what you're

Starting point is 00:07:25 implying in a sense. I mean, base images for the CI use case are a performance hack. Yes. Yeah. But I mean, that's what lots of people end up doing, right? They make their own Travis clone by making a base image that has Ruby and Go and Python and all of the production versions of everything installed in it. And then they do all of their pipelines based on that because they don't want to reinstall Python all the time. But that's no better than Travis, which is the way CI was 10 years ago.

Starting point is 00:07:54 So there's not a huge amount of value that Docker brings at that point. Yeah, I can personally say that I can feel this pain. One of my first tasks at my new company was trying to upgrade the version of Mongo that's on our CI base image, because it differs from the version of Mongo we use in production. And coming from a world where we used Bazel to basically

Starting point is 00:08:17 hermetically build everything, this felt like a step in the opposite direction, but it also felt like since we don't upgrade Mongo every day, it makes things more efficient. But what I'm thinking now is that Layer is fast because it basically takes the state of your RAM or your memory and serializes it on disk or it just keeps it around somewhere.

Starting point is 00:08:40 Is that accurate? Yeah, so I mean, another way of describing Layer as a platform is it's just a snapshots platform. For example, you run a pipeline on Friday. You've done stuff. It fails at 4 p.m. You're like, well, I have to do this meeting, and I'm not going to have time to look at this today. Monday rolls around. You look at your pipeline. There's some nebulous error. How do you find out from the logs what failed? You can rerun the pipeline, wait another five minutes or whatever. It's kind of annoying. Or if you have the memory snapshot around, you can just wake up the memory snapshot and shell in. The fact that we keep all of these

Starting point is 00:09:17 memory snapshots around means if any pipeline fails and the memory snapshot hasn't been deleted yet, you can just shell directly into it. If you want to view the web server in a pipeline, it's like you visit the web server, we show you a spinner while we wake up the memory snapshot, and then we forward your requests to it. So you get these free ephemeral environments. If you want multiple services running in parallel, you just load two memory snapshots, one for the front end, one for the back end. And then because we tie the memory snapshots to which files were read to get to that point, you don't really have to micromanage, oh, copy package-lock.json first so that the cache doesn't get invalidated. You just copy everything,

Starting point is 00:09:58 you do some actions, those actions cause files to be read. And then we'll map that back to which snapshot we can load that's consistent with the files in your new diff. So all of these things are relatively annoying to do in Docker, but they're all kind of magically done if you make a CI provider that specifically cares about these snapshots. What have you seen? Have you seen any customers' jaws drop?
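The idea of tying snapshots to the files that were read can be sketched as follows. This is only an illustration of the general technique, not LayerCI's implementation: the `Snapshot` class, the `record`, `is_reusable`, and `latest_reusable` helpers, and the use of SHA-256 digests are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def digest(content: bytes) -> str:
    """Content hash used to tell whether a file changed between commits."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class Snapshot:
    step: int                             # setup step this snapshot was taken after
    files_read: Dict[str, Optional[str]]  # path -> digest seen during that step


def record(step: int, paths: List[str], tree: Dict[str, bytes]) -> Snapshot:
    """Take a (hypothetical) snapshot, remembering what the step read."""
    seen = {p: digest(tree[p]) if p in tree else None for p in paths}
    return Snapshot(step, seen)


def is_reusable(snapshot: Snapshot, tree: Dict[str, bytes]) -> bool:
    """A snapshot is consistent with a new tree if every file it read is unchanged."""
    for path, seen in snapshot.files_read.items():
        current = digest(tree[path]) if path in tree else None
        if current != seen:
            return False
    return True


def latest_reusable(snapshots: List[Snapshot], tree: Dict[str, bytes]) -> Optional[Snapshot]:
    """Walk the chain in step order and stop at the first invalidated snapshot."""
    best = None
    for snap in sorted(snapshots, key=lambda s: s.step):
        if not is_reusable(snap, tree):
            break
        best = snap
    return best
```

In this sketch, a commit that only edits a CSS file would still resume from a snapshot taken after dependency installation, because that step never read the CSS file. No manual ordering of copy steps is needed to keep the cache valid.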

Starting point is 00:10:21 What have you seen as interesting, when you're doing a demo, or some interesting customer feedback that has been like, this has changed my life? Maybe you can just share some stories around that. Sure. I mean, we just talked to a customer. I think the video will be published soon. We interviewed them and they said that in their Inno Days, like their innovation hackathons, you know, I think this is a common policy at a lot of companies, like once a month,

Starting point is 00:10:49 like all of the engineers get together on some Friday and they make some cool things and they see what kind of wacky ideas they can launch. And they did this. And then it was the first month or the first Inno Day after they'd installed their CI and everyone could just demo live what they built because you have these links for each environment instead of fighting for the three staging servers or asking Infra to provision more specifically for Inno Day or whatever. It's like you just push your code, the environment exists, you didn't need to configure creating the environment, and you can just share the link with your... You give the demo

Starting point is 00:11:24 on Zoom and then you post the link in the Zoom chat and all of your coworkers can just play around with things. So, like, just the ability to have these snapshots around and demo things per branch is really useful. What are the memory requirements for saving one of these snapshots? I have no idea how much this ends up being in terms of file size. I guess there's some magic going on in the back end. But memory for most applications goes from one to 100 gigabytes. So if you're shuffling around 100-gigabyte files, it kind of lends itself poorly to using an existing cloud provider. So we have some

Starting point is 00:12:05 unique architecture on the back end to deal with that. And we do a lot of copy-on-write stuff, if you know what that is. It's like avoiding making entirely new copies of things. Because if you have a chain of snapshots, you can store only the parts that have changed between them, which is how Docker works. Okay. Yeah. And that brings me to your architecture, right? So you don't use a cloud provider and we were just talking about the fact that you use Kubernetes on bare metal. So first of all, how's the experience of that? How did you decide that that's what you need to do?
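The point about only storing what changed between snapshots in a chain can be illustrated with a small content-addressed chunk store. This is a swapped-in technique that shows the same storage saving, not a description of LayerCI's back end or of a kernel-level copy-on-write mechanism; `ChunkStore` and its 4096-byte chunk size are assumptions for the example.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Stores snapshot images as fixed-size chunks, keeping each distinct chunk once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}

    def put(self, image: bytes) -> List[str]:
        """Split an image into chunks and return the list of chunk keys (the 'recipe')."""
        keys = []
        for offset in range(0, len(image), self.chunk_size):
            chunk = image[offset:offset + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # unchanged chunks are not stored again
            keys.append(key)
        return keys

    def get(self, keys: List[str]) -> bytes:
        """Rebuild an image from its recipe."""
        return b"".join(self.chunks[k] for k in keys)

    def stored_bytes(self) -> int:
        """Bytes actually held, after sharing chunks between snapshots."""
        return sum(len(c) for c in self.chunks.values())
```

With this layout, a child snapshot that differs from its parent in one chunk costs one extra chunk of storage rather than a full second copy, which is the property that matters when individual snapshots run to many gigabytes.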

Starting point is 00:12:37 It sounds like you were just efficiency bound and that that's what your constraint was, but maybe you can talk a little bit more about your setup. Yeah. So I mean, I guess by nature of having these snapshots, we're really limited in what we can do. We need to run our own hypervisor, which means we need access to KVM. We need access to the kernel hypervisor stuff. And so if you're running an EC2 instance that's VM-based, every level of nested virtualization is 20% slower. So we're going to get huge performance loss if we use spot instances to run our worker

Starting point is 00:13:14 nodes. And also, it's going to be really expensive because these worker nodes have hundreds of gigabytes of memory to be able to run all of these VMs. And I mean, we're at the point that each production node has a terabyte of memory. So like these AWS bare metal instances that we'd be setting up would be both expensive and difficult to maintain and not really significantly better than doing it ourselves anyways. So production for us just looks like a bunch of like bare metal servers in OVH, which is like a French server provider. We're in Canada, so we're familiar with it, I suppose. And over that, we have a Kubernetes cluster

Starting point is 00:13:54 where each service is sharded kind of onto different nodes. And individual customers are also sharded onto groups of nodes so that we don't have to copy these gigabyte memory snapshots all over the place. They're just on the node that the customer last ran their stuff on. Interesting. And I'm guessing that also provides like isolation because if you're like a CI company, without any checks in place, one customer can easily use up a lot of capacity that's meant for everyone else. There's cgroups and all of that. It's even more interesting because we run our hypervisor within containers. So we expose KVM into the container and then the hypervisor interacts with KVM but still is limited by cgroups, which are like the Linux way of stopping certain processes from using

Starting point is 00:14:47 too many resources. So it's not actually very difficult for us to limit users interacting with or like overloading the node or whatever, because it's exactly the same like Kubernetes configuration you'd use in a cloud provider without doing hypervisor stuff. So, but how about the case where a customer just has like 100 commits or like 1,000 commits coming in? That might not overload the node itself,

Starting point is 00:15:10 but it might just use up capacity of your cluster. Or does that just generally not happen? So we have rate limiting. We only promise, I mean only, but we promise 12 parallel VMs per seat. So a company with 10 engineers can at most make 120 things. That should be more than enough for most people. Yeah.
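The per-seat cap described above amounts to a concurrency limit. Here is a minimal sketch, assuming a simple in-memory counter; the `ParallelVmLimiter` class is hypothetical, and only the 12-VMs-per-seat figure comes from the conversation.

```python
VMS_PER_SEAT = 12  # figure quoted in the conversation


class ParallelVmLimiter:
    """Caps how many VMs one organization can run at the same time."""

    def __init__(self, seats: int) -> None:
        self.capacity = seats * VMS_PER_SEAT
        self.running = 0

    def try_start(self) -> bool:
        """Reserve a slot for a new VM; returns False when the cap is reached."""
        if self.running >= self.capacity:
            return False
        self.running += 1
        return True

    def finish(self) -> None:
        """Release a slot when a VM stops."""
        if self.running > 0:
            self.running -= 1


# A company with 10 seats gets 10 * 12 = 120 parallel VMs.
limiter = ParallelVmLimiter(seats=10)
```

A misconfigured bot that opens hundreds of pull requests at once would then have its extra runs rejected or queued instead of consuming capacity meant for other customers.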

Starting point is 00:15:33 And I mean, like nobody really hits the cap legitimately, but sometimes if they have a poorly configured Dependabot or whatever. Sounds like something that might've happened. Surprisingly often. Dependabot is not very good at rate limiting. Okay. That brings me to a blog that you've written super recently that was doing really well on r/programming, at least. The rise of crypto mining and how that's annoying CI providers.

Starting point is 00:16:01 So maybe you can talk a little bit about that. Sure. I mean, to summarize the blog post, crypto has gone up like 10x in value in a year, like the market cap of all cryptos. Of the top 10 cryptos, two or three have a system called proof of work, which basically means like you can burn CPU time to make money. It's not very much money. You can spend $100 of AWS credits to get $10 of money. But if you can somehow find free tiers available in the wild and you have nothing better to do, then you can just make a full-time job of attacking these things. So AWS has the free tier, but they are

Starting point is 00:16:45 like really restricting. I don't know if you've tried to set up a free tier AWS account any time recently, but it's really difficult now. You need a phone number, you need a credit card, you need two-factor authentication. To get any free credits, you need to have an incubator partner. And the same thing's happening in CI. So CircleCI is being attacked. We were being attacked. Shippable was being attacked. And GitLab and Shippable have both worsened their free tiers in the past year

Starting point is 00:17:14 because of the nonstop attacking. And because it's profitable for people to just make a full-time job of attacking, and attackers have an advantage in this sort of thing, it's really difficult to defend because even if you have a full time team of defenders, the attackers will just keep making new accounts. When we were being attacked, it was really bad for a couple of weeks. And we banned the entire country of Indonesia because that's where a lot of the attacks are coming from for some reason. And then the second we banned the country, there was like 15 new IPs on corporate networks in the

Starting point is 00:17:49 US. So they had the resources to pay like a dollar a month for the IPs. And they're probably paying crypto for these IPs. So it's not traceable regardless. And there's not really much you can do defense-wise, except for just huge broad strokes like banning countries and banning... We use Cloudflare, so we use Cloudflare's VPN detection, and we just made Cloudflare ban all VPNs.

Starting point is 00:18:17 So some developers might not be able to access us with a VPN, but there's not really much we can do about that. But these people will be attacking mostly through the free tier. They'll be automating creation of accounts and then just running stuff. Is that how it works?

Starting point is 00:18:32 Yeah. So, I mean, they basically use the free tier as a command and control script. So they'll set up a fake desktop. They'll set up something that connects back to their connection. They can bounce it through any IP they want. They can do the same thing in GitHub's IP range so that you can't block list their IPs.

Starting point is 00:18:56 You can't block list any AWS IPs or your customers won't be able to connect to AWS. And so they can just rent a free VPS in AWS and they'll command and control out of that. And then, you know, they can run a browser in there and sign up for other CI services with your CI service's IP, like they can use it as a proxy, they can mine crypto. It's like, you know, you get arbitrary code execution by nature of being able to run things. Yeah. When we were running CI, it was just one of the biggest fears that that's how people can use.

Starting point is 00:19:32 We were worried of a supply chain attack, just like SolarWinds, because it's literally remote code execution as a service. And we worried a lot about all of that. But ultimately, there's only so much you could do. We tried to just have audit trails and just prevent people from ever deleting branches and stuff just so that you can at least see if there's somebody attacking you, they can't just hide everything.

Starting point is 00:19:56 But blocking everybody completely was considered really hard. So at least let's just have audit trails and stuff. What is the legality of all of this? Are there vectors where you can say that this is illegal, or that doesn't matter, like regulation hasn't caught up, or regulation has caught up but enforcement is really hard? Yeah, I mean, it's certainly illegal in the States. There's unauthorized use of a computer system, obviously. It's against our terms of service. It's against GitLab's terms of service.

Starting point is 00:20:28 But are they going to send FBI agents to Vietnam or whatever to catch these people in internet cafes paying with Bitcoin? It took five years to bring down Silk Road, and that was in the US. So if people are using Tor and are using crypto to pay for things, it's both difficult to find them, and it's difficult to even prosecute. That makes sense to me. And it's also like, unless a bunch of companies come together to try to drive enforcement or something, I can just think of like, you know, Epic itself can't sue Apple for a monopoly, but like five companies like Epic, Spotify, all of these different companies together, that might stand a chance. It might be something similar where one company alone probably can't

Starting point is 00:21:15 do much. What do you think the future of this is? These attacks keep happening. Value of crypto just goes down. Maybe that doesn't seem like that's happening anytime soon. I mean, I think the only long-term solution is to restrict proof of work. So if the people making the cryptos choose different sorts of difficulty metrics, like, I mentioned in the blog post, but Ethereum is moving to something called proof of stake, where you don't get more Ethereum for hacking computers at that point, or unauthorized.

Starting point is 00:21:48 You're just burning CPU cycles. It doesn't give you more Ethereum once they switch to proof of stake. So if the popular cryptos make it not profitable to just burn CPU time, then it'll become unprofitable to just make a full-time job of attacking free tiers. And if that doesn't happen, then basically free tiers will just go away because there are forums on the other side where they're trading information about how to circumvent security measures, IPs you can use. I read some of the comments to my article, and people were saying, oh, why don't you just block the IPs?

Starting point is 00:22:27 Why don't you just block the connection to the pools? And it's like, people are essentially using corporate networks as IPs to connect to LayerCI already. So if they're using random IPs from AWS to connect to us, they can just make a tunnel from there. There's just so little you can do. And they're using Selenium to run crypto miners, so you can't even do executable analysis. It's just very, very difficult as a blue hat to deal with this.

Starting point is 00:23:01 Yeah, the Ethereum migration to proof of stake, it's going to probably take a year plus. And I don't even know if the others like Dogecoin, which is just so popular suddenly, maybe because of Elon Musk tweeting about it all the time. I don't even know if they have a plan to move to proof of stake. But yeah, I think that would also be better for the environment if we finally moved away from proof of work. But even Bitcoin, a lot of the people that were attacking us were mining these

Starting point is 00:23:31 random little small caps. They were mining Sugarchain and they're mining like Monero. Sugarchain, yeah. I think Monero is really the only one that's in the top ten that is ever really used for this. And that's because it's specifically designed to be profitable to mine with CPUs. There's big memory requirements so that you can't use graphics cards or integrated circuits for it. So I think Monero has a big ethical problem with their design.

Starting point is 00:24:02 And it's also hard to track down Monero users versus Bitcoin, even though Bitcoin is like pseudo-anonymous. Yeah. I mean, I don't think tracking down is the problem, as I mentioned. If you track down one of these people, there's thousands of people that have the same capabilities that have the same lack of scruples. If $200 a month is enough money for you to make a full-time job of it, it's like there's a lot of people in the world that $200 a month

Starting point is 00:24:29 is worth doing something full-time for. Do you think it's individuals that are attacking or is it like a consortium? Do you know or do you have like a gut feeling about that? So when we got attacked, it was all at once. We were only founded a couple of years ago, so as we were growing, we didn't have to have all these security protections in place. We didn't have to have the heuristics for whether a job was good or not.

Starting point is 00:24:55 We were happy if people were coming in and trying our product. And then basically all at once, 10 different individuals were trying to mine Bitcoin on our service. So I'm pretty sure what happened is we got added on a list in some onion forum somewhere about PaaS companies, platform-as-a-service companies that offer a free tier that you can attack to make money on. So I'm sure they're individuals, but acting sort of as teams

Starting point is 00:25:23 where they share resources and have their own, you know, like the GitHub lists for developers that are like free things that are great for developers. Like there's a list somewhere that's like free things that are great for mining. Yeah. It's like EasyList for ad blockers, but just the opposite. Philosophically, you don't have to answer this or go too much into detail, but what's the impact of having a free tier? How often do you see customers... I'm sure now you do. What do you think the impact would be to your company if you removed the free tier? And if not your company, even to CircleCI, what's your intuition on that? I mean, in 2021, you essentially can't have a SaaS company without a free tier.

Starting point is 00:26:10 There's very, very few companies that someone will try if it requires a credit card. AWS can get away with it because they're big and they're ubiquitous. And it's like, well, I understand why I need to put in a credit card to use AWS's free tier. But just the identity problem is essentially unsolvable for most companies. Would you try Calendly if you had to pay for it? I probably wouldn't pay a dollar a month for Calendly in the early days. And so you always choose early on in your usage of a tool, when it's just starting to be important to you, you'll always choose the free one because you're not getting enough value from it yet to pay for it.

Starting point is 00:26:44 And then as you get enough value for it, then you upgrade to a paying customer. So basically all SaaS companies need a free tier or something that people can evaluate. And if you don't have that, then you just lose to the people that do have a free tier. That makes sense. It's just a competitive thing because there will always be someone who will be willing to burn VC money to give people free tiers and get customers. Or someone that's as big as AWS that can have a full-time team of... AWS has teams of 10 or 20 engineers working on crypto detection and crypto prevention. It's like 20 engineers cost millions of dollars a year.

Starting point is 00:27:24 So small companies can't afford that sort of resource. So in the end, the only people with free tiers would be the Herokus and the AWSes because they're the only ones that can afford it. You know, for a huge public company, spending a million dollars a year is a blip to them. What about like no-charge credit cards? Like you have to put your credit card in, but you don't get charged. Is that like still a super high blocker to people? When was the last time you did that? I probably wouldn't give my credit card to

Starting point is 00:27:54 some random company just to do their trial. Gym memberships have burned people from this. You sign up for your 30-day gym membership and you need to mail them something for them not to charge you. A lot of companies will do that. So it's difficult. Yeah. I've seen companies trying to roll out no-charge credit cards, but that's only companies that previously had credit card trials. So clearly the metrics would help there, but I've never seen a company just start off with like a credit card zero-dollar trial. I think it would be an interesting experiment that maybe somebody has an idea of.

Starting point is 00:28:32 I mean, another funny thing is that there's all of these companies that let you make virtual credit cards now, right? Like, yeah, Visa and MasterCard have these APIs that you can use to make virtual credit cards and you'd say like, well, why don't I just use one of those to make a virtual credit card to sign up for LayerCI? And then they can't, like, I'll use that for the free tier. But it's like, you can detect those, like Visa tells you whether something's a prepaid credit card or an actual credit card. And obviously, we'd have to block them. Because if someone stole someone's credit card, they can make 100 virtual credit cards on top of it, and then make 100 accounts. And then, you know, same problem. So there's basically no world in which anything but a credit card or a phone number can be used for authentication. And even then, it has to be a

Starting point is 00:29:17 plus one phone number because there's VoIP providers that give you these random international phone numbers and you can get thousands of them for a dollar. And if you can get thousands of phone numbers, then it's not a good rate-limiting resource anymore. Here's a thought experiment, and you can maybe shoot me down, because I just thought of it. What would you think of a government-provided, like Auth as a Service, where you can basically check

Starting point is 00:29:41 with a government API, like a US government API, that this is a legit person or not, and provide like a free tier to them without a credit card or anything? Would you be in support of an idea like that, suppose lawmakers created a bill tomorrow and sent like a request for comments?

Starting point is 00:29:58 Would something like that make sense? I mean, I think this is more broadly tied into the problem of identity. I'm in Canada, but like the US and Canada have the problem of like Equifax. So you have some identity number, some nine-digit identity

Starting point is 00:30:15 number, and if people have that number, they can claim to be you, and there's no other way for you to authenticate yourself. So like if you're buying something online and you fill in the information and it gets breached for any reason, like there was a famous thing where British Airways had a credit card skimmer attached to the JavaScript on their payment page. And so

Starting point is 00:30:37 like you'd pay for your flight with British Airways and they'd steal all of your personal information, the three-digit code on your credit card, your birthdate, all of your information because you have it for your boarding pass. And then they just bought a bunch of stuff with it. So the problem is that it's symmetric. It's like a password stored in plain text. Countries like Estonia have something that is like a digital ID card. And it almost acts like a YubiKey. So you can tap it as an NFC device, and it proves that you are the owner of the card. So I think that's something that most countries are going to need in the long term. Because again, it's profitable to just breach people's identities and trade these huge lists of things and buy

Starting point is 00:31:22 free tiers and mine Bitcoin with the free tiers or whatever. So unless crypto gets worse, there's going to be a big identity problem coming up. That makes sense to me. So then let's just take a step back. And we spoke about the initial motivation for LayerCI where you were annoyed at Docker. But can we talk a little bit more about that story? So it's one thing to be annoyed at Docker, but what made you decide that this is something that you want to build and, you know, start a company and go through YC and everything?

Starting point is 00:31:55 Yeah. I mean, I guess my personal motivation for doing things is I just like building stuff. I think a lot of humanity's problems will be solved by the people building the automation. I mean, you hear a lot of people claiming, oh, when the truckers are automated, there'll be this huge work problem, blah, blah, blah. Automation is bad. Society's not ready for it. But for hundreds of years, it's been like, oh, the people in the textile mills are going to

Starting point is 00:32:23 be replaced by machines and they're going to lose all their jobs. So I think automation is really the only way to get out of all of these problems that we're facing. The climate problem, adding more resources to people automating cars and automating shipping and automating politics so that these things can happen. Automating identity. All of these are important problems that need to be solved in the next 50 years. Housing. Toronto has a big housing problem. It's like if people make factories to build skyscrapers, which is technology that exists in China, for example, that would solve a lot of housing because you could just make prefabricated buildings. But people aren't building these things because there's not

Starting point is 00:33:02 the infrastructure for it. People are driving coast to coast because in the 60s, the US government prioritized infrastructure for cars. But no one's really prioritized the infrastructure for the internet. With maybe the exception of Starlink lately. So I think as someone building developer tools, we like building the tools that will move humanity forward, because it's the developers and the people building automation, and supercharging them so that they can build things faster, that's actually going to affect the world. That's a fascinating and wonderful response to that. When people talk about the future and developers, often people bring up no-code tools.

Starting point is 00:33:47 And I'm sure you've seen things like Retool where you don't have to write that much code or it's low-code. You write a little bit of code and mostly everything gets solved for you. Does LayerCI ever have a vision of helping test those things? Because I can see a symmetry there.

Starting point is 00:34:04 You build a low-code tool, but it's really hard to test. Maybe layer can help you with that. Does that make sense at all? Could layer help with low-code someday? So, I mean, I think philosophically, low-code was never really intended to replace programming. It's like as more things, you know, as the

Starting point is 00:34:26 internet eats the world or programming eats the world, more and more things involve programming and the programming gets more and more complex as like the baseline of things gets built. So you know, you can make like these ridiculously complicated video games with one person studios now, whereas like Quake took, you know, I mean, took John Carmack and all of the resources of id Software. Assuming I'm remembering the studio correctly. But now you can make things because of all of this automation. And so no code basically backfills that. It's like the things that were hard to make 10 years ago are now being made easy to make

Starting point is 00:35:10 with things like Zapier and Retool. But it's just opening up higher up, like the programming challenges that used to be impossible. So self-driving requires lots of programming now. But 10 or 15 years ago, it was essentially impossible. So I think Retool and Zapier aren't even necessarily competing with programming. They're just building a new world. Also things like Webflow,

Starting point is 00:35:39 these tools that facilitate things that were traditionally programmer tasks. You need someone doing the programming to make it an industry. And then once the industry is big enough that it supports a no-code solution, then it can be backfilled by no-code solutions. But unless there was the WordPresses originally, then Webflow wouldn't exist because WordPress needed people to know what a website was to exist. So we're not ever going to focus on no code. We're going to focus on the bleeding edge, people that are developing and making hard things.

Starting point is 00:36:11 And I mean, that starts with web apps because that's where a lot of the innovation is going on right now with things like Vanta, for example. But in the long term, it doesn't necessarily mean web apps because I'm sure a lot of the things that people are currently doing in web apps will get automated. Next.js is less and less code. And then as Next.js becomes more and more standardized and Auth0 and all of this identity stuff becomes more standardized, then you can build whole websites without needing programming. So we'll keep chasing the people building novel stuff. Mm-hmm. I think soon enough, there's going to be a day where there's going to be a billion dollar business running on something like Replit. This is just my prediction. I don't know if it's going to be in five years or 50 years, but there will be that one person in their basement

Starting point is 00:36:56 or whatever, and they have a billion dollar business just running on Replit. That would be an interesting world to live in, I think. But there are already businesses you can make without programming, and you can't do it alone. So I think the stereotype of just one person is probably not legit. But I do agree that there's probably big businesses that can be made with no programming. And I think a lot of businesses hire programmers too early for the repetitive tasks.

Starting point is 00:37:24 Why hire a programmer to make a website when you can just use Webflow? Is your custom-coded website really going to bring value to your developers? So I guess maybe tangent, but I was thinking, for this course I'm making, the DevOps Academy, there's like a system design component. And I was thinking, like, what are the component types? And like, what do people use for those component types? So there's like databases, there's like, you know, MongoDB versus Postgres. And there's like all of these traditional system design components. But then I realized that, like, no-code is now a system design component. Instead of choosing, like, what front-end technology

Starting point is 00:38:07 am I going to use, oftentimes it's like, what website builder am I going to use? Am I going to use Cloudflare Pages or am I going to use GitHub Pages? So for early startups, it doesn't make sense to build these websites and host them yourself anymore. You choose that as a system design component. So I think engineers should warm up to the fact that no-code or low-code tools are part of system design. The same way that you wouldn't code a database from scratch anymore, even though you would have in the 80s. It's like you just take that as a no-code. Postgres is sort of low-code in the same way that Webflow is.

Starting point is 00:38:48 So like I think more and more components will become low-code. And developers should just like use those as tools for building these like interesting novel things. That makes sense to me. And that brings me like to a bunch of new questions. Like one thing is that I think about, and do you worry about this?

Starting point is 00:39:03 So Postgres and Python, as you said, they were the original low-code tools so that you don't have to write that much code on your own. But they're open source and they're kind of community driven. With Cloudflare Pages and Webflow and everything, it's all individual companies building out these new components. Does that worry you or is that just a fact of life and the way things are

Starting point is 00:39:25 going to be like moving forward? I mean, I think usually the way it goes is there's like a closed source offering that kind of bootstraps things, and then they're disrupted by open source and open standards. And then it kind of settles into like an ecosystem of closed source and open source building over open standards. So, you know, there was Mosaic in the early days, then there was Netscape, and then there was Firefox, which was open source. And Firefox and its contributors really moved the internet forward. And then there was Chrome that had to be open source to compete with Firefox, and then that moved the standards forward. And then there's closed source stuff that was built on top of that, like Internet Explorer, and then Internet Explorer

Starting point is 00:40:02 turned into Edge, and then Edge now uses open source stuff for their rendering because they had to backfill. So I think Webflow is a great idea, and Figma is a great idea, but if you can make versions of these that are self-hostable, like a self-hosted Figma is probably a company that will exist 10 years from now. And people will switch to it because they'll want the ability to like, or they'll want the certainty not to get switched off of. And we saw that in our batch even.

Starting point is 00:40:32 We had someone making an open source version of Intercom. So it's like, again, like about 10 years after the closed source version, the open source version becomes possible to make a business of. And we saw an open source Firebase, to plug them, in Supabase, and Papercups. If you want to set those up as a developer, you're probably more likely to go with the open source versions that are hosted somewhere

Starting point is 00:40:58 because you want the certainty that if something goes wrong or if you need to scale it or add features, you can use the source code. I mean, I think it's just natural that closed source, like, leads the way. And then they'll have to embrace open source on the back of that, or they'll be disrupted by someone that does exactly what they do, but open source. That makes sense to me. And it's like the free market, right? You make the closed source version in order to, like, capture the market and gain money. And the open source version is another part of the free market where there's a competitor

Starting point is 00:41:26 that has this one feature, which is a really important one, which is being open. It's like GitHub versus GitLab. Android versus iOS. It keeps happening that someone makes something closed source and then five or 10 years later, someone makes the open source version

Starting point is 00:41:40 and eats their market share. They can both coexist. iOS and Android coexist on different merits. Yeah, and now let's talk about your DevOps course. Like, I don't know too much about it. So maybe you can just tell listeners, like, what are you building and why? Sure. So the course we're building is called DevOps Academy.

Starting point is 00:42:03 We're releasing it with a partner. And the idea for the course is like DevOps, like most of the DevOps stuff you read is by like solutions architects. It's like, you know, someone at AWS is trying to like teach you DevOps concepts within the lens of AWS. Or someone at like some consultancy

Starting point is 00:42:21 that does DevOps for you is teaching you AWS in the context of buying their services. So like, it's very exclusive right now. Like you hear DevOps, and you think of like, consultants being paid $300 an hour to set up like Oracle database for you. And you think of like Java and like unit tests and Jenkins and kind of like the old world of DevOps. But DevOps is not actually a scary buzzword and it's like reasonably accessible even to startups. And startups that adopt DevOps practices

Starting point is 00:42:51 do go much faster. Even if they don't know that they're doing it, it's maybe intuitive in the modern world. So the idea for DevOps Academy is teach things from like a startup perspective, like make these processes, but don't over-engineer them and realize that they're an investment. So I didn't set up CI when I was making LayerCI initially

Starting point is 00:43:15 because why would you set up CI before you have any customers? CI is a tool to reduce churn, and it's a tool for collaboration. So if you're one developer that's been working six months on something, you don't need to write tests and you don't need CI. So that's not something that like a solutions architect would usually tell you, but you know, like a, from a startup lens and from like a DevOps company lens, that's something we can, we can talk about.

Starting point is 00:43:38 So like, what are the components of code review automation, and when roughly should you set them up? Like, when is it a reasonable time to invest in them? That makes total sense to me. And I will be so interested to read this course once it comes out, because not enough people talk about this. There's a lot of podcasts and everything that talk about a CEO's experience of building a company and scaling and all that, but the actual technical details, that's one of the reasons why I started this podcast, just so that I can get those kinds of conversations going. So I'll be really interested to read that once it's out. What is something, as part of, you know, researching and developing this course,

Starting point is 00:44:19 that you found was interesting? Maybe you can give us a sneak peek. So the CI one was interesting. What else have you found? When should you do code review? Yeah. So we've been researching by talking to our customers. One of the benefits of being a CI company is we talk to customers that are prioritizing developer tooling and setting up things. One of the surprising things was nobody really has a clear picture of the space. So you have engineers that have worked at previous companies. They know how things are done at those previous companies,

Starting point is 00:44:53 and nobody really compares. So it's like if at your previous company you used Lambdas, at your new company you'll use Lambdas. You don't really understand the pros or cons of it. If at your previous company you used Docker, at your new company you'll use Docker. You're just used to that technology. And it basically just boils down at a lot of companies to who's the first technical hire and what have they used in the past? But obviously, there's a whole world of pros and cons

Starting point is 00:45:18 out there. And it's important to have someone on the team that knows the scope of things. We've taught a lot of our customers about linting, which I thought was something that was ubiquitous. Don't have code reviews that involve comments about semicolons. Seems like a reasonable thing in a team of any size because it's so easy to set up ESLint, or golint in our case. But people don't know it exists. So they do their code reviews. Their developers are bogged down for a day doing repetitive

Starting point is 00:45:52 code reviews where half of the comments are, like, needs whitespace, needs semicolon. And you don't realize that workflow is broken if you've never had an automated version of the workflow in another company. That makes sense to me. And I think it's also like,

Starting point is 00:46:10 it's hard to inject some linters into people's workflows, right? It's hard to set up pre-commit checks unless you have a setup script that runs for everyone initially. So it's very hard to backfill linters if you already have a lot of developers not used to that. So that makes sense to me. Yeah, we talk about that in the academy.

Starting point is 00:46:33 But you can actually set up workflows that automatically lint and reformat things. So a developer pushes, and a linted version of their branch will be automatically created from the CI. How do you do that? Is that just through GitHub Actions or something? Yeah, I mean, if you're using GitHub Actions as your CI, then you can just git checkout in GitHub Actions using a deployment key.
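The auto-lint workflow described in this exchange can be sketched roughly as follows. This is only an illustration under stated assumptions, not how LayerCI or the DevOps Academy course implements it: the `linted/` branch prefix, the function names, and the choice of fix command are all hypothetical, and the CI job is assumed to have already checked out the pushed branch with credentials that are allowed to push.

```python
import subprocess
from typing import List

# Hypothetical naming convention for the auto-formatted twin of a branch.
LINTED_PREFIX = "linted/"


def linted_branch_name(branch: str) -> str:
    """Return the name of the linted copy of a pushed branch."""
    return LINTED_PREFIX + branch


def should_lint(branch: str) -> bool:
    """Skip branches the bot created itself, so CI does not trigger in a loop."""
    return not branch.startswith(LINTED_PREFIX)


def plan_commands(branch: str, fix_command: List[str]) -> List[List[str]]:
    """List the commands a CI job could run after checking out `branch`."""
    target = linted_branch_name(branch)
    return [
        # Create (or reset) the linted branch at the commit that was pushed.
        ["git", "checkout", "-B", target],
        # Let the linter or formatter rewrite files in place.
        fix_command,
        # Record the rewritten files; --allow-empty keeps clean pushes working.
        ["git", "commit", "--all", "--allow-empty",
         "--message", "Apply automatic lint fixes"],
        # Publish the linted branch so the merge request can be opened from it.
        ["git", "push", "--force", "origin", target],
    ]


def run(commands: List[List[str]]) -> None:
    """Execute the plan, stopping at the first command that fails."""
    for command in commands:
        subprocess.run(command, check=True)
```

A CI step might call `run(plan_commands(branch, ["npx", "eslint", "--fix", "."]))` when `should_lint(branch)` is true. Note that a linter can exit with an error when it finds problems it cannot fix automatically, in which case this sketch stops before committing.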

Starting point is 00:46:53 It seems obvious in hindsight that that's something you could do. And then instead of opening your merge request from the base branch, you open it from the linted branch. And then when the person merges it, it's all linted. You don't get any comments about whitespace. Interesting. But unless you hear that, you don't really think about how annoying the workflow of linting, repushing, rerunning all of your tests is. I think it's getting more and more obvious

Starting point is 00:47:15 that I need to read this book or this course as soon as it's out. Maybe we can talk a little bit about LayerCI, the company, which is, how big is the company now? How many employees do you have? We have six full time. Okay.

Starting point is 00:47:31 And any engineers? So what was your process like hiring the first engineer? Like, did you hire just someone you know? Or what was your framework for deciding this is who the first engineer should be? The first employee engineer in a sense? Yeah. So I mean, I guess as a CI company, well, my framework for hiring was I wanted people with good fundamentals that were willing to learn quickly. I guess my previous experience was if you hire people that check all the boxes, they often actually do worse than people

Starting point is 00:48:04 that don't check all the boxes, but are hungry to get equity and hungry to learn. Because if people check all the boxes, they're basically acting as a consultant. If you're hiring them because they check boxes, then they'll do what you hired them to do, and then they'll stagnate most of the time, at least. Because if they're interviewing for places based on their technology stack or whatever, then that's what they're comfortable with, and if you ask them to go outside of that, then it's just like, you know, that's not really the way startups hire. So our hiring was like, in the job posting, I didn't mention what

Starting point is 00:48:39 programming languages we used. It was like, we want someone that knows kind of operating system fundamentals, because that's required for editing things. We want someone that has like, an intermediate level of experience. You know, you don't want to hire a junior developer that needs to be told exactly what to do for your first hire. So we interviewed for that. And the interview itself is just questions that I've had to solve while building LayerCI. Like building the MVP of LayerCI, there was some like graph theory, there was some operating system stuff, there was some math. And you just take the questions you had to solve while building your product. And then you find people that could also solve those problems.

Starting point is 00:49:24 And if they could solve the existing problems, it's a pretty good indicator that they'll be able to solve the upcoming problems as well. That makes sense to me. At what point did you know that you had to hire someone? Maybe the answer is obvious, but I'd just like to hear it. Yeah. I mean, it became obvious once we started getting people liking our product. Before you have product

Starting point is 00:49:41 market fit, or before you have people that actually like your product, it doesn't really help much to hire engineers. The Mythical Man-Month and all of that, if you have one engineer working on something and you increase it to three engineers, it'll be maybe 20% faster. So it's a very questionable value to hire people in the early days before users actually enjoy your product.
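One way to see the Mythical Man-Month point is Brooks's observation that the number of pairwise communication paths on a team grows as n(n-1)/2, so coordination overhead rises faster than headcount. The snippet below is only a small illustration of that formula; the function name is made up for this sketch, and it does not derive the "20% faster" figure, which is the speaker's own rough estimate.

```python
def communication_paths(engineers: int) -> int:
    """Number of distinct pairs of engineers who may need to coordinate."""
    return engineers * (engineers - 1) // 2


# Compare a solo founder with small teams.
for team_size in (1, 3, 6):
    print(team_size, "engineers ->", communication_paths(team_size), "paths")
```

Going from one engineer to three triples the headcount but takes the team from zero coordination paths to three, which is part of why the speedup is smaller than the headcount suggests.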

Starting point is 00:50:03 But then in about September, right after we finished Y Combinator, we noticed that people were actually starting to use our product consistently. They were starting to... Usage had increased 10x in four months. And so people were really starting to push lots of commits and were starting to find bugs with the product and were complaining when it went down. That's a good indicator that people like your product. And that's when we started kind of making job postings and hunting. We didn't end up hiring until April, though, so it took about six months to get people on full-time. And, you know, it ended up being like, I as CEO don't want to be programming.

Starting point is 00:50:43 I want to be doing podcasts and explaining, or doing the CI course, like all of the educational content that'll build a developer following. And that's something I can do as CEO, but you can't really hire people for very easily. But if you have a product and you need features built and you want to collect customer feedback and start doing stand-ups and scrum

Starting point is 00:51:06 backlogs and stuff, that's totally something you can hire for. So it was around the time that users liked the product and I wanted to do other things. It was obvious that there was better things I could be doing with my time than programming that we hired. Yeah, that's a very simple and understandable explanation. And that makes total sense to me. So you already mentioned this, where you want to think about podcasts and marketing, but how do you think your role is going to evolve over this year? So you're working on, I would

Starting point is 00:51:38 say, some kind of building a following, that's basically what you said. What else do you think you're going to be doing over the course of this year? Yeah. So, I mean, it's a bit complicated because LayerCI doesn't currently have a CTO. And it'll depend a lot on whether we get a CTO or not. Because as CTO, I'm doing a lot of product management. I'm doing a lot of talking to customers. You need your technical leaders to actually talk to customers because otherwise your product vision gets bad. So I'm currently doing a lot of that. And I will probably still be doing that by the end of the year because it takes a long time to either hire or groom someone to be CTO. And it's a big mistake to rush that. But as CEO, a lot of the work is just aligning people. I realized at my previous company,

Starting point is 00:52:29 that if you don't consistently tell people what the company is doing, they'll very quickly forget. If you hire an engineer, and the engineer is working on some particularly interesting optimization problem, it's really easy for them to veer off course from the important stuff. If the second developer you've hired is dockerizing your CI or whatever, it's like, oh, is that really necessary for our company?

Starting point is 00:52:58 Which is funny for me to see as a CI provider, but it's an easy mistake for developers to over optimize things really early on. And so like as a CEO, the big thing is just like continuously telling people like, this is the direction we're going. Like I include everyone that works full-time in our investor update, which is something that a lot of people don't do, but I think it's best for people to know, like, this is the cash in the bank, this is what the company's doing, this is the customers we've closed, this is what went wrong, this is the future of the company, this is what every individual person is doing to further that goal. And as the team scales, that's going to become more and more of my time. So with a 30-person company, it's basically a

Starting point is 00:53:40 full-time job to just keep everyone aligned. So I guess the CTO hat will shrink, the developer hat will shrink, and the leadership getting everyone aligned hat will keep growing. Yeah, that makes sense to me. And maybe some final questions or final thoughts on what will the developer experience, in your opinion, look like in five years? So today, there's standard workflows. People like npm serve or some version of that locally to test things. LayerCI is kind of changing that and making it easy to share a version of your code on a particular commit and do it extremely quickly.

Starting point is 00:54:18 What do you think in like five years, like how are people going to be developing differently than they are today? And maybe a flip side of that, what's going to stay the same? Well, I don't think npm serve is going anywhere anytime soon. We use ngrok at LayerCI for various things. If you want to test something locally, if you want to iterate very quickly without pushing to a repository, npm serve and ngrok does very well for that.

Starting point is 00:54:44 So there are always going to be the local developer workflows with VS Code, code sharing and ngrok. But I think pull request automation is going to become more and more popular. So our vision for LayerCI is to do something like Slack, where your specific company has specific needs for their workflow. For lunch in Slack, you can install a poll bot that will automate the workflow of choosing a place to go to lunch, or you can install the GitHub integration to automate notifying people when something is pushed. But in GitHub or in whatever source code management tool you use, it's always just like, here's the code diff, and here are some buttons, like here are some checkmarks with buttons next to them.

Starting point is 00:55:31 And for a lot of teams, that interface just isn't good enough. Like for websites, it's not good enough for evaluating CSS changes. For apps, it's not good enough for running QA. Like basically every company isn't perfectly suited by that. So our vision for the future is make something that is as extensible as Slack so that people can put blocks for the various parts of their workflow. Run Cypress. If Cypress fails, put the screenshot directly in the view. Assign these reviewers. If something visual changes, assign the designer and require their check. Otherwise, assign the code owner for that piece of code. And then you don't have to wonder

Starting point is 00:56:11 who you should assign to this. You don't have to ask your manager who should review your code. It's just you push code. All of the relevant stuff happens automatically. And then if you pass all the gates, like you pass the CI, the reviewers all say it's okay, then you just merge it and it's shown to customers. I think that's the future where instead of fighting for days and weeks with notifying people and pinging them on Slack and setting up screen sharing sessions and wondering why your CI is failing or whatever because you have bad observability into your tests. If you can put all of that in a Slack block

Starting point is 00:56:47 system right in the pull request, I think that's the future of code reviews. That makes sense to me, and I would like to live in a world where we have tools like that that are super easy to integrate and not too expensive so we can convince our managers that we should buy them.
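The pull request "blocks" described above (assign the designer and require their check when something visual changes, otherwise assign the code owner, then merge once every gate passes) could look something like the sketch below. It is a hypothetical illustration, not LayerCI's actual product or API: the file patterns, the check names `ci` and `design-approval`, and every function and class name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Dict, List, Optional, Set

# Assumption: these file types count as "something visual changed".
VISUAL_PATTERNS = ["*.css", "*.scss", "*.svg"]


@dataclass
class ReviewPlan:
    reviewers: List[str] = field(default_factory=list)
    required_checks: List[str] = field(default_factory=list)


def owner_for(path: str, code_owners: Dict[str, str]) -> Optional[str]:
    """Return the owner of the first pattern that matches the path, if any."""
    for pattern, owner in code_owners.items():
        if fnmatch(path, pattern):
            return owner
    return None


def plan_review(changed_files: List[str], code_owners: Dict[str, str],
                designer: str) -> ReviewPlan:
    """Decide who reviews a change and which gates it has to pass."""
    plan = ReviewPlan(required_checks=["ci"])
    for path in changed_files:
        if any(fnmatch(path, pattern) for pattern in VISUAL_PATTERNS):
            # Visual change: the designer reviews and their sign-off is a gate.
            reviewer: Optional[str] = designer
            if "design-approval" not in plan.required_checks:
                plan.required_checks.append("design-approval")
        else:
            # Otherwise fall back to whoever owns that piece of code.
            reviewer = owner_for(path, code_owners)
        if reviewer is not None and reviewer not in plan.reviewers:
            plan.reviewers.append(reviewer)
    return plan


def can_merge(plan: ReviewPlan, passed_checks: Set[str],
              approvals: Set[str]) -> bool:
    """A change merges only when every gate passed and every reviewer approved."""
    return (set(plan.required_checks) <= passed_checks
            and set(plan.reviewers) <= approvals)
```

For a change touching `web/styles/app.css` and `api/server.go`, with `{"api/*": "backend-owner"}` as the owners map, the plan assigns the designer and the backend owner, and the change stays unmergeable until both the CI gate and the design approval have passed.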

Starting point is 00:57:03 Well, thank you so much for being a guest. I think I learned a bunch from this conversation. It was great being on your podcast. Thanks for having me.
