Two's Complement - Virtual Infrastructure

Episode Date: July 15, 2022

Ben and Matt compare container technologies like Docker to virtual machines, and discuss the tradeoffs when deploying applications. Matt explains the scary things that can happen when you share a VM with strangers. A visitor enters through the couch.

Transcript
Starting point is 00:00:00 I'm Matt Godbolt. And I'm Ben Rady. And this is Two's Complement, a programming podcast. Hey, Ben. Hey, Matt. How are you doing? Very good. Excellent.
Starting point is 00:00:23 Well, we don't normally refer to things that have happened in the news, because that gives us a certain flexibility in the order that we release these recordings. But you and I were literally just talking about the fact that Broadcom has bought VMware, and we were going to talk about some level of containers versus not-containers versus virtualization versus whatever, and it seems like we should bring that up. So yeah, let's talk. What do you think? A good topic, right? Exactly. It's such a deep one, and you know, we've got varying levels of experience in different technologies for what is essentially: how do I make sure my software works in the environment that I'm expecting it to?
Starting point is 00:01:06 and I'm thinking personally from this point of view like a developer who deploys server type applications headless applications that run on machines in the cloud or in data centers and whatever like that but I guess actually now I'm saying that I was sort of giving that so that you know I can't say too much about how UI stuff is developed. But then there's a number of software packages I see for Linux these days that come as a pack file or what are those things called? There's a bunch of different snaps, snapshots, which are essentially here's a whole operating system's worth of application wrapped into a file system and then presented as if it's like a single thing and it's like a docker container right but but differently so so it it's it's everywhere and i think we all want software that's easy to deploy and run
Starting point is 00:01:56 but there's a number of ways of achieving it yeah what are your thoughts uh well i mean i think it's interesting that i think you and i have a sort of a similar perspective and that we look at those tools that way. a fixed amount of hardware that they have provisioned and paid for and have pre-purchased the electricity for and have backup batteries for and have networking for and say, okay, well, how do I take this sort of fixed resource and allocate it out to all of the needy, greedy software engineers who keep telling me that they want more servers? Well, I got to have a server for this app and and I have a server for this app, and a server for this app. And so, you know, I think from their perspective, they might see some of these virtualization tools as a way to, you know, manage those resources more effectively and have a little
Starting point is 00:03:00 bit more control over not only just the resources themselves in terms of memory and compute, but also the sort of blast radius if one of them goes horribly wrong, right? Being able to wipe an image and give someone a fresh new server with a few clicks of a button is way easier than driving down to the data center and unracking a machine that is no longer responsive because somebody did something terrible to it. I don't know what you're talking about. I have never done that. Oops, I just fork-bombed my own machine.
Starting point is 00:03:36 So you're right, actually, that's a very valid point. Those infrastructural things are super important and it's sort of a funny thing. I was talking about it at lunch with a bunch of folks the other day and um regaling with the story from from my past and a friend of mine who used to work at uh a big airline and those folks are still using mainframes and mainframes have always been able to do all the things that we're now kind of starting to rediscover in that virtualization world you know like hey you want more cpus yeah we can bolt more cpus on while the mainframe is still running hey you want to shut
Starting point is 00:04:08 down the mainframe do maintenance and bring it back up again sure what we can do is we can teleport the mainframe's image up to a backup site nobody even notices that your connection is now going to manchester instead of london um and your terminals keep on responding everyone's still doing their requests and the batch jobs are still going that meanwhile they power down the main machine fix the ram and then you can teleport it back and those things have been around for you know half a century and yet we're rediscovering them in terms of what i mean specifically you're mentioning things uh like like vmware um allow you to manage the resources really fine-grained and make those kinds of like
Starting point is 00:04:46 hey we need to move from one machine to another machine i it's it's sort of miraculous that it works as well as it does yes yes yes yeah because those mainframes were clearly designed with those specific use cases in mind right hardware capabilities to do those things right right they built those things from the ground up with like, okay, we're going to be able to do this offsite backup and we're going to make sure that it all works. And with all these other things, we sort of backed our way into it
Starting point is 00:05:11 because it's like, clearly there's a need to do that. But, you know, the old school operating systems and CPU architectures and all these other things that we have, maybe someone gave that a thought a long time ago,
Starting point is 00:05:25 but they certainly didn't design the whole ecosystem from the ground up to be able to do that. And so now we're sort of in this state where it's like that need is still there, the desire is still there, and it's this sort of tricky problem of, okay, well, how do you actually do that? And yeah, folks like VMware have got their solution for it.
Starting point is 00:05:43 There are obviously other vendors that can do it. And of course, I mean, one should note as well that the chip manufacturers have been slowly heading this way too, adding more and more like hardware level virtualization things. Because, you know, we've always been able to do these things. It's like my hobby of writing little emulators for old machines. Once you can fully emulate something, of course, its state is just a bunch of numbers that you've've got you can move that around anywhere you like and then kind of carry
Starting point is 00:06:07 on somewhere else and have you know your single step through each frame of a game and then go backwards because you could just emulate from a snapshot one fewer frame forward and keep you know that kind of stuff so this has always been possible but it was just infeasible to do it without actually running the same cpu as you as you're trying to virtualize and the same hardware, but then things have come along. But I feel like we're going off base from where I was thinking of going. I just got excited, which is what this podcast is about, right? Yeah. No, I mean, I think these things are all kind of related. And maybe, I don't think we should necessarily dive into this at the start,
Starting point is 00:06:45 but one place that you could maybe take this is like, this isn't just about virtual computers. It's also about virtual networking equipment, right? Like if you look at, you know, some of the tools that are out there, it's like, yeah, you think that this IP address is a switch, but it's not. I mean, one only imagines what's going on in like the the aws's and the google cloud infrastructure in terms of their physical network separation and their ability to as you say make it look like you have your own cloud to yourself knowing that actually no those those fibers are the same fibers that everyone else is using between all the racks. It's all magic. Right, right, yeah. But yeah, talking specifically about the part of this that is, you know, I, as a software developer,
Starting point is 00:07:31 somebody who's, you know, sort of building a total application, you want to be able to deploy it. That's really important. You want to be able to, you know, connect to the machine that's running it and troubleshoot it and read the logs and, you know and run TCP dump and
Starting point is 00:07:45 netstat and all the other wonderful tools that we talked about in some prior podcast and still have the flexibility of the things that we were talking about in terms of making maximal use of those resources and being able to tear it up and down and being able to build the definition of what that system is in a configuration file rather than in a PC part picker. Here's a checklist that Barry has to go down and make sure they all look the same. Right. Yeah, yeah, yeah, yeah. Right, right.
Starting point is 00:08:19 And so there's lots of different tools to do this, but I think they all sort of serve the same needs. So what are some of the tools that you've actually used in anger to do this? Well, I mean, the main one that springs to mind, other than bespoke ones that I guess became Kubernetes, when I was at Google, there were some things that became sort of like strange containery type things.
Starting point is 00:08:40 I don't know if it's exactly the same thing now, I think, out loud. But the one I have the most experience with is Docker. And Docker is a great solution to the, I want to have a reproducible environment that's incrementally built with layers. And so it's relatively efficient if you are only changing the end layers. And you can definitely have that kind of feeling of like, well, if I have a Docker image that I can give to you and you're going to run it, then I am 99.69% positive
Starting point is 00:09:12 that if it worked on my machine, it worked on your machine because what I in fact did was ship my machine to you. Now you're running my machine, which is a blessing and a curse. And I think that's the problem, right? Is that it can be misused like anything, like any technology.
Starting point is 00:09:29 My experiences with Docker, so Compiler Explorer started out, actually it didn't start out with anything. It just started out with a shell script running the Node.js on a bare machine. And then very quickly, it was like, how am I going to manage this? So I decided to use Docker rather sens and at the time and docker served
Starting point is 00:09:47 as well for many many years um docker did not scale with the gigabytes of gigabytes you know hundreds of gigabytes of compilers that i wanted to build into the image the images took longer and longer every time to build we're going to take a pause there while my wife comes in through through the couch through the back yeah through the couch i never realized that there's a door behind your couch how else do you get between um places you know we put flu powder in and then we can go anywhere to any other couch that's how this works that is exactly how this works it's okay the back door's jammed okay the back doors oh no so you had to come door is jammed. Okay, the back door is jammed. Oh, no.
Starting point is 00:10:26 So you had to come in through the couch door. Yeah, I had to come in through the couch door. All right, there goes my dog. And then we're going to have to try and remember what I was saying and work something out. Or just pretend it didn't happen and just put this in and, you know, give our listener... I was just thinking, actually, I hope our listener isn't called Barry. Because I always use Barry as my general dog's body person. Oh, yeah.
Starting point is 00:10:48 I think I say Steve for some reason. I don't know why that name. Steve, yeah. So we were talking about Docker. Yeah, you're talking about using Docker in Compiler Explorer. That's right. So the problem with bigger and bigger images is that no matter how you cut it, you're uploading layers upon layers
Starting point is 00:11:06 upon layers upon layers of a piece of software with more and more compilers. And it was just getting unwieldy. And there are definitely tricks you can do with volumes and other things like that. And we looked at them for a while. But ultimately, we backed out when we realized we needed more security than Docker would give us. At at the time there were some relatively high privilege um exploits for breaking out of docker containers into the wider world and we were kind of tacitly relying on docker to also be a sort of protection domain and the other thing is that if you're running inside that container even if you um even if you don't get privilege escalation outside that container that container is long-. So if you're like servicing somebody's request and it was a poison request and it was able to monkey with the system,
Starting point is 00:11:50 it's now monkeyed with that running Docker container. And so it's going to be there until we restart the machine or we start the container. So there were some things we didn't want properties. We didn't that we wanted to get in terms of jailing. And once you're in one container, you can't have a container inside a container and inside a container arbitrarily,
Starting point is 00:12:07 at least at the time you couldn't. So we switched out to a different approach where we just have tarballs and run them on the operating system. But it did serve a need for a long time. And it's a frequent question we get asked is, hey, do you publish a Docker container of a Compiler Explorer instance
Starting point is 00:12:24 that I can just get started because people do just want to do Docker run, blah, blah, and you get that benefit. It just works. We have different ways of achieving that, I think. So anyway, that's my experience with Docker. I also have used it at a number of places at work, and I think it works great if you plan very carefully your Docker image layout and the layers are sensible and well managed. Yeah, so when you say a Docker layer, what do you mean?
Starting point is 00:12:53 What's a layer? So Docker logically is a better explanation, into a bunch of temporary directories and then overlays each directory, one directory over the other. So you start with a base image, which is maybe your entire Ubuntu distribution. And then you're like, oh, the first thing I'm going to do is I'm going to install these 20 apt packages that I need.
Starting point is 00:13:21 And so the next layer will be another file system that only contains the things that change between the base system and the system where you ran sudo apt install my hundred packages I needed my extra packages and then the next layer might be oh and now I'm going to copy some files from my git repo that I'm running it in into the container at a particular location and that's another layer of the file system that only contains those copied files. And then so on and so forth. Each layer would add in more bits of the software and configuration. And the cool thing is, of course, is that you only
Starting point is 00:13:55 need to regenerate layers that changed. And of course, the layers that are immediately after them. So if I change, for example, the base Ubuntu image, of course, everything depends upon that. So I'll have to rerun the commands that populated the later layers and create new layers. But if I'm just changing my application software and I don't change my dependencies and my system dependencies, then oftentimes it's only that last layer of a few hundred kilobytes or so that changes. And so not only is the build time faster, but the way that Docker distributes itself is as compressed layers. And very often, of course, if you're upgrading software time and time again, those base layers are already on the system in the cache somewhere. And the only thing that you need to do is upload the few hundred K each time, which is fabulous fabulous so that's a really good way of of having um a sort of incremental deployment of your of your software yeah so what happens if i have like a layer that's like
Starting point is 00:14:54 fetch the latest version of this thing from the internet well that's that is an excellent question and uh that is one of the biggest problems with something like docker is that it's very easy docker cache is based on the text contents of the command it's going to run so if you just say curl get me latest version of something pipe through tar zxf or whatever to extract it then that command will run exactly once on your machine when it populates that layer and then if you run again having uploaded up having changed the um the contents on the website that you're curling from or like a new version of the software is released and the url doesn't encode that in some way you know you're getting you know like getting latest or bob.latest exactly right then you won't see that but
Starting point is 00:15:42 unfortunately anyone who later builds with your docker container will see that change and so these things will not necessarily agree right and so it's really important that if you are fetching external resources and it's so easy not to get this right but if you are fetching external resources that you get like a specifically named version of everything that you want to get for two reasons one it means that you get like a specifically named version of everything that you want to get for two reasons one it means that you get reproducibility if someone else grabs your docker file and just says build me this please and the second thing is that necessarily if you want to change that image you have to edit the the url that encodes the git char or the version
Starting point is 00:16:19 number or whatever and which means that it will be rebuilt automatically but it's hard to do that right and it's hard to make sure you apply that everywhere right even things like the base image itself you know oftentimes when you say in the docker file hey i'd like to build something based on ubuntu 2004 that's essentially what you say you say from ubuntu colon 2004 from ubuntu latest or something like that and those are kind of like a Git pull of whatever someone has tagged as being the 2004 for Ubuntu. If you really, really want to make sure you get reproducible builds, you need to put the char hash of that particular layer
Starting point is 00:16:55 in the get command as well so that you know you're always going to start with the same version. And of course, there's a duality there, right? It's convenient from you know from my mindset it's great to have a totally reproducible build and that means that i can hand you a docker file not not the contents of the docker image right that's different but if i hand you just the text that says this is how to build my world you will get the same answer that i got every time
Starting point is 00:17:21 and that's really powerful but it's super inconvenient because every time some little trivial fix in the base image is pushed you know a security patch or a security fix or whatever then I have to think to go back and change the shard to be the latest one and that kind of feel if I want to keep those things going and of course the first thing you're going to do this is almost always what the first line after the from ubuntu is sudo not sudo because you're running as root is apt get update and update update sorry upgrade and update right because you want to pull in pull in all of the the things that are latest there's no kind of version for that there's no bi-temporality to that so you're a bit stuck at that point um and that factors into where some of the problems that one has with with something like docker it's
Starting point is 00:18:08 a boon but you have to be really careful how to use it and have to understand these slightly sharp edges and maybe most people don't care about those but i know that it's affected us before and we we have a you know you and i have definitely got um an industry where we really want to be able to reproduce what we did before and understand it. Yeah. It's also very easy to generate gigantic layers. If you think about, if you don't design your Docker file correctly, you know. So in the example I just gave, apt update, apt upgrade, apt install, right?
Starting point is 00:18:46 Those are like sensible commands I might type myself if I had a fresh new computer that you handed me. Right. The simple thing to do would be to run them as three separate layers. And that makes a lot of sense. But I've pulled down a whole bunch of stuff and replaced a bunch of... There's a load of temporary files that get pulled into the apps directory that I probably don't need in my production image.
Starting point is 00:19:08 I've then updated a whole bunch of stuff, which has replaced a load of temporary files that get pulled into the apps directory that I probably don't need in my production image um I've then updated a whole bunch of stuff which has replaced a bunch of stuff and then I'm like maybe installing my own packages and maybe I remove some system packages that I don't want right and so I've got three or four layers each of which is strictly additive and then there are sometimes if you had to delete files so you might be tempted at the end of that to go and the last thing I do is rm-rf var apt-cache. Kill the cache. I don't want it anymore. It's like gigabytes of all the intermediate crap that was downloaded while I was installing my packages. But if you put it as a separate step,
Starting point is 00:19:35 unfortunately, those already exist. Those intermediate files exist in a layer. That delete can't remove them from the layer. It just marks them as being, you can't see them anymore. It puts tombstones in there. And so your overall overall size the number of bytes you need to ship around still contains the layer that has all of those files in it and then a separate layer that says and by the way all those files are gone now right so you have to be really careful so you what people end up doing is writing a long stanza of like apt-get and update
Starting point is 00:20:05 and as like one giant long single bash command. And at the very end of that, rm-rvar apt-cache and depackage dash dash, you know, purge caches, all the things as one thing. So atomically, all those things happen. And then it's just the end result that gets shipped as the layer. Yeah, yeah. And I've definitely seen that in Docker files, and it's sort of this like, you know,
Starting point is 00:20:28 it just reads as gobbledygook at the start of the file, and you sort of parse it, and you sort of figure out what's going on there, but it's not the sort of like clean, you know, one instruction at a time, maybe with a helpful comment as to why you're doing it. You know, you see sometimes people will write shell scripts that they then copy into the image to run and then delete again afterwards, just because then the shell script is essentially atomic from the point of view of the layers.
Starting point is 00:20:54 And it's, I mean, it could be a tooling thing. It could be just what you'll get used to. I don't know, but it's easy to get wrong. And the thing is that as a developer running locally you tend not to notice these mistakes because it's necessarily incremental you've been doing this you've been building on and building on and building on right and then when you ship the the when you dock a push for the first time you discover that you've got several layers of you know gigabytes each and i'm sure you've done this as well when you've pulled someone else's docker image and you're like oh my golly what on
Starting point is 00:21:21 earth is it pulling down why is this docker image so big as a game that many have played and few have won? It's a really painful experience sometimes. You start cracking open the layers and trying to figure out what the heck is going on, and it's just like, oh, jeez, why are we doing this? Right, geez. Why are we doing this? Right, right. Yeah, yeah. And I think a lot of the time, people reach for Docker because it's super convenient, everyone understands it,
Starting point is 00:21:52 and it does solve a very real need. But I think oftentimes, in my experience with the kind of things that we do at least, a table of the code that you're going to run, maybe containing the node.js binary you want to run it with, or maybe, you know, because we're in a luxurious position where we own our machines.
Starting point is 00:22:12 They live in a data center. We know which machines they're running on, which are, you know, probably virtual machines as it happens. So that's another layer of virtuality above all of this. But if we know a lot of things about what version of libc is it running what you know base operating system are we running what things can i assume are there which of course is now a dangerous game to play which docker kind of makes you address fully but most of the time you're like well okay if i've got libc this version i'll just pass along all my dependencies right and it's not that big you know
Starting point is 00:22:42 for native applications often a bit of a few environment variables and suddenly now all of your dlls will be looked for inside the directory you ship and then you just like copy them all with you and that's a bit bigger but you know we're talking tens of megabytes of library files here right in a little tarball that you extract and will run on the developer's machine and a remote machine. And I guess the other sort of critical part about Docker is that it requires elevated privileges, which means there's a lot of monkeying around with which user you're running as. And sometimes it's useful. Sometimes you want a totally unprivileged user that's isolated from the kind of was it 12 factor type model where an application sort of
Starting point is 00:23:27 consumes only logs to standard out only reads and writes to external things through tcp that's fine you treat it as like a black box but very often it's tempting for developers to kind of go well it'd be really convenient if i could get to this set of files on the network or if i could write to this log directory and so you start passing things you start puncturing the isolation that docker gives you and then suddenly you wonder why i know you've got 100 files that are owned by the wrong user right excuse me there's a truck going past um but you know you run this command and then you'd like try to delete it afterwards and it goes i'm sorry i can't delete that you know You need to be root. Wait a second, I'm not...
Starting point is 00:24:06 How are you root? Yes, how did you write this as root? And I think it is really an unfortunate thing that the default behavior of Docker is to run as root because it's really easy to sort of fall into a trap of building an application that
Starting point is 00:24:22 accidentally, for really no good reason, needs those elevated privileges. If you had just been forced to think about it for a minute, you would have been like, oh, we don't. I mean, the dumbest example I can think of is we're binding to port 100 instead of 2000. There's no reason in the world why that integer matters to anyone. But if you build a whole application, it's like, yeah, there's 30 other apps that connect to port 100 because that's the port that we chose. And not realizing that that requires elevated privileges. Then you've just added a whole bunch of you've added a constraint completely by accident. Right. And and running as a non-privileged user
Starting point is 00:25:07 you'll find that out right away right um and there are other things like that too and i and i feel like it's almost like the testing thing right now on brand oh my gosh testing you say i know i haven't talked about testing in like a podcast and a half so i know all right it's you know part of the reason you write the test first is to make sure that the resulting solution that you come up with is testable. If you build something and you don't think about tests and then you try to add the test later, it's really hard. And so most people don't. And the reason for that is you came up with a perfectly reasonable solution if you completely ignore this other constraint. And then you try to add it in later right and so you're doing kind of the same thing when you run uh you know apps in as root in docker is you've you've
Starting point is 00:25:50 got a constraint that would be nice but you don't even think about it until it's invisible which okay so i'm going to take the other side of that just as of in the defense of a docker style thing i know obviously this is uh uh there's many a nuance here but one of the things that docker gives you kind of out of the gate is deployability which is another thing that if you don't think about right at the beginning yeah you it's hard to retrofit we've all seen applications that you're like well this is all well and good if i can get clone and i've got full access to the internet and then i can run uh these commands and i've got access to these things and i can do whatever and you're like that's great on my developer machine again the loudest
Starting point is 00:26:30 truck in the world is now outside my house they're circling just circling they really no it's just he's taunting me he's reversing it up this has been the most i'm i will try and edit some of these things but i think if you're if you can hear this dear listener then i failed to edit the podcast very well all right i think they've gone so but where were we um i was ranting about something you were about to defend docker it was i was defending no the deployability is an important thing to not have to retrofit afterwards and docker kind of hands you that straight away you're like well docker, Docker pull, Docker run. Amazing, right? My CI is Docker build and Docker push. And my runtime is Docker pull and Docker run.
Starting point is 00:27:13 And the cool thing is that my developers can run as if they have the CI build because they can Docker pull as well and then Docker run as well. And so it ticks tons of boxes, right? It's so lovely, right? From that point of point of view again until you discover that half of your computer is now owned by root and you don't actually have root privileges on it and then you're like well i'm stuck with these files i guess yes right until you fire up the container and then uh rm them from the
Starting point is 00:27:39 container inside the container has root i mean a good friend of mine, I will not drop them in, but a good friend of mine has a one-liner that gives you actual root privileges on the machine that you're on if you have Docker available with non-sudo. It's a convenient little thing to remember and just click it. Oh, that's convenient. Right, right. If you have Docker, you basically have root. Yes. Even if you weren't allowed to in the first place if you and if you live if you work in one of those horrible environments where they don't let you have sudo on your own machines which is insane but they do exist you can maybe put in a request for docker
Starting point is 00:28:13 instead and get basically the same thing let me just say that this this uh this is a personal opinion that ben and i hold um don't want to get anyone in trouble with their security teams please don't do anything daft with that information. But it is true. And it's great for taunting your infrastructure and SecOps folks if you indeed need Docker for whatever. Anyway, that's Docker. Other containment, containment, container solutions. I mean, containment as well. Containment solutions.
Starting point is 00:28:38 Like from the Ghostbusters. Yeah, I was actually thinking the same thing. Yeah. The light is green. The trap is clean. Yeah. So, I was actually thinking the same thing. Yeah. The light is green, the trap is clean. Yeah, so, I mean, VMware. So we kicked off this whole discussion with VMware and virtual machines, which are a very different kind of technology than Docker.
Starting point is 00:28:57 Do you think you could give us a two-minute overview of the differences between something like VMware or VirtualBox or other sort of virtual machines and Docker, having built many virtual machines in your life? Well, my virtual machines have all been 8-bit, which makes them considerably easier on some axes. But yeah, so let's explain a little bit about how Docker is working. So at least Docker on Linux,
Starting point is 00:29:21 which is my only experience here. So Linux supports namespacing. That is the ability to make groups and resource allocations that are kind of contained and have their own namespace away from anyone else running on the system. And now obviously you can think about a user is a sort of a namespace of vaguely. But, you know, if you type PS as a particular user or ps aux you can see all of the other users that are running on the system in this instance namespaces can contain off areas of the operating system so that like the main operating system can see what's going on but if you're inside that namespace if a process is inside that namespace it only sees things in its own namespace and namespaces can be file systems they can be users they can be um oh
Starting point is 00:30:08 cpus and that may be secrets but there's a number of things number of like um aspects of the system which can be compartmentalized and held separate but you're still running the same operating system and you're still doing all the things that you were doing before you're just making a new namespace and what docker effectively is doing is making a new namespace um creating inside that namespace a bunch of links to the outside world for things like the terminal for things like um oh yeah network is another namespace you can create and you can make a name space you can make a bridge then that talk that talks one namespace to another as if it was one of those network devices that we're talking about as a um and and then you're basically running like a regular process except that if you type ps or if you to type ls you'll only see the world that the container gave
Starting point is 00:30:58 you through giving you your own namespace and it's a bit like if someone's ever looked at like ch root jails, which was like the precursor to this, where you could say, hey, start a new process and pretend that the root directory, like the slash, the top of the hierarchy, is this subfolder I just made. And then you can never see outside of there. And you can imagine that you're effectively in a jail. You can't see outside of there.
Starting point is 00:31:19 And your process can run along and be isolated. And you can see how you might build like a duplicate operating system image in there and then run it. But it's running really on the main operating system. And that has a really interesting side effect. The kernel calls that you're making are going straight to the host operating system's kernel. There is no kernel that you're running inside your Docker container. So if you're running on kernel version 5.star and there's some whiz-bang new feature that's in kernel version 6 and above
Starting point is 00:31:52 and you've got a Docker image that's Ubuntu 24, whatever, that wants to use that, it ain't going to work. No amount of Docker magic will make new features appear in your running kernel. Virtualization, on the other hand, takes this down to the hardware level and pretending effectively like you are...
Starting point is 00:32:13 Oh, God, now the distractions are a cat hitting the microphone. At the virtualization level, you are pretending that you have a CPU and networking resources and hardware resources that don't actually exist and then a full-on kernel boots up in that world and as far as that kernel is concerned with a few caveats it thinks that it's running on a real computer but it's actually running on a simulation of a computer that's running on the real computer. Now it's kind of like how we're all living in a simulation. We are all living in a simulation, which explains an awful lot.
Starting point is 00:32:50 Yes. Right. But yeah, we're all living in some kind of the matrix and all we're doing is we're putting another matrix in our matrix so that we can run another copy inside of that. So as far as that virtual machine is concerned, it is a full sovereign computer in its own right and it can do anything it likes unaware that when it says hey oh i've got a network device over here what's really happening is that some kind of um trap is happening in the cpu when it's accessing or trying to access that device and an operating system one layer up in the list of of matrices um goes oh wait a second and much like when um a regular operating system misses a page and has to swap it in from disk and like the
Starting point is 00:33:31 process is put to sleep and while the the process the the image is red and then it kind of goes oh yeah the memory is there now the same thing happens at one layer above in what's called the hypervisor which is like the operating system running the show for all of the operating systems underneath it. And so that hypervisor can do, can arbitrate access to the real network cards and the real physical block devices, like the hard disks that are in the machine that you're emulating. And then when you say emulation, you think it's going to be super slow. And in fact, you know, you could obviously write a genuine emulator, and then you could pretend to be an ARM machine when you're running on an x86 or whatever. What typically happens is that these are hardware accelerated.
Starting point is 00:34:16 The CPU knows, quotes, that there are layers and rungs of the hierarchy of simulation environments. And it gives the hypervisor more privileges than the operating system underneath. And in fact, mostly nowadays, the guest operating systems, as they're called, are in cahoots with the virtualization layer. They actually do know that they are living in a simulation. And that allows certain things to be a lot faster.
Starting point is 00:34:42 So instead of actually having to emulate a real network card like as with this sort of two-way back and forth between the hypervisor and the underlying uh operating system there can be some kind of agreed thing of like hey i'd like to talk to the network card i'm just going to put all the data i would like you to look at over here and then hey hypervisor imagine that a network you did that whatever the network card thing is there's a certain amount of collaboration i'm making that up in full disclosure. But what that means is that when you go to your Amazon account and say, I'd like a new computer, please. That computer is not a real computer. It is just a virtual computer running on someone else's infrastructure.
Starting point is 00:35:19 And you get a certain number of CPUs and a certain number of disk IOs per second and all that good stuff. And this then comes back to the VMware thing that you were saying at the beginning this is why infrastructure folks love it because i can buy two 128 core uh terabyte ram machines and then i can hand them out to as many developers as i'd like in like two or three or four cpu slices which i can't even buy i can't buy a t2 c CPU computer anymore. And they get to share it, and they all have root on their machine, and there's no way they can bust out of their virtualization environment to get to the hypervisor. But they can blue screen, their kernel can panic, the whole thing can go down.
Starting point is 00:35:59 It's exactly like a normal computer, except that really it's just one tenth of the physical machine you're running on right right so when the annoying developer tells you that they need a server to run their app and you ask what the app is and they're like well this is no js app that runs in one thread you're like there's no way on the planet i'm giving you a ten thousand dollar server to run a single threaded no js app so i'm just gonna give you this one little slice and you think it's a server and it has its own operating system, which means obviously there is a, you know,
Starting point is 00:36:27 your storage requirements, both in terms of memory and in terms of disk space go up because, you know, like there is a real honest to God Linux kernel running there. And probably on the sibling CPU, like literally on the die, you know, two millimeters away from you is another CPU running someone else's Linux kernel and never the twain shall talk to each other. Right. Rowhammer issues and other things aside.
Starting point is 00:36:49 Yeah, don't give me an in to talk about that kind of stuff. You know, so actually, all right, we are going to, we're going to have to now because you poked my buttons. So. Rowhammered them. Not Rowhammer, but that's definitely one for another conversation but what um what a reasonable person might do given what i just said is say well the hypervisor is sat there not doing very much doesn't need any cpu resources most of the time because it's
Starting point is 00:37:18 reactive to the host operating systems that are really running on the cpus right but we could potentially say well let's give one or so cpu the hypervisor itself, and it can do some background maintenance activities. What if it scanned through all of the physical memory of the computer and went, wait a second, I've seen this 4K page before, right? I've got every single of my 60 guest operating systems have all loaded up variants of the same linux operating system why the hell would i have the same 4k pages you know like many many many 4k pages that are exactly the same because they all loaded like you know vm linux 4 5 29 whatever why don't i just point them all at the same actual physical location and then discard the copies of it, but pretend to all of the individual guest operating systems that they have their own copy,
Starting point is 00:38:11 and then it's just copy on write. If they try to write to it, then they get their own copy. A bit like when you fork a process on a single operating system, the same tricks happen. Makes perfect sense. Now, obviously, you have to do it retroactively. When you fork, you know that every page that you currently have is going to be shared in the child process. But this is a sort of emergent property of once you've booted a machine up, eventually some pages will be the same on one machine as they are on another, in which case you deduplicate them. And then you're right. You've got more free memory for the system as a whole.
Starting point is 00:38:38 And it seems like there could be nothing wrong with that until the security people come along. Yes. And ruin everyone's fun. say ruin everyone and ruin everyone exactly exactly uh so it was shown that and maybe i won't go into too much details for two reasons one i don't necessarily know the details and two we've probably talked too much about this already um it was shown that if you have the same implementation of open ssl or one of the other cryptographic libraries as a co-located virtual machine to you. So I'm going to just go to Amazon and I'm going to ask for 100 EC2 instances. And then I'm going to run a test to see if I can find that I'm co-located with my target. Just by coincidence, I happen to be running on a machine that also has an SSL process somewhere in it, right? The chances
Starting point is 00:39:26 are that obviously those 4k pages will be deduplicated because it's the same.so that we've both got. It's OpenSSL, Ubuntu, whatever version, right? Now I can start doing timing attacks because I know my physical RAM is associated with the same physical RAM that they have. And so if I know which code paths are taken in their code i can poke around in my cache and sort of try and determine whether or not getting the keys basically exactly i got this bite of the key and i got that bite of the key and i don't have it all yet but that's close enough it took a long while to read this bit out because it must have been an l3 but if it wasn't then i know it must be in someone's l2 somewhere and that someone
Starting point is 00:40:04 might be and all these kinds of things and you can imagine how terrifying that is from a point of view of of security you like you've lost the isolation between the virtual machines that aren't even meant to know that their siblings exist so that's your own fault ben worth it we can talk about rohammer another time. Worth it, worth it. So in terms of deployment, though, I mean, you sort of alluded to that by saying that, like, as a developer, it's convenient to be able to go to your infrastructure folks and say, can I just have a server to run my little Node.js app? Or not even talk to them and just, like, run a script that generates one for you. And they keep tabs on it and they know who's allocated to and they can call you up and say, hey, you're using 35 servers.
Starting point is 00:40:48 Do you really need them? But you can automate those things, right? And it's really great when you do. That's very true. I mean, I forget, of course, that this is what Terraform and what the like do for me and Amazon, right? I just say, I want another computer,
Starting point is 00:41:00 another computer appears. It's never occurred to me that really somewhere behind the scenes, all this magic is going on to make that happen but you know it just does right and yeah that puts a lot of power and responsibility but a lot of power into the developer's hand you don't have to like overload a machine and you get the isolation that say a docker container would give you but at a much deeper level now different problems again right you know at least in your own server if it's running as root well it's only running as root because you made it run as root right as root
Starting point is 00:41:30 you got every choice you like yeah yeah so what do we think about that in terms of like the trade-offs when what would what would make you choose one method over another? I mean, I tend to lean more toward, you know, having virtual machines and, you know, having more of like the, I'm going to get this virtual machine. I will probably build some very lightweight automation to set it up. But again, the setup of it is mostly just, you know,
Starting point is 00:42:03 kind of like you were saying, the app update, app upgrade. Maybe install one or two system packages, but hopefully not if I can avoid it. And then just run all my applications as a user, as an unprivileged user. And every version is a new tarball that gets copied up to the computer or maybe have some automated thing that pulls them down from a central repository somewhere.
Starting point is 00:42:25 You've got like a deployment thing that you use, haven't you? You've got like a, is it GitDeploy? Oh, GitDeploy? Yeah. Is that open? That is open source, right? That is open source, yeah. So GitDeploy is sort of my Heroku-style deployment script that I made
Starting point is 00:42:36 that will let you take any server that you have SSH access to and basically push to it as if it was a Git repository. And as a side effect of that, if the push works, that is your code is not out of sync with everyone else that's deployed to it, it will deploy your application and start up. And so you get to sort of use the Git semantics around push and pull as your mechanism to make sure that you don't accidentally clobber someone else's deployment. I see. And so it's sort of a safer way to be able to empower people to deploy locally from their machines if that makes sense to do.
Starting point is 00:43:19 Now, sometimes that doesn't make sense. It doesn't always make sense, right, yeah. In fact, it sort of usually doesn't make sense, but sometimes it makes a ton of sense. And it's really nice to be able to do that in a way that is safer than just you know scp right right but i mean often you know there are there are also places where or times when you want to be able to push to like a development machine a development cluster and that seems like a good thing there where i would actually want the feature is i have a code on my machine that i want to have running in an environment that i can't reproduce
Starting point is 00:43:49 myself locally right it's not ideal to be in a situation where you can't quite reproduce it locally but sometimes you know i want to batter it with 200 machines that are going to send queries to it and so i want to deploy my version that has my fixing or whatever. Yep. Yep. And I mean, you know, you can take, speaking of virtualization, like you can take these things a lot further. And one of the things that I've been playing around with on one of my projects is sort of getting rid of the idea of the production environment. So all of the environments in this project that I'm working on are just branches. There's the main environment for the main branch, and that's where the DNS entry for the top level domain points to. But if you make a new branch, it will automatically
Starting point is 00:44:33 spin up a new environment, and it will marshal all the services that that environment needs, and it will do everything that it does. And so if you want to make a change that involves potentially making changes to the infrastructure, like, oh, I'm going to change a security group or I'm going to change, you know, the number of servers from four to five or whatever it might be. Yeah. You just create a new branch. You push that branch to GitHub and the infrastructure magically appears. And the name of that infrastructure is literally the name of the branch. So they're tied together in that way.
Starting point is 00:45:03 And when you delete the branch, the infrastructure gets torn down. That's super cool. The main branch is always there. That's sort of the quote-unquote production environment. But if you were to ever delete the main branch, it would also actually tear down the production environment.
Starting point is 00:45:18 I mean, that's probably what you want, though. You know, it's like, it's sort of a weird thing, but it's like, it's like just coupling those two things together very tightly and saying a branch is an environment. There's no such thing as the dev or the test or the UAT or the production. They're just names of branches.
Starting point is 00:45:32 Right. And that is only possible because of virtualization. You couldn't do that any other way. On a real machine. Well, yeah, for all the reasons. I mean, cost was what I was about to bring up because I'm sort of trying trying to move compiler explorer towards a system which is a tiny bit more like that where instead of the staging environment that we do tests in being kind of like just a sub category under the production environment it's like its own AWS account effectively and then I can do the kind of
Starting point is 00:45:59 things you're talking about like hey let's have a new um balancer. Let's try out a different way of doing everything in the staging environment. But for me, that's prohibitively expensive because those resources are not free and they're quite expensive. Like having one load balancer is expensive enough. And I can configure that one load balancer to kind of say, well, if it has slash staging in the thing,
Starting point is 00:46:23 then go to this subsection, right? And that's how it works at the moment. So there's a trade-off to be had there. And obviously, in a world of infinite resources, it's no problem that if you create 12 different branches, you've got 12 environments. Right, right, right. Well, one of my subtle motivations for doing this, and again, I'm trying this on my own project, but maybe one day I'll get to do this in a more widely shared company environment, is to directly manifest to the bean counters the cost of so many different branches. That's amazing. It's sort of like, yeah, branches have a cost and it's hard for you to measure that cost. It's mostly cognitive load on developers. What if we just turn that
Starting point is 00:47:02 into dollars? Actual dollars. actual dollars you could measure them and be like why do then you have accountancy like why do we have so many branches as as one of those folks that sends out the emails and the the nags to people saying like hey this pl has been open for three years is there any chance of it being closed i totally i'm down with that right yeah the cognitive load when i hit auto complete in uh in the for my branch name i'm like what the heck is this this person left the company two years ago why is it still here those kinds of things yeah yeah no that's that's i like that approach i like the idea of of manifesting and i think you know obviously you've talked about
Starting point is 00:47:36 doing it in terms of virtualization there are ways and means of doing it with the uh docker style approach as well. And I know we're kind of getting close on the amount of time that we've got available, but I'd like to sort of suggest, you know, there's the Kubernetes type approach. There's Nomad, which is a HashiCorp system, which can be used to run Docker containers. And then there are definitely low balancer type things that can talk to those containers.
Starting point is 00:48:02 And you could definitely have it so that every branch that you commit builds a docker image and pushes it to a tag a named tag for that branch and then auto registers a container a job running in nomad to say hey i'm the your project hyphen your branch name environment that has all these machines running so it can be done too you are still paying the cost there are processes running in one operating system or one set of operating systems that is the nomad cluster or the kubernetes cluster or whatever it's a sort of lower um i guess it's higher level i don't know how you describe it where it's cutting you know is it lower level or higher level um than than virtualization it's definitely higher level i don't know how you describe it where it's cutting you know is it lower level or higher level um than than virtualization it's definitely higher level like measured on the axis that makes sense to me right now but um but uh yeah so so you can achieve it using that
Starting point is 00:48:56 too which is which is great um and i mean i think all of this kind of comes down to is what we're talking about in this instance is infrastructure is code however however it's achieved, be it VMs or Nomad machines running Docker stuff. And so we've kind of strayed from the original point about like, how does one do deployment? How does one use virtualization? How is it, what different things are available? Yeah, they're all related though. They are, I guess, related. Yeah.
Starting point is 00:49:20 And that infrastructure as code thing is super important to be able to say like yeah it's not no one has to rack a machine no one has to um physically move any cables around when i stand up this instance here um and that instance is defined by a piece of code or a configuration file that's generated by code or just a configuration file that a human edits which you know is a fabulous way of tracking i mean we're especially software engineers we know where we stand with source control and ci and things like that so having having the machine the physical world work that way too and be able to roll back and all that kind of stuff is super cool however so it's achieved yeah all right well this is this has been a fun exposition of what on earth can we remember about how all this stuff fits together yeah i feel like we only really like touched on it like there's you could probably do a whole other hour on these topics like you're talking about absolutely yeah i mean i don't
Starting point is 00:50:12 know enough about kubernetes i mean i used borg at google which i i believe to be related in some way but i don't know either i remember them trying to pretend that it wasn't called that and they used to pretend that it was anita borg it was named after not clearly the the the evil people in in star trek right uh which yeah and i think that was because they leaked out because um they weren't laundering their uh referrers and so people were running internal services from machines and then people would like link to i know youtube not youtube videos because that would be google too but you know link to other people's websites and then it's all went through like, what is this? All the machines had like names that DNS names that uniquely refer to the job that we're running that was running on it, which is super convenient for everything.
Starting point is 00:50:53 You know, you want to hit your job and it's running a web server. Then you just go to that long name and it hits the machine and the machine then looks at the name you gave it. And then it redirects it to the correct port for that particular instance that you were running on and then off you there you are there's your there's your job running and you can look at it um but obviously if you then have a web page on there that has a hey click the cat animation you click the cat animation then you've leaked you know 12.7.borg.google.dns or whatever right right right right oops refers refer headers, man. Yeah. Yeah.
Starting point is 00:51:26 All right. We should stop talking. We should stop talking. We've got plenty of things to audit, to edit. That included. And audit. And audit. We can't let the Borg stuff leak out.
Starting point is 00:51:38 That's true. That ship has sailed. Cool. All right. We'll see you next time. Bye. out all right i'll see you next time bye you've been listening to two's compliment a programming podcast by ben rady and matt find the show transcript and notes at two's compliment.org contact us on twitter at two cpp that's at T-W-O-S-C-P Theme music by Inverse Phase
