Two's Complement - Virtual Infrastructure
Episode Date: July 15, 2022
Ben and Matt compare container technologies like Docker to virtual machines, and discuss the tradeoffs when deploying applications. Matt explains the scary things that can happen when you share a VM with strangers. A visitor enters through the couch.
Transcript
I'm Matt Godbolt.
And I'm Ben Rady.
And this is Two's Complement, a programming podcast.
Hey, Ben.
Hey, Matt.
How are you doing?
Very good.
Excellent.
Well, we don't normally refer to things that have happened in the news
because that gives us a certain flexibility in the order that we release these recordings.
But you and I were literally just talking about the fact that Broadcom has bought VMware
and we were going to talk about some level of containers versus not containers
versus virtualization versus whatever, and it seems like we should bring that up. So yeah, let's talk. What do you think? A good topic?
Right, exactly. It's such a deep one, and, you know, we've got varying levels of experience in different technologies for, essentially: how do I make sure my software works in the environment that I'm expecting it to?
And I'm thinking personally from this point of view, like a developer who deploys server-type applications, headless applications that run on machines in the cloud or in data centers and whatever like that. But I guess, actually, now I'm saying that, I was sort of giving that so that, you know, I can't say too much about how UI stuff is developed.
But then there's a number of software packages I see for Linux these days that come as a pack file or, what are those things called?
There's a bunch of different snaps, snapshots, which are essentially: here's a whole operating system's worth of application wrapped into a file system and then presented as if it's a single thing. And it's like a Docker container, right, but differently.
So it's everywhere, and I think we all want software that's easy to deploy and run, but there's a number of ways of achieving it. Yeah, what are your thoughts?
Well, I mean, I think it's interesting that you and I have a sort of similar perspective, in that we look at those tools that way. But I think the folks who run the infrastructure look at it from the other side: they have a fixed amount of hardware that they have provisioned and paid for and have pre-purchased
the electricity for and have backup batteries for and have networking for and say, okay, well,
how do I take this sort of fixed resource and allocate it out to all of the needy,
greedy software engineers who keep telling me that they want more servers?
Well, I got to have a server for this app and and I have a server for this app, and a server for this
app. And so, you know, I think from their perspective, they might see some of these
virtualization tools as a way to, you know, manage those resources more effectively and have a little
bit more control over not only just the resources themselves in terms of memory and compute, but also the sort of blast radius if one of them goes horribly wrong, right?
Being able to wipe an image and give someone a fresh new server with a few clicks of a button
is way easier than driving down to the data center and unracking a machine that is no longer
responsive because somebody
did something terrible to it.
I don't know what you're talking about.
I have never done that.
Oops, I just fork-bombed my own machine.
So you're right, actually, that's a very valid point.
Those infrastructural things
are super important and
it's sort of a funny thing. I was talking about it
at lunch with a bunch of folks the other day, regaling them with a story from my past. A friend of mine used to work at a big airline, and those folks are still using mainframes. And mainframes have always been able to do all the things that we're now kind of starting to rediscover in that virtualization world. You know, like: hey, you want more CPUs? Yeah, we can bolt more CPUs on while the mainframe is still running. Hey, you want to shut down the mainframe, do maintenance, and bring it back up again? Sure, what we can do is we can teleport the mainframe's image up to a backup site. Nobody even notices that your connection is now going to Manchester instead of London, and your terminals keep on responding, everyone's still doing their requests and the batch jobs are still going, while meanwhile they power down the main machine, fix the RAM, and then you can teleport it back. And those things have been around for, you know, half a century, and yet we're rediscovering them.
In terms of... what I mean specifically is, you're mentioning things like VMware, which allow you to manage the resources in a really fine-grained way and make those kinds of, like, "hey, we need to move from one machine to another machine" things. It's sort of miraculous that it works as well as it does.
Yes, yes, yes. Yeah, because those mainframes were clearly designed with those specific use cases in mind, right? They had the hardware capabilities to do those things.
Right, right.
They built those things from the ground up with, like, okay, we're going to be able
to do this offsite backup
and we're going to make sure that it all works.
And with all these other things,
we sort of backed our way into it
because it's like,
clearly there's a need to do that.
But, you know,
the old school operating systems
and CPU architectures
and all these other things that we have,
maybe someone gave that a thought
a long time ago,
but they certainly didn't design the whole ecosystem
from the ground up to be able to do that.
And so now we're sort of in this state
where it's like that need is still there,
the desire is still there,
and it's this sort of tricky problem of,
okay, well, how do you actually do that?
And yeah, folks like VMware have got their solution for it.
There are obviously other vendors that can do it.
And of course, I mean, one should note as well
that the chip manufacturers have been slowly heading this way too,
adding more and more like hardware level virtualization things.
Because, you know, we've always been able to do these things.
It's like my hobby of writing little emulators for old machines.
Once you can fully emulate something,
of course, its state is just a bunch of numbers that you've got. You can move that around anywhere you like and then kind of carry on somewhere else, and have, you know, your single-step through each frame of a game and then go backwards, because you could just emulate from a snapshot one fewer frame forward, and, you know, that kind of stuff. So this has always been possible, but it was just infeasible to do it without actually running the same CPU as the one you're trying to virtualize, and the same hardware. But then things have come along. But I feel like we're going off base from where I was thinking of going. I just got excited, which is what this podcast is about, right?
Yeah. No, I mean, I think these things are all kind of related. And maybe,
I don't think we should necessarily dive into this at the start,
but one place that you could maybe take this is like, this isn't just about virtual computers.
It's also about virtual networking equipment, right? Like if you look at, you know, some of
the tools that are out there, it's like, yeah, you think that this IP address is a switch, but
it's not.
I mean, one only imagines what's going on in, like, the AWSes and the Google Cloud infrastructure, in terms of their physical network separation and their ability to, as you say, make it look like you have your own cloud to yourself, knowing that actually, no, those fibers are the same fibers that everyone else is using between all the racks. It's all magic.
Right, right, yeah. But yeah, talking specifically about the part of this
that is, you know, I, as a software developer,
somebody who's, you know,
sort of building a total application,
you want to be able to deploy it.
That's really important.
You want to be able to, you know,
connect to the machine that's running it
and troubleshoot it and read the logs
and, you know, run tcpdump and
netstat and all the other wonderful tools that we talked about in some prior podcast
and still have the flexibility of the things that we were talking about in terms of making
maximal use of those resources and being able to tear it up and down and being able to build the definition of
what that system is in a configuration file rather than in a PC part picker.
Here's a checklist that Barry has to go down and make sure they all look the same.
Right.
Yeah, yeah, yeah, yeah.
Right, right.
And so there's lots of different tools to do this, but I think they all sort of serve the same needs.
So what are some of the tools
that you've actually used in anger to do this?
Well, I mean, the main one that springs to mind,
other than bespoke ones that I guess became Kubernetes,
when I was at Google,
there were some things that became
sort of like strange containery type things.
I don't know if it's exactly the same thing now,
I'm thinking out loud.
But the one I have the most experience with is Docker. And Docker is a great solution to the,
I want to have a reproducible environment that's incrementally built with layers. And so it's
relatively efficient if you are only changing the end layers. And you can definitely have that kind
of feeling of like, well, if I have a Docker image that I can give to you
and you're going to run it,
then I am 99.69% positive
that if it worked on my machine,
it worked on your machine
because what I in fact did was ship my machine to you.
Now you're running my machine,
which is a blessing and a curse.
And I think that's the problem, right?
Is that it can be misused like anything,
like any technology.
My experiences with Docker,
so Compiler Explorer started out,
actually it didn't start out with anything.
It just started out with a shell script
running the Node.js on a bare machine.
And then very quickly, it was like,
how am I going to manage this?
So I decided to use Docker, which seemed rather sensible at the time, and Docker served us well for many, many years. But Docker did not scale with the gigabytes and gigabytes, you know, hundreds of gigabytes, of compilers that I wanted to build into the image. The images took longer and longer every time to build.
We're going to take a pause there while my wife comes in through... through the couch. Through the back.
Yeah, through the couch. I never realized that there's a door behind your couch.
How else do you get between places? You know, we put Floo powder in and then we can go anywhere, to any other couch.
That's how this works.
That is exactly how this works.
It's okay, the back door's jammed.
Okay, the back door is jammed.
Oh, no.
So you had to come in through the couch door.
Yeah, I had to come in through the couch door.
All right, there goes my dog.
And then we're going to have to try and remember what I was saying and work something out.
Or just pretend it didn't happen and just put this in and, you know, give our listener...
I was just thinking, actually, I hope our listener isn't called Barry.
Because I always use Barry as my general dogsbody person.
Oh, yeah.
I think I say Steve for some reason.
I don't know why that name.
Steve, yeah.
So we were talking about Docker.
Yeah, you're talking about using Docker in Compiler Explorer.
That's right.
So the problem with bigger and bigger images is that no matter how you cut it,
you're uploading layers upon layers
upon layers upon layers of a piece of software with more and more compilers. And it was just
getting unwieldy. And there are definitely tricks you can do with volumes and other things like that.
And we looked at them for a while. But ultimately, we backed out when we realized we needed more
security than Docker would give us. At the time there were some relatively high-privilege exploits for breaking out of Docker containers into the wider world, and we were kind of tacitly relying on Docker to also be a sort of protection domain. And the other thing is that if you're running inside that container, even if you don't get privilege escalation outside that container, that container is long-lived. So if you're, like, servicing somebody's request and it was a poison request
and it was able to monkey with the system,
it's now monkeyed with that running Docker container.
And so it's going to be there until we restart the machine or restart the
container.
So there were some properties that we wanted to get in terms of jailing that Docker didn't give us.
And once you're in one container,
you can't have a container inside a container
and inside a container arbitrarily,
at least at the time you couldn't.
So we switched out to a different approach
where we just have tarballs
and run them on the operating system.
But it did serve a need for a long time.
And it's a frequent question we get asked is,
hey, do you publish a Docker container
of a Compiler Explorer instance
that I can just get started with? Because people do just want to do docker run,
blah, blah, and you get that benefit.
It just works.
We have different ways of achieving that, I think.
So anyway, that's my experience with Docker.
I also have used it at a number of places at work,
and I think it works great if you plan your Docker image layout very carefully and the layers are sensible and well managed.
Yeah, so when you say a Docker layer, what do you mean?
What's a layer?
So Docker, logically speaking (there's probably a better explanation), splits the file system up into a bunch of temporary directories
and then overlays each directory, one directory over the other.
So you start with a base image,
which is maybe your entire Ubuntu distribution.
And then you're like, oh, the first thing I'm going to do
is I'm going to install these 20 apt packages that I need.
And so the next layer will be another file system
that only contains the things that change
between the base system and the system where you ran sudo apt install for the hundred packages I needed, my extra packages. And then the next layer might be: oh, and now I'm going to copy some files from my Git repo that I'm running it in, into the container at a particular location. And that's another layer of the file system that only contains those copied files. And then so on and so forth. Each layer would add in
more bits of the software and configuration. And the cool thing is, of course, is that you only
need to regenerate layers that changed. And of course, the layers that are immediately after
them. So if I change, for example, the base Ubuntu image, of course, everything depends upon that. So I'll have to rerun the commands that populated the later layers and create new layers.
But if I'm just changing my application software and I don't change my dependencies and my system dependencies,
then oftentimes it's only that last layer of a few hundred kilobytes or so that changes. And so not only is the build time faster, but the way that Docker distributes
itself is as compressed layers. And very often, of course, if you're upgrading software time and
time again, those base layers are already on the system in the cache somewhere. And the only thing
that you need to do is upload the few hundred K each time, which is fabulous. So that's a really good way of having a sort of incremental deployment of your software.
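To make the layering concrete, here is a rough sketch of the kind of Dockerfile being described; the base image tag, package names, and paths are illustrative, not Compiler Explorer's actual setup:

```dockerfile
# Layer 1: the base image (an entire Ubuntu userland).
FROM ubuntu:22.04

# Layer 2: only the files added by installing these (made-up) packages.
RUN apt-get update && apt-get install -y build-essential nodejs

# Layer 3: only the files copied in from the repository.
COPY ./app /opt/app

# Metadata only; no file-system layer of its own.
CMD ["node", "/opt/app/server.js"]
```

Rebuild after changing only ./app and Docker reuses the first two layers from its cache; only the small final layer is rebuilt and re-pushed.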
Yeah. So what happens if I have, like, a layer that's, like, fetch the latest version of this thing from the internet?
Well, that is an excellent question, and that is one of the biggest problems with something like Docker: it's very easy to get wrong. Docker's cache is based on the text contents of the command it's going to run. So if you just say curl, get me the latest version of something, pipe it through tar zxf or whatever to extract it, then that command will run exactly once on your machine, when it populates that layer. And then if you run it again, having changed the contents on the website that you're curling from, or, like, a new version of the software is released and the URL doesn't encode that in some way (you know, you're getting "latest" or bob.latest)...
Exactly, right.
...then you won't see that. But unfortunately, anyone who later builds with your Dockerfile will see that change, and so these things will not necessarily agree, right?
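A sketch of the gotcha being described, with a made-up URL: the RUN line's text never changes, so Docker's build cache happily reuses the stale layer even after "latest" has moved on upstream.

```dockerfile
# Cached on the text of this command, not on what the URL actually serves today.
RUN curl -fsSL https://example.com/some-tool-latest.tar.gz | tar zxf - -C /opt
```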
And so it's really important, and it's so easy not to get this right, but if you are fetching external resources, that you get a specifically named version of everything that you want to get, for two reasons. One, it means that you get reproducibility if someone else grabs your Dockerfile and just says, build me this, please. And the second thing is that, necessarily, if you want to change that image, you have to edit the URL that encodes the Git SHA or the version number or whatever, which means that it will be rebuilt automatically. But it's hard to do that right, and it's hard to make sure you apply that everywhere. Even things like the base
image itself. You know, oftentimes when you say in the Dockerfile, hey, I'd like to build something based on Ubuntu 20.04 (that's essentially what you say: you say FROM ubuntu:20.04, or FROM ubuntu:latest or something like that), those are kind of like a Git pull of whatever someone has tagged as being the 20.04 for Ubuntu. If you really, really want to make sure you get reproducible builds, you need to put the SHA hash of that particular layer in the FROM command as well, so that you know you're always going to start with the same version.
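A hedged sketch of what that pinning can look like; the digest and the version number are placeholders you would substitute with real values (the digest is reported when you pull the tag):

```dockerfile
# Pin the base image to a specific digest, not just a tag.
# (Placeholder digest; use the one reported by "docker pull ubuntu:20.04".)
FROM ubuntu:20.04@sha256:<digest-of-the-image-you-tested-against>

# Pin downloads to an explicit version; bumping TOOL_VERSION forces a rebuild.
ARG TOOL_VERSION=1.2.3
RUN curl -fsSL "https://example.com/some-tool-${TOOL_VERSION}.tar.gz" | tar zxf - -C /opt
```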
And of course, there's a duality there, right?
It's convenient... from my mindset, it's great to have a totally reproducible build. That means I can hand you a Dockerfile, not the contents of the Docker image, right, that's different; but if I hand you just the text that says "this is how to build my world", you will get the same answer that I got, every time. And that's really powerful. But it's super inconvenient, because every time some little trivial fix in the base image is pushed, you know, a security patch or a security fix or whatever, then I have to think to go back and change the SHA to be the latest one, and that kind of thing, if I want to keep those things going. And of course, the first thing you're going to do (this is almost always the first line after the FROM ubuntu) is sudo... not sudo, because you're running as root... is apt-get update and upgrade, right? Because you want to pull in all of the things that are latest. There's no kind of version for that, there's no bi-temporality to that, so you're a bit stuck at that point. And that factors into some of the problems that one has with something like Docker. It's a boon, but you have to be really careful how you use it, and you have to understand these slightly sharp edges. And maybe most people don't care about those, but I know that it's affected us before. And, you know, you and I are definitely in an industry where we really want
to be able to reproduce what we did before and understand it.
Yeah.
It's also very easy to generate gigantic layers.
If you think about, if you don't design your Docker file correctly, you know.
So in the example I just gave, apt update, apt upgrade, apt install, right?
Those are like sensible commands I might type myself if I had a fresh new computer that you handed me.
Right.
The simple thing to do would be to run them as three separate layers.
And that makes a lot of sense.
But I've pulled down a whole bunch of stuff
and replaced a bunch of...
There's a load of temporary files that get pulled into the apt directories
that I probably don't need in my production image.
I've then updated a whole bunch of stuff, which has replaced a bunch of stuff. And then I'm, like, maybe installing my own packages, and maybe I remove some system packages
that I don't want, right? And so I've got three or four layers, each of which is strictly additive. And then there are sometimes files you had to delete. So you might be tempted, at the end of that, to go: and the last thing I do is rm -rf /var/cache/apt.
Kill the cache. I don't want it anymore.
It's like gigabytes of all the intermediate crap
that was downloaded while I was installing my packages.
But if you put it as a separate step,
unfortunately, those already exist.
Those intermediate files exist in a layer.
That delete can't remove them from the layer.
It just marks them as being, you can't see them anymore.
It puts tombstones in there. And so your overall size, the number of bytes you need to ship around, still contains the layer that has all of those files in it, and then a separate layer that says: and by the way, all those files are gone now, right? So you have to be really careful. So what people end up doing is writing a long stanza of, like, apt-get update and friends, as one giant long single bash command. And at the very end of that, rm -rf the apt cache and dpkg, dash dash, you know, purge caches, all the things, as one thing. So atomically, all those things happen. And then it's just the end result that gets shipped as the layer.
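That pattern looks roughly like this; the package names are placeholders, and the cleanup paths are the usual Debian/Ubuntu ones:

```dockerfile
# One atomic RUN: the intermediate apt metadata and caches never land in a layer.
RUN apt-get update \
 && apt-get upgrade -y \
 && apt-get install -y --no-install-recommends some-package another-package \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*
```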
Yeah, yeah.
And I've definitely seen that in Docker files,
and it's sort of this like, you know,
it just reads as gobbledygook at the start of the file,
and you sort of parse it,
and you sort of figure out what's going on there,
but it's not the sort of like clean, you know,
one instruction at a time,
maybe with a helpful comment as to why you're doing it.
You know, you see sometimes people will write shell scripts that they then copy into the image to run and then delete again afterwards,
just because then the shell script is essentially atomic from the point of view of the layers.
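That trick, sketched; install-deps.sh is a hypothetical script from your own repo. The COPY layer still holds the (tiny) script, but everything the script creates and cleans up happens inside a single RUN:

```dockerfile
COPY install-deps.sh /tmp/install-deps.sh
RUN sh /tmp/install-deps.sh \
 && rm /tmp/install-deps.sh
```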
And it's, I mean, it could be a tooling thing.
It could be just what you'll get used to.
I don't know, but it's easy to get wrong.
And the thing is that as a developer running locally you tend not to notice these mistakes
because it's necessarily incremental you've been doing this you've been building on and building on
and building on right and then when you ship the the when you dock a push for the first time
you discover that you've got several layers of you know gigabytes each and i'm sure you've done
this as well when you've pulled someone else's docker image and you're like oh my golly what on
earth is it pulling down why is this docker image so big as a game that many have played and few have won?
It's a really painful experience sometimes.
You start cracking open the layers and trying to figure out what the heck is going on,
and it's just like, oh, jeez, why are we doing this?
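One way to start that archaeology, for what it's worth; the image name is a placeholder:

```bash
# Show what each layer contributes in size, newest layer first.
docker history --no-trunc registry.example.com/some-app:latest

# Full metadata, including the layer digests.
docker image inspect registry.example.com/some-app:latest
```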
Right, right. Yeah, yeah. And I think a lot of the time,
people reach for Docker
because it's super convenient,
everyone understands it,
and it does solve a very real need.
But I think oftentimes,
in my experience with the kind of things
that we do at least,
a tarball of the code that you're going to run is enough,
maybe containing the node.js binary you want to run it with,
or maybe, you know, because we're in a luxurious position
where we own our machines.
They live in a data center.
We know which machines they're running on,
which are, you know, probably virtual machines as it happens.
So that's another layer of virtuality above all of this.
But we know a lot of things: what version of libc is it running, what, you know, base operating system are we running, what things can I assume are there. Which, of course, is now a dangerous game to play, which Docker kind of makes you address fully. But most of the time you're like, well, okay, if I've got libc this version, I'll just pass along all my dependencies, right? And it's not that big, you know. For native applications, often it's a few environment variables and suddenly all of your DLLs will be looked for inside the directory you ship, and then you just, like, copy them all with you. And that's a bit bigger, but, you know, we're talking tens of megabytes of library files here, right? In a little tarball that you extract and run on the developer's machine and on a remote machine.
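A minimal sketch of that tarball approach, assuming a Linux target where the bundled shared libraries are picked up via LD_LIBRARY_PATH; the hostnames, paths, and app name are all invented:

```bash
# Bundle the binary plus its shared libraries.
tar czf myapp.tar.gz bin/myapp lib/

# Ship it and run it, pointing the loader at the bundled libs.
scp myapp.tar.gz deploy@app-host:/opt/myapp/
ssh deploy@app-host 'cd /opt/myapp && tar xzf myapp.tar.gz && \
  LD_LIBRARY_PATH=/opt/myapp/lib ./bin/myapp'
```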
And I guess the other sort of critical part about Docker is that it requires elevated privileges,
which means there's a lot of monkeying around with which user you're running as.
And sometimes it's useful.
Sometimes you want a totally unprivileged user that's isolated: the kind of, what was it, twelve-factor type model, where an application only logs to standard out and only reads and writes to external things through TCP. That's fine; you treat it as, like, a black box. But very often it's tempting for developers to kind of go, well, it'd be really convenient if I could get to this set of files on the network, or if I could write to this log directory. And so you start passing things in, you start puncturing the isolation that Docker gives you, and then suddenly you wonder why, you know, you've got a hundred files that are owned by the wrong user. Right, excuse me, there's a truck going past... but, you know, you run this command and then you, like, try to delete it afterwards, and it goes: I'm sorry, I can't delete that. You need to be root. Wait a second, I'm not...
How are you root?
Yes, how did you write this as root?
And I think it is really an unfortunate thing
that the default behavior of Docker
is to run as root because it's really
easy to sort of fall into a trap
of
building an application that
accidentally, for really
no good reason, needs those elevated privileges.
If you had just been forced to think about it for a minute, you would have been like, oh, we don't.
I mean, the dumbest example I can think of is we're binding to port 100 instead of 2000.
There's no reason in the world why that integer matters to anyone. But if you build a whole application, it's like, yeah, there's 30 other apps that connect to port 100 because that's the port that we chose.
And not realizing that that requires elevated privileges.
Then you've just added a constraint completely by accident.
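A hedged sketch of avoiding both traps in a Dockerfile; the user name, port, and command are illustrative, and anything above 1024 is an unprivileged port:

```dockerfile
# Create and switch to an unprivileged user instead of staying root.
RUN groupadd --system app && useradd --system --gid app --create-home app
USER app

# Bind an unprivileged port (>1024) so the listener never needs root.
EXPOSE 2000
CMD ["node", "server.js"]
```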
Right. And running as a non-privileged user, you'll find that out right away, right?
And there are other things like that too. And I feel like it's almost like the testing thing.
Right now? On brand! Oh my gosh, testing, you say?
I know, I haven't talked about testing in, like, a podcast and a half.
So, I know. All right. It's, you know, part of the
reason you write the test first is to make sure that the resulting solution that you come up with is testable.
If you build something and you don't think about tests and then you try to add the test later, it's really hard.
And so most people don't.
And the reason for that is you came up with a perfectly reasonable solution if you completely ignore this other constraint.
And then you try to add it in later, right? And so you're doing kind of the same thing when you run, you know, apps as root in Docker: you've got a constraint that would be nice, but you don't even think about it, it's invisible.
Okay, so I'm going to take the other side of that, just in the defense of a Docker-style thing.
I know, obviously, there's many a nuance here. But one of the things that Docker gives you kind of out of the gate is deployability, which is another thing that, if you don't think about it right at the beginning...
Yeah.
...it's hard to retrofit. We've all seen applications where you're like, well, this is all well and good if I can git clone and I've got full access to the internet, and then I can run these commands, and I've got access to these things, and I can do whatever. And you're like, that's great, on my developer machine.
Again, the loudest truck in the world is now outside my house.
They're circling. Just circling.
They really... no, it's just... he's taunting me. He's reversing it up. This has been the most... I will try and edit some of these things, but I think if you can hear this, dear listener, then I failed to edit the podcast very well. All right, I think they've gone. So, where were we?
I was ranting about something.
You were about to defend Docker.
I was defending... no: the deployability is an important thing to not have to retrofit afterwards, and Docker kind of hands you that straight away. You're like, well, docker pull, docker run. Amazing, right?
My CI is Docker build and Docker push.
And my runtime is Docker pull and Docker run.
And the cool thing is that my developers can run
as if they have the CI build
because they can Docker pull as well
and then Docker run as well.
And so it ticks tons of boxes, right?
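That whole loop, sketched with a placeholder registry and tag:

```bash
# CI:
docker build -t registry.example.com/myapp:1.2.3 .
docker push registry.example.com/myapp:1.2.3

# Production, or a developer who wants exactly the CI-built artifact:
docker pull registry.example.com/myapp:1.2.3
docker run --rm registry.example.com/myapp:1.2.3
```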
It's so lovely, right? From that point of view, again, until you discover that half of your computer is now owned by root and you don't actually have root privileges on it, and then you're like, well, I'm stuck with these files, I guess.
Yes, right. Until you fire up the container and then rm them from the container; inside the container you have root.
I mean, a good friend of mine, who I will not drop in it, has a one-liner that gives you actual root privileges on the machine that you're on, if you have Docker available without sudo.
It's a convenient little thing to remember and just click it.
Oh, that's convenient.
Right, right.
If you have Docker, you basically have root.
Yes.
Even if you weren't allowed to in the first place. And if you work in one of those horrible environments where they don't let you have sudo on your own machines, which is insane, but they do exist, you can maybe put in a request for Docker instead and get basically the same thing.
Let me just say that this is a personal opinion that Ben and I hold. We don't want to get anyone in trouble with their security teams. Please don't do anything daft with that information. But it is true.
And it's great for taunting your infrastructure and SecOps folks if you indeed need Docker for whatever.
Anyway, that's Docker.
Other containment, containment, container solutions.
I mean, containment as well.
Containment solutions.
Like from the Ghostbusters.
Yeah, I was actually thinking the same thing.
Yeah.
The light is green.
The trap is clean.
Yeah, so, I mean, VMware.
So we kicked off this whole discussion with VMware and virtual machines,
which are a very different kind of technology than Docker.
Do you think you could give us a two-minute overview of the differences
between something like VMware or VirtualBox or other sort of virtual machines
and Docker, having built many virtual machines in your life?
Well, my virtual machines have all been 8-bit,
which makes them considerably easier on some axes.
But yeah, so let's explain a little bit
about how Docker is working.
So at least Docker on Linux,
which is my only experience here.
So Linux supports namespacing.
That is the ability to make groups and resource allocations that are kind of contained and have their own namespace away from anyone else running on the system.
And now, obviously, you can think of a user as, vaguely, a sort of namespace.
But, you know, if you type ps as a particular user, or ps aux, you can see all of the other users that are running on the system. In this instance, namespaces can cordon off areas of the operating system, so that, like, the main operating system can see what's going on, but if a process is inside that namespace, it only sees things in its own namespace. And namespaces can be file systems, they can be users, they can be, oh, CPUs; there's a number of, like, aspects of the system which can be compartmentalized and held separate. But you're still running the same operating system,
and you're still doing all the things that you were doing before; you're just making a new namespace. And what Docker effectively is doing is making a new namespace, and creating inside that namespace a bunch of links to the outside world for things like the terminal. For things like... oh yeah, network is another namespace you can create, and you can make a bridge that talks from one namespace to another as if it was one of those network devices that we were talking about. And then you're basically running like a regular process, except that if you type ps, or if you type ls, you'll only see the world that the container gave you, through giving you your own namespace. And it's a bit like, if anyone's ever looked at, like, chroot jails,
which was like the precursor to this, where you could say,
hey, start a new process and pretend that the root directory,
like the slash, the top of the hierarchy, is this subfolder I just made.
And then you can never see outside of there.
And you can imagine that you're effectively in a jail.
You can't see outside of there.
And your process can run along and be isolated.
And you can see how you might build like a duplicate operating system image in there and then run it.
But it's running really on the main operating system.
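You can poke at both of these ideas directly from a shell; a sketch using util-linux's unshare for a fresh PID namespace, and chroot for the old-style jail (the jail directory is hypothetical and would need a minimal root file system inside it):

```bash
# New PID namespace with /proc remounted: ps only sees this shell and its children.
sudo unshare --fork --pid --mount-proc bash -c 'ps aux'

# The precursor: pretend /srv/jail is the root of the file system.
sudo chroot /srv/jail /bin/sh
```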
And that has a really interesting side effect.
The kernel calls that you're making are going straight to the host operating system's kernel.
There is no kernel that you're running inside your Docker container. So if you're running on kernel version 5.x
and there's some whiz-bang new feature
that's in kernel version 6 and above
and you've got a Docker image that's Ubuntu 24,
whatever, that wants to use that,
it ain't going to work.
No amount of Docker magic will make new features appear
in your running kernel.
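Easy to see for yourself: both of these print the host's kernel version, because the container brings no kernel of its own.

```bash
uname -r
docker run --rm ubuntu:24.04 uname -r
```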
Virtualization, on the other hand,
takes this down to the hardware level
and pretending effectively like you are...
Oh, God, now the distractions are a cat hitting the microphone.
At the virtualization level,
you are pretending that you have a CPU
and networking resources and hardware
resources that don't actually exist and then a full-on kernel boots up in that world and as far
as that kernel is concerned with a few caveats it thinks that it's running on a real computer but
it's actually running on a simulation of a computer that's running on the real computer. Now it's kind of like how we're all living in a simulation.
We are all living in a simulation, which explains an awful lot.
Yes. Right.
But yeah, we're all living in some kind of the matrix and all we're doing is we're putting
another matrix in our matrix so that we can run another copy inside of that. So as far as that
virtual machine is concerned, it is a full sovereign computer in its own right and it can do
anything it likes, unaware that when it says, hey, oh, I've got a network device over here, what's really happening is that some kind of trap is happening in the CPU when it's trying to access that device, and an operating system one layer up in the list of matrices goes, oh, wait a second. And much like when a regular operating system misses a page and has to swap it in from disk, and, like, the process is put to sleep while the page is read in, and then it kind of goes, oh yeah, the memory is there now, the same thing happens at one layer above, in what's called the hypervisor,
which is like the operating system running the show for all of the operating systems underneath it. And so that hypervisor can arbitrate
access to the real network cards and the real physical block devices, like the hard disks that
are in the machine that you're emulating. And then when you say emulation, you think it's going to be
super slow. And in fact, you know, you could obviously write a genuine emulator,
and then you could pretend to be an ARM machine when you're running on an x86 or whatever.
What typically happens is that these are hardware accelerated.
The CPU "knows", quote unquote, that there are layers and rungs of the hierarchy of simulation environments.
And it gives the hypervisor more privileges
than the operating system underneath.
And in fact, mostly nowadays,
the guest operating systems, as they're called,
are in cahoots with the virtualization layer.
They actually do know that they are living in a simulation.
And that allows certain things to be a lot faster.
So instead of actually having to emulate a real network card, with this sort of two-way back and forth between the hypervisor and the underlying operating system, there can be some kind of agreed thing of, like: hey, I'd like to talk to the network card, I'm just going to put all the data I would like you to look at over here, and then, hey, hypervisor, imagine that a network card did that, whatever the network card thing is. There's a certain amount of collaboration. I'm making that up, in full disclosure.
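The real-world version of that "agreed thing" is, broadly, paravirtualized devices such as virtio. On a KVM-based guest you can often see them directly; nothing will show up on bare metal, and other hypervisors expose different devices:

```bash
ls /sys/bus/virtio/devices/ 2>/dev/null
lspci 2>/dev/null | grep -i virtio
```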
But what that means is that when you go to your Amazon account and say, I'd like a new computer, please.
That computer is not a real computer.
It is just a virtual computer running on someone else's infrastructure.
And you get a certain number of CPUs and a certain number of disk IOs per second and all that good stuff.
And this then comes back to the VMware thing that you were saying at the beginning. This is why infrastructure folks love it: because I can buy two 128-core, terabyte-RAM machines, and then I can hand them out to as many developers as I'd like, in, like, two- or three- or four-CPU slices, which I can't even buy; I can't buy a two-CPU computer anymore. And they get to share it, and they all have root on their machine,
and there's no way they can bust out of their virtualization environment
to get to the hypervisor.
But they can blue screen, their kernel can panic,
the whole thing can go down.
It's exactly like a normal computer,
except that really it's just one tenth of the physical machine you're
running on.
Right, right. So when the annoying developer tells you that they need a server to run their app, and you ask what the app is, and they're like, well, this is a Node.js app that runs in one thread, you're like: there's no way on the planet I'm giving you a ten-thousand-dollar server to run a single-threaded Node.js app. So I'm just going to give you this one little slice, and you think it's a server.
And it has its own operating system,
which means obviously there is a, you know,
your storage requirements,
both in terms of memory and in terms of disk space go up because,
you know, like there is a real honest to God Linux kernel running there.
And probably on the sibling CPU, like literally on the die, you know,
two millimeters away from you is another CPU running someone else's Linux
kernel and never the twain shall talk to each other.
Right.
Rowhammer issues and other things aside.
Yeah, don't give me an in to talk about that kind of stuff.
You know, so actually, all right, we are going to, we're going to have to now because you
poked my buttons.
So.
Rowhammered them.
Not Rowhammer, but that's definitely one for another conversation. But what a reasonable person might do, given what I just said, is say: well, the hypervisor is sat there not doing very much. It doesn't need any CPU resources most of the time, because it's reactive to the guest operating systems that are really running on the CPUs, right? But we could potentially say, well, let's give one or so CPUs to the hypervisor itself, and it can do some background maintenance activities.
What if it scanned through all of the physical memory of the computer and went, wait a second,
I've seen this 4K page before, right? Every single one of my 60 guest operating systems has loaded up variants of the same Linux operating system. Why the hell would I have the same 4K pages, you know, like, many, many, many 4K pages that are exactly the same, because they all loaded, like, you know, vmlinuz 4.5.29 or whatever? Why don't I just point them all at the same actual physical location and then discard the copies of it,
but pretend to all of the individual guest operating systems that they have their own copy,
and then it's just copy on write.
If they try to write to it, then they get their own copy.
A bit like when you fork a process on a single operating system, the same tricks happen.
Makes perfect sense.
Now, obviously, you have to do it retroactively.
When you fork, you know that every page that you currently have is going to be shared in the child process.
But this is a sort of emergent property of once you've booted a machine up, eventually some pages will be the same on one machine as they are on another, in which case you deduplicate them.
And then you're right. You've got more free memory for the system as a whole.
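On Linux this deduplication exists as Kernel Samepage Merging (KSM), which KVM hosts commonly use; on a host with KSM compiled in, its counters are visible under sysfs:

```bash
cat /sys/kernel/mm/ksm/run            # 1 if the KSM scanner is running
cat /sys/kernel/mm/ksm/pages_shared   # shared (deduplicated) pages currently in use
cat /sys/kernel/mm/ksm/pages_sharing  # mappings pointing at them, a rough measure of savings
```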
And it seems like there could be nothing wrong with that until the security people come along.
Yes. And ruin everyone's fun.
And ruin everyone's fun, exactly, exactly. So it was shown that, and maybe I won't go into too much detail, for two reasons: one, I don't necessarily know the details, and two, we've probably talked too much about this already; it was shown that if you have the same implementation of OpenSSL, or one of the other cryptographic libraries, as a virtual machine co-located with you...
So I'm going to just go to Amazon and I'm going to ask for 100 EC2 instances. And then I'm going
to run a test to see if I can find that I'm co-located with my target. Just by coincidence,
I happen to be running on a machine that also has an SSL process somewhere in it, right? The chances
are that, obviously, those 4K pages will be deduplicated, because it's the same .so that
we've both got. It's OpenSSL, Ubuntu, whatever version, right? Now I can start doing timing
attacks because I know my physical RAM is associated with the same physical RAM that they
have. And so if I know which code paths are taken
in their code, I can poke around in my cache and sort of try and determine whether or not...
You're getting the keys, basically.
Exactly. I got this byte of the key and I got that byte of the key, and I don't have it all yet, but that's close enough. It took a long while to read this bit out, because it must have been in an L3; but if it wasn't, then I know it must be in someone's L2 somewhere, and that someone might be... and all these kinds of things. And you can imagine how terrifying that is from a point of view of security. You've, like, lost the isolation between the virtual machines that aren't even meant to know that their siblings exist.
So that's your own fault, Ben.
Worth it. We can talk about Rowhammer another time.
Worth it, worth it.
So in terms of deployment, though, I mean, you sort of alluded to that by saying that, like, as a developer, it's convenient to be able to go to your infrastructure folks and say, can I just have a server to run my little Node.js app?
Or not even talk to them and just, like, run a script that generates one for you.
And they keep tabs on it and they know who it's allocated to
and they can call you up and say,
hey, you're using 35 servers.
Do you really need them?
But you can automate those things, right?
And it's really great when you do.
That's very true.
I mean, I forget, of course,
that this is what Terraform and the like do
for me on Amazon, right?
I just say, I want another computer,
another computer appears.
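Underneath, "I want another computer" boils down to an API call; for example via the AWS CLI, where every ID below is a placeholder:

```bash
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.micro \
  --count 1 \
  --key-name my-key \
  --subnet-id subnet-0123456789abcdef0
```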
It's never occurred to me that really, somewhere behind the scenes, all this magic is going on to make that happen. But, you know, it just does, right? And, yeah, that puts a lot of power, and responsibility, but a lot of power, into the developer's hands. You don't have to, like, overload a machine, and you get the isolation that, say, a Docker container would give you, but at a much deeper level. Now, different problems again, right? You know, at least on your own server, if it's running as root, well, it's only running as root because you made it run as root, right? As root, you've got every choice you like.
Yeah, yeah. So what do we think about that in terms of, like, the trade-offs? What would make you choose one method over another?
I mean, I tend to lean more toward, you know,
having virtual machines and, you know,
having more of like the,
I'm going to get this virtual machine.
I will probably build some very lightweight automation
to set it up.
But again, the setup of it is mostly just, you know,
kind of like you were saying, the apt update, apt upgrade.
Maybe install one or two system packages,
but hopefully not if I can avoid it.
And then just run all my applications as a user,
as an unprivileged user.
And every version is a new tarball that gets copied up to the computer
or maybe have some automated thing that pulls them down
from a central repository somewhere.
You've got like a deployment thing that you use, haven't you?
You've got like a, is it GitDeploy?
Oh, GitDeploy?
Yeah.
Is that open?
That is open source, right?
That is open source, yeah.
So GitDeploy is sort of my Heroku-style deployment script that I made
that will let you take any server that you have SSH access to and basically push to it as if it was a Git repository.
And as a side effect of that, if the push works (that is, your code is not out of sync with everyone else that's deployed to it), it will deploy your application and start it up. And so you get to sort of use the Git semantics around push and pull as your mechanism
to make sure that you don't accidentally
clobber someone else's deployment.
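For flavour, here is the classic shape of Heroku-style push-to-deploy that this builds on; it is not GitDeploy's actual implementation (which adds the safety checks described above), and every path and hostname is made up:

```bash
# On the server:
git init --bare /srv/myapp.git
mkdir -p /srv/myapp
cat > /srv/myapp.git/hooks/post-receive <<'EOF'
#!/bin/sh
git --work-tree=/srv/myapp --git-dir=/srv/myapp.git checkout -f
/srv/myapp/restart.sh   # hypothetical restart script
EOF
chmod +x /srv/myapp.git/hooks/post-receive

# On a developer machine:
git remote add production deploy@app-host:/srv/myapp.git
git push production main
```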
I see.
And so it's sort of a safer way
to be able to empower people to deploy locally
from their machines if that makes sense to do.
Now, sometimes that doesn't make sense.
It doesn't always make sense, right, yeah.
In fact, it sort of usually doesn't make sense,
but sometimes it makes a ton of sense.
And it's really nice to be able to do that in a way that is safer than just, you know, scp.
Right, right. But, I mean, often, you know, there are also places, or times, when you want to be able to push to, like, a development machine, a development cluster, and that seems like a good thing. Where I would actually want the feature is: I have code on my machine that I want to have running in an environment that I can't reproduce myself locally. Right, it's not ideal to be in a situation where you can't quite reproduce it locally, but sometimes, you know, I want to batter it with 200 machines that are going to send queries to it, and so I want to deploy my version that has my fix in it, or whatever.
Yep. Yep.
And I mean, you know, you can take, speaking of virtualization, like you can take these things a lot further.
And one of the things that I've been playing around with on one of my projects is sort of getting rid of the idea of the production environment.
So all of the environments in this project that I'm working on are just branches.
There's the main environment for the main branch, and that's where
the DNS entry for the top level domain points to. But if you make a new branch, it will automatically
spin up a new environment, and it will marshal all the services that that environment needs,
and it will do everything that it does. And so if you want to make a change that involves
potentially making changes to the infrastructure, like, oh, I'm going to change a security group or I'm going to change, you know, the number of servers from four to five or whatever it might be.
Yeah.
You just create a new branch.
You push that branch to GitHub and the infrastructure magically appears.
And the name of that infrastructure is literally the name of the branch.
So they're tied together in that way.
And when you delete the branch, the infrastructure gets torn down.
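One way to sketch that branch-equals-environment idea is with Terraform workspaces; this is not necessarily how Ben's project implements it, and the variable name is invented:

```bash
BRANCH="$(git rev-parse --abbrev-ref HEAD)"

# Create (or reuse) an environment named after the branch and stand it up.
terraform workspace new "$BRANCH" || terraform workspace select "$BRANCH"
terraform apply -auto-approve -var "env_name=$BRANCH"

# When the branch goes away, tear the environment down with it.
terraform destroy -auto-approve -var "env_name=$BRANCH"
terraform workspace select default
terraform workspace delete "$BRANCH"
```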
That's super cool.
The main branch is always there.
That's sort of the quote-unquote
production environment.
But if you were to ever delete the main branch,
it would also actually tear down
the production environment.
I mean, that's probably what you want, though.
You know, it's like,
it's sort of a weird thing,
but it's like,
it's like just coupling those two things together
very tightly and saying a branch is an environment.
There's no such thing as the dev or the test or the UAT or the production.
They're just names of branches.
Right.
And that is only possible because of virtualization.
You couldn't do that any other way.
On a real machine.
Well, yeah, for all the reasons.
I mean, cost was what I was about to bring up, because I'm sort of trying to move Compiler Explorer towards a system which is a tiny bit more like that, where instead of the staging environment that we do tests in being kind of like just a subcategory under the production environment, it's like its own AWS account, effectively. And then I can do the kind of things you're talking about, like: hey, let's have a new, um, load balancer. Let's try out a different way of doing everything
in the staging environment.
But for me, that's prohibitively expensive
because those resources are not free
and they're quite expensive.
Like having one load balancer is expensive enough.
And I can configure that one load balancer
to kind of say, well, if it has slash staging in the thing,
then go to this subsection,
right? And that's how it works at the moment. So there's a trade-off to be had there. And obviously,
in a world of infinite resources, it's no problem that if you create 12 different branches,
you've got 12 environments. Right, right, right. Well, one of my subtle motivations for doing this,
and again, I'm trying this on my own project, but maybe one day I'll get to do this in a more widely shared
company environment, is to directly manifest to the bean counters the cost of so many different
branches. That's amazing. It's sort of like, yeah, branches have a cost and it's hard for
you to measure that cost. It's mostly cognitive load on developers. What if we just turn that
into dollars?
Actual dollars.
Actual dollars. You could measure them, and then you have accountancy going, like, why do we have so many branches?
As one of those folks that sends out the emails and the nags to people saying, like, hey, this PR has been open for three years, is there any chance of it being closed? I'm totally down with that, right?
Yeah, and the cognitive load. When I hit autocomplete for my branch name, I'm like, what the heck is this? This person left the company two years ago, why is it still here? Those kinds of things.
Yeah, yeah. No, I like that approach. I like the idea of manifesting it. And, I think, you know, obviously you've talked about doing it in terms of virtualization; there are ways and means of doing it with the Docker-style approach as well.
And I know we're kind of getting close on the amount of time that we've got available, but I'd like to sort of suggest, you know,
there's the Kubernetes type approach.
There's Nomad, which is a HashiCorp system,
which can be used to run Docker containers.
And then there are definitely load balancer type things that can talk to those
containers.
And you could definitely have it so that every branch that you commit builds a Docker image and pushes it to a tag, a named tag for that branch, and then auto-registers a container, a job running in Nomad, to say: hey, I'm the your-project-hyphen-your-branch-name environment that has all these machines running. So it can be done too.
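Sketched as CI steps, with the registry, job file, and variable names all assumed rather than taken from any real setup:

```bash
BRANCH="$(git rev-parse --abbrev-ref HEAD)"

# Build and push an image tagged for this branch.
docker build -t registry.example.com/yourproject:"$BRANCH" .
docker push registry.example.com/yourproject:"$BRANCH"

# Register (or update) a Nomad job for the branch's environment.
nomad job run -var "branch=$BRANCH" yourproject.nomad.hcl
```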
You are still paying the cost: there are processes running in one operating system, or one set of operating systems, that is the Nomad cluster or the Kubernetes cluster or whatever. It's a sort of lower... I guess it's higher level; I don't know how you describe it, where it's cutting, you know, is it lower level or higher level than virtualization? It's definitely higher level, like, measured on the axis that makes sense to me right now. But, yeah, so you can achieve it using that too, which is great. And, I mean, I think what all of this kind of comes down to, what we're talking about in this instance, is infrastructure as code, however it's achieved, be it VMs or Nomad machines running Docker stuff.
And so we've kind of strayed from the original point about like, how does one do deployment?
How does one use virtualization?
How is it, what different things are available?
Yeah, they're all related though.
They are, I guess, related.
Yeah.
And that infrastructure-as-code thing is super important: to be able to say, like, yeah, no one has to rack a machine, no one has to physically move any cables around when I stand up this instance here. And that instance is defined by a piece of code, or a configuration file that's generated by code, or just a configuration file that a human edits, which, you know, is a fabulous way of tracking it. I mean, we're software engineers especially; we know where we stand with source control and CI and things like that. So having the machine, the physical world, work that way too, and be able to roll back and all that kind of stuff, is super cool, however it's achieved.
Yeah.
All right. Well, this has
been a fun exposition of what on earth we can remember about how all this stuff fits together.
Yeah, I feel like we only really, like, touched on it. Like, you could probably do a whole other hour on these topics, like you were talking about.
Absolutely, yeah. I mean, I don't know enough about Kubernetes. I mean, I used Borg at Google, which I believe to be related in some way, but I don't know either. I remember them trying to pretend that it wasn't called that, and they used to pretend that it was Anita Borg it was named after, not, clearly, the evil people in Star Trek, right? Which, yeah. And I think that was because it leaked out, because they weren't laundering their referrers. And so people were running internal services from machines, and then people would, like, link to, I don't know, YouTube... not YouTube videos, because that would be Google too, but, you know, link to other people's websites, and then it all went through like, what is this?
All the machines had, like, DNS names that uniquely referred to the job that was running on them, which is super convenient for everything.
You know, you want to hit your job and it's running a web server.
Then you just go to that long name and it hits the machine and the machine then looks at the name you gave it.
And then it redirects it to the correct port for that particular instance that you were running, and then off you go; there you are, there's your job running and you can look at it. But obviously, if you then have a web page on there that has a "hey, click the cat animation", and you click the cat animation, then you've leaked, you know, 12.7.borg.google.dns or whatever.
Right, right, right, right. Oops.
Referer headers, man.
Yeah.
Yeah.
All right.
We should stop talking.
We should stop talking.
We've got plenty of things to audit, to edit.
That included.
And audit.
And audit.
We can't let the Borg stuff leak out.
That's true.
That ship has sailed.
Cool.
All right. We'll see you next time.
Bye.
You've been listening to Two's Complement, a programming podcast by Ben Rady and Matt Godbolt.
Find the show transcript and notes at twoscomplement.org.
Contact us on Twitter at twoscp. That's at T-W-O-S-C-P.
Theme music by Inverse Phase