Signals and Threads - What Is an Operating System? with Anil Madhavapeddy
Episode Date: November 3, 2021

Anil Madhavapeddy is an academic, author, engineer, entrepreneur, and OCaml aficionado. In this episode, Anil and Ron consider the evolving role of operating systems, security on the internet, and the... pending arrival (at last!) of OCaml 5.0. They also discuss using Raspberry Pis to fight climate change; the programming inspiration found in British pubs and on Moroccan beaches; and the time Anil went to a party, got drunk, and woke up with a job working on the Mars Polar Lander.

You can find the transcript for this episode on our website.

Some links to topics that came up in the discussion:
- Ron, Anil, and Jason Hickey’s book, “Real World OCaml”
- Anil’s personal website and Google Scholar page
- The MirageOS library operating system
- Cambridge University’s OCaml Labs
- NASA’s Mars Polar Lander
- The Xen Project, home to the hypervisor
- The Tezos proof-of-stake blockchain
- The Coq Proof Assistant system
Transcript
Welcome to Signals and Threads,
in-depth conversations about every layer of the tech stack, from Jane Street.
I'm Ron Minsky.
It is my pleasure to introduce Anil Madhavapeddy.
Anil and I have worked together for many years in lots of different contexts.
We wrote a book together.
Anil and I and Jason Hickey together wrote a book called Real World OCaml. We have spent lots of years talking about and scheming about OCaml and the
future of the language and collaborating together in many different ways, including working together
to found a lab at Cambridge University that focused on OCaml. And Anil is also a systems
researcher in his own right, an academic who's done a lot of interesting work, and also an industrial programmer who's built real systems with enormous scale
and reach.
We're going to talk about a lot of different parts of the work that Anil has done over
the years.
To start with, though, I want to focus on one particular project that you're pretty
well known for, which is Mirage.
Can you give us a capsule summary of what Mirage is?
Sure I can, and it's great to be here, Ron. The story of Mirage starts at the turn of the century.
In the early 2000s, pretty much every bit of software that ran on the internet was written
in C. And back then, we had internet worms that were just destroying and tearing through
services because there were lots of problems like buffer overflows and memory errors and
reasons why the
unreliability of all the systems code that had been written in the past was becoming really
obvious and the internet was really insecure. So there I was as a fresh graduate student in
Cambridge, and I decided that after years of doing systems programming in C, I would just have a go
and see what it was like to rewrite some common internet protocols using a modern high-level language.
And so I looked around and I looked at Java, which was obviously the big language back then.
I looked at Perl, which was heavily used for scripting purposes. But in the end, I decided
I want something that was the most Unix-like language I could find. And I ended up using
OCaml. It had fast native code compilation that just ran in Unix, could be debugged very easily.
It had a very thin layer to the operating system. I spent a great couple of years figuring out how to write really safe
applications in OCaml. So I started by rewriting the domain name service, which is how we resolve
names, human readable names like google.com to IP addresses. And I rewrote the secure shell
protocol, which is how most computers just talk to each other over remote connections. And I
rewrote all of these in pure OCaml. And I showed as part of my PhD research that you could make these not only as
high performance as the C versions, which really wasn't that well known then because there was a
perception that these high level languages would be quite slow. But then I also showed that you
could start doing some high level reasoning about them as well. You could use model checking and
early verification techniques to prove high-level properties. And this was all really good fun. I wrote loads of OCaml code, and then I published all these papers,
and then I asked myself a simple question. So I've written all of this code to rewrite network
protocols and have safe applications, but then the compiler just seemed to stop. So after all
of these beautiful abstractions and compilation processes, I got a binary at the end, and this
binary just talked to this operating system. And I might have written 100,000 lines of OCaml, but this operating system had 25 million
lines of C code, the Linux kernel. So why, after all of my hard work and perfecting this beautiful
network protocol, do I have to drag along 25 million lines of code? What value is that adding
to me when I've done so much in my high-level language? And this is where Mirage OS comes in. So Mirage OS is a system written in pure OCaml,
where not only can common network protocols and file systems and high-level things like web servers and web stacks all be expressed in OCaml, but the compiler just refuses to stop. We then
provide different abstractions to plug in the actual operating system as well.
And so the compiler, instead of stopping and generating a binary that you then run inside
Linux or Windows, will continue to specialize the application that it is compiling, and it will emit
a full operating system that can just boot by itself. The compiler has specialized your high
level application into an operating system that can only do one thing, the thing that it was written to do. And it does this not just by looking at the source code. It also looks
at your configuration files, which are also written in OCaml. It evaluates all of those in
combination with your business logic. And then it compiles a whole thing in combination with
operating system components written in OCaml, like TCP IP stacks and low-level file systems
and network drivers and those kinds of things. And it's what's known as a unikernel. A unikernel is a highly specialized binary output.
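To give a flavor of what "configuration files, which are also written in OCaml" means in practice, here is roughly what a minimal Mirage configuration has looked like in past releases. The names foreign, console, and default_console come from an older Mirage configuration DSL and may differ in the exact release you use; this is a sketch, not code from the episode:

```ocaml
(* config.ml -- evaluated by the mirage command-line tool, not by your application. *)
open Mirage

(* Declare the unikernel's entry point: a functor Unikernel.Hello that
   expects a console device and produces a job. *)
let main = foreign "Unikernel.Hello" (console @-> job)

(* Register the unikernel, wiring in the default console implementation. *)
let () = register "hello" [ main $ default_console ]
```

Running something like `mirage configure -t unix` (flags vary by release) then links the Unix implementations of each device, while a Xen-style target links the pure-OCaml ones and produces a bootable unikernel image.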
So MirageOS started off as an experiment in my PhD 15 years ago. I've been joined by an
incredible community, initially by Thomas Gazagnaire and David Scott, and now by a large MirageOS core
team. And we have hundreds of protocols and file systems and pieces that can all fit together and be combined into very bespoke artisanal infrastructure. So you can design a
kernel that does exactly what you want it to do. And you don't have to drag along other people's
code unless you want to. Maybe an overly pithy summary of this is your operating system as a
library. Well, so this is an operating system, but what is an operating system? In a normal
operating system, you run a bunch of processes, known as userland, where each process is a failure domain; when something goes wrong, or a process needs resources from the outside world, it goes to the kernel, which manages all of the resources in your system. It manages the hardware and it acts as middleware to give software safe, isolated, and high-performance access to the underlying hardware. So unikernels use a different approach to structuring operating
systems, one known as library operating systems. And this is one where instead of the kernel acting
as a big wrapper around all of your code, it simply is provided
as a set of libraries. So it's no different from any other library that you link to, such as
OpenSSL or some kind of graphics library, for example. The kernel is just another one of those
things. But what you sacrifice is multi-user modes, because if one application is accessing
some system libraries, it needs exclusive access to the hardware. It's quite hard to provide competing or untrusted access to different parts of your hardware stack. So library operating
systems work really well if you're trying to build a specialized application that is maximally using
the hardware at hand. If you just want to have a desktop with lots of applications running, then
you should just use conventional operating systems. It's only if you can benefit from the specialization
that you want to switch into this different mode of operating system construction.
What I like about Mirage OS as an idea is it's so weird. It's hard to know whether it's a research
project or a stunt. And it's also, I think, part of what I think of as a larger story of the
multi-decade long failure of the original idea of an operating system. Back in the day, we had this
idea that what we were really going to do is build multi-user operating systems.
Computers were really expensive, and we needed to share them,
share one big computer among a bunch of people.
And so we built systems like Multics,
and then systems that took inspiration from them,
like Unix and lots of other systems along the way.
And we built all of these abstractions
that were designed to make it easy to share hardware
among multiple people and to do it safely.
And then in the last couple of decades, we have completely and utterly given up on that project.
And things like virtualization and containers are all examples where we're like, no, no, no, that's not what an operating system is for.
Operating systems are for piling up your complicated stack
of all the different components you need to throw together to build an application. And you want to kind of add them up and freeze them in place so you can reproducibly build up this
weird agglomeration of stuff that you've thrown together. The original purpose of actually having
multiple users share the same operating system has basically vanished from the scene. And once you've made all those changes, the idea that instead of all of the traditional
abstractions that we needed when we were separating out different users, maybe we could do something
radically different. That's where I see something like Mirage OS showing up.
That's right. It's an interesting perspective to think that operating systems have been a failure
because what's really happened in the last 20 or 30 years is that we have invisibly added layers that provide the right level of abstractions needed
for that point in time. So for example, in the late nineties, I would spend ages building a
beautifully configured Windows machine because I knew exactly all the registry keys and all the
magic that went into it. But in the early 2000s, I worked on the Xen hypervisor, and the Xen hypervisor started off with a very simple thesis: it is possible to run multiple instances of an operating system that was never designed to share hardware on the same physical machine simultaneously, make sure each one is completely isolated from the other operating systems running on the machine, and also do so with minimal performance overhead.
There was a serious balancing act there.
And so what we did with the Xen hypervisor was don't touch anything in the user space
because you don't want to have people rewriting all of their applications, their Oracle
databases or their SQL servers or whatever they're running. So we scooped out the guts of the kernel.
And normally the guts of the kernel in Linux is what manages the low-level hardware, the memory
management subsystem, the interrupt controller, and the things that map hardware to operating
systems. And with this simple modification, we adopted a technique called paravirtualization. And what paravirtualization did was it just fooled the kernel into thinking it was running on real hardware, but we shimmed in a little layer called a hypervisor, the Xen hypervisor, which then did all the real mapping
to real hardware. It turned out this was extraordinarily effective because we could
take entire physical operating system stacks of tens of millions of lines of code all combined and run them simultaneously in a single physical machine
and make sure that they were all utilized to their maximum potential. So if you had a bunch of
machines all being used 10% of the time, we could shove these into one place. Now this worked out
so well because the notion of a user wasn't someone who's logging into a Windows machine,
but it became the person who's booting up an operating system. And then suddenly, the Xen hypervisor became its
own operating system, and cloud computing and all of these kind of things took off by the mid-2000s.
But they just provided a different interface. And when Mirage OS came along, it was kind of
the leftover portions of the Xen experiment. Xen also, interestingly, started off as a stunt. It was a bet in the Castle Pub in Cambridge that Keir Fraser couldn't hack Linux over a weekend. And then Monday came along and we had the first version of Xen, and then a big team of us continued working on it.
I then spent the next few years at a startup company called XenSource, building all of the support to make it production quality so we could sell the Xen hypervisor as a product, so that we
had Windows drivers and Linux drivers.
And those years were filled full of compatibility woes.
So you have to look at every single edge case and make sure it works perfectly.
And then life just got frustrating.
You just get bored of making other people's code work well in your virtualization layer.
So we had to have some way to test Xen. And so Mirage OS, the first version of it, came along because we built a minimal operating system that didn't have all of the Windows baggage and all of the Linux baggage.
And all it did was exercise the lowest levels of the Xen functionality, the device drivers,
the memory subsystem, and so on. I needed to have slightly more complicated tests.
So with Thomas Gazagnaire, we just linked in the OCaml runtime because we just needed to write
some high-level logic. And then that was running inside the Xen hypervisor as a minimal operating system.
So it was a few hundred kilobytes in size at most.
And then we're sending Ethernet packets.
So wouldn't it be nice if you could just hook up an OCaml library to send TCP frames instead
of low-level Ethernet?
So then I started writing a TCP IP stack in pure OCaml.
And then, you know, once you have TCP, it's a pretty small step to go
write an HTTP stack in OCaml. And then that happened. So MirageOS became this kind of organic
growth of starting from low level interfaces, figuring out what the system abstractions that
we need are, and then filling in the blanks with libraries. So it did start as a stunt. I think all
good systems projects start with a stunt because you're trying to test an experimental hypothesis.
You're trying to show that if we modify the world to be the way we want it to be
with our hypothesis, that it's worth doing. And you need that stunt to show that all of the effort
and all the hard work that goes into productizing something is actually worthwhile. So then the
hypervisor was a stunt just to show that you could just boot three Linuxes on one machine.
And then it, to this day, remains one of the industry's most popular hypervisors. And MirageOS also started as a stunt just to show you could build a credible sequence
of OCaml applications and protocols and compose them together and build something useful. MirageOS
today has tens of millions of daily active users. It's embedded in all kinds of systems that use
the libraries and the protocols in lots of different ways.
And it's invisibly servicing lots and lots of cloud infrastructure.
Yeah, I think it's hard to overstate how impactful the Xen work has been.
It's the foundation on which the entire modern internet is built, right?
The virtualization is absolutely at the core of what an enormous number of companies have done, and an enormous number of different systems have been built on top of it.
There's been a bunch of ways that MirageOS has gotten into
big and important pieces of infrastructure.
One thing I wonder about is,
are you happy with the set of abstractions
that we've started to build up around this?
In some ways, I feel like the stunt-like nature of all of this
shows a little
bit in the happenstance of what we got. A lot of the things that we've ended up building are things
that you could kind of shim in, right? We started off building a big multi-user set of operating
systems and we're like, oh, actually, the abstractions aren't good enough for supporting
multiple users truly isolated from each other. So we started doing this, in some sense, very strange thing where we said,
we know what the right abstraction is: hardware.
Like whatever the physical hardware happens to provide at the bottom layer,
that's the thing that will allow us to take our operating systems
and just port them cheaply to new places.
So let's pick hardware as the new abstraction.
And I find it hard to believe on some level that either of these are
really good choices. If you were to actually start from scratch in a way that's not just like a stunt,
but like a multi-decade long commitment to rebuild the entire world, do you have a feel for what
abstractions you'd actually pick? That's a great question. So Mirage is now 15 years old and we are
never happy with our abstractions.
I don't think there's been a single day where the core team has sat down and said,
we have the perfect set of interfaces that will survive for the next few years.
And it's worth stepping back a little bit to explain why OCaml was the right choice for Mirage OS
and why it empowers this continuous evolution of our interfaces.
In OCaml, you have the notion of modules. And this is one of the
defining features of OCaml beyond being a functional programming language. And what modules
do is that they let you define an interface. And this interface is a series of types which can then
have functions that operate over those types. And that collection is known as a module signature.
And whenever in Mirage OS we are defining some abstract hardware or even a
high-level thing, we define a module signature for this thing. And all that does is sketch out
what goes in and what goes out and how you create things of this module type. But then in OCaml,
you also have this notion of module implementations, modules themselves. And if they satisfy that module signature, then you can apply them in a type-safe way. And you can compose lots and lots of different
module types with lots and lots of different implementations. In Mirage, we have a sequence
of module types which represent the full set of our possible hardware and application level and
protocol level signatures. But then we also have hundreds and hundreds of concrete libraries which
satisfy some of those module signatures. So for example, if I have a networking module signature that just
says you can open a connection and you could read and write from it, we call this a flow in Mirage
OS, then there are several possible implementations of this flow interface. One of them is just a
normal Linux socket stack, which will compile only on Linux. And another one is a full OCaml-based implementation of TCP/IP, which exports the same socket interface, but instead of delegating
the requirement to actually send the network traffic to the kernel, it actually implements
it in pure OCaml. And so in Mirage OS, whenever we're not happy with the lack of some safe code, we go write an implementation. Whenever we're unhappy
with the evolution of some hardware interfaces or virtualization interfaces, we go rewrite our
module signatures. And all we have to do is to adjust our implementations so that they match
the new module signatures. We can do this in an incremental and evolutionary way. And so over the
years, we've learned a ton of stuff. We've seen an evolution of hardware, both in terms of performance and straight line capabilities. We've seen it change in terms of
the security model. We started with just page tables for memory. Now we have all kinds of
trusted encrypted memory enclaves and we have nested virtualization. It's become an incredibly
sophisticated interface there. And then we also have the dimensionality of distributed systems, which is just another
way of programming and abstracting across the failure domain.
So OCaml lets us split up our implementations and our signatures into two discrete halves
and then try to evolve continuously.
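To make the module-signature idea concrete, here is a compressed sketch in the spirit of Mirage's flow interface. The real Mirage signature involves Lwt, error types, and buffer types; this hypothetical FLOW and its Unix-socket implementation are only meant to illustrate the signature/implementation split described above:

```ocaml
(* A hypothetical, much-simplified flow signature: something you can
   read from, write to, and close. *)
module type FLOW = sig
  type t
  val read  : t -> bytes -> int   (* returns the number of bytes read *)
  val write : t -> bytes -> unit
  val close : t -> unit
end

(* One implementation backed by ordinary Unix file descriptors
   (compiles only where the Unix library is available). *)
module Unix_flow : FLOW = struct
  type t = Unix.file_descr
  let read fd buf = Unix.read fd buf 0 (Bytes.length buf)
  let write fd buf = ignore (Unix.write fd buf 0 (Bytes.length buf))
  let close fd = Unix.close fd
end

(* Application code is written as a functor over FLOW, so it can be
   applied, type-safely, to whichever implementation the target
   supports: the Unix sockets above, or a pure-OCaml TCP/IP stack
   when building a unikernel. *)
module Echo (F : FLOW) = struct
  let serve conn =
    let buf = Bytes.create 4096 in
    let n = F.read conn buf in
    F.write conn (Bytes.sub buf 0 n);
    F.close conn
end
```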
And that's why the Mirage project is called Mirage, because our idea was that the Mirage
project would disappear and just become the default way that people programmed systems because our signatures would just become part of
the standard community and part of the standard way that people build things. And we've been
seeing that over the last few years. And one, I think, subtle advantage of Mirage, which is not,
I think, totally obvious to someone who encounters it as an operating systems project,
is you can take a program that was built for Mirage and you can run it with an ordinary operating system.
Your point about one of the ways that you can get network services is to just use the
standard network services on the operating system of your choice.
And the other way is to have a pure OCaml implementation that goes all the way down
and run that inside of a hypervisor, or maybe run it on an actual bare metal server, right?
So there's an enormous amount of flexibility in terms of how you take these things and deploy them.
This may be not obvious if you just think about it as an operating system.
In some sense, it's both more than that and kind of less in the sense that, you know,
as you said, there's a way in which the more you look at it, the more you wonder like what
actually is here.
In some sense, the whole architecture disappears into the background.
That's right. Well, to give you a concrete example of this, right now,
we're really worried about climate change. So we thought we would build a website that is purely solar powered. And one observation about websites, for example, the OCaml Labs website, is that most
people probably only look at the website when it's daytime, right? There's not much machine
access to the website. So we thought, well, what if we had a bunch of Raspberry Pis around the world
that were just solar powered? And so the process of writing this kind of thing
is, first of all, just start writing it in Unix, like a normal OCaml Unix application. And we built
the web server with my colleague, Patrick Ferris. And then at this point, we start measuring the
energy usage. And the energy usage is high because it's running Linux in the Raspberry Pi. And then
it's just taking up more budget than our solar is letting us provide. So then we wrap it in a more constrained Mirage OS interface. So one that
doesn't give you the full access to Linux and all the syscalls and only requires a small file system.
And so this is just an evolution over our existing Linux code. And then suddenly it becomes
compatible with all of the direct unikernel interfaces. And then you can replace the Raspberry
Pi with an ESP32, one of those tiny little 32-bit microcontrollers, and your
energy budget drops dramatically. But obviously your capabilities drop. But I had the luxury of
developing the Raspberry Pi environment, which is a full Linux environment. And then when I decide,
well, okay, my high-level logic is right, I can bisect it and then get rid of the lower half of
the operating system. It's all just done through iterative, normal, pure OCaml development.
It's worth noting as well that anyone can build their own custom kernel.
If you've never done any kernel hacking,
you can still use MirageOS programming pure OCaml
and have a custom kernel that you can boot.
It is really quite dramatic if you think that there's some mystique in kernel programming,
because there isn't.
It's just another very, very large program that is hard to debug.
So I think I have a pretty good sense
of what's to like about this approach.
One advantage is that you get all of the flexibility
that you get out of a powerful programming language
for building rich abstractions. In a kind of kernel environment, you are restricted in various ways
to building abstractions that are, in some sense,
safe via the hardware support that
you have for separating kernel code and non-kernel code. And there's a bunch of constraints about how
you can build that kind of system. Here, you get to use the abstractions very freely. You can build
just what you want. And you can have a compilation process that just doesn't link in the stuff that
you're not using. So you get things that are truly minimal and, as a result, more secure.
So that all seems really exciting.
I have an enormous amount of sympathy for the idea that part of the way that you make
your world better is by extending the programming language.
I think this is a luxury that Jane Street has had over the years.
And I think that in some sense, everyone, whether they know it or not, is enormously
dependent on the fundamental tools they use, including the programming language.
And people mostly think of themselves as being in the position of victim with respect to
their programming language of choice.
They mostly use it and don't have a lot of control over how it works.
But being in a place where you can be in real conversation with the community of developers
that defines the language lets you, when you find really important ways of changing that
ecosystem, actually being able to push that forward, that's a very powerful thing.
It is. And OCaml, in my mind, is a generational language. One of the properties I want from
systems I build is that they last the test of time. So it's so frustrating that a system I
built in the early 2000s, if you put it on the internet today, it would be hacked in seconds.
It would not survive for any length of time. So how do we even begin the discipline of building systems that can last for,
forget a decade, just even a year without having some kind of security holes or some kind of
terrible, terrible flaw? Now, there is one argument saying that you should build living
systems that are perpetually refreshed, but also we should have the hope of building eternal systems
that have beautiful mathematical properties and still perform useful utilitarian functions in the world.
So there's one big downside I feel like I see in all of this, which you haven't talked about yet, which is it requires you to write all of your code in OCaml.
And, you know, I really like OCaml.
You really like OCaml.
It's in some sense not a downside. But if you're trying to build software that's broadly useful and usable and can build a
big ecosystem around it, restricting down to one particular programming language can be awkward.
I mean, just to say the obvious, I would find it somewhat awkward if there's some operating system
I wanted to use and I had to use like whatever their favorite language was and I couldn't write
in my favorite language. How do you think about this trade-off? Totally. Well, first of all,
we must
use multiple languages. It's not really OCaml that is the lure for this notion of generational
computing. It's the fact that there's at the heart of it, a simple semantic that could be expressed
in a machine-specifiable form. And although we have the OCaml syntax and everything at the heart
of it, there's no formal specification about OCaml, but it's obvious that one is emerging and
one can be written in the next certainly five to 10 years.
And this means that once you have a large body of code that has semantics, it has meaning, it's possible to transform it into other languages and other future semantics.
And that kind of self-description is a really, really important part of the reason why I chose OCaml.
It's still possible to compile code I wrote in the early 2000s using the modern OCaml compiler. So I've compiled code I wrote
20 years ago. In fact, it was OCaml's 25th birthday just a few months ago, and I tested out the first
program I could find. It was in my CVS repository, and it compiles fine. But when you want to use
another language, then we just go through the foreign function interface, and it's just like
that process abstraction I talked about. All you have to do is spin up another process, which is another runtime, and you have to
talk to it.
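As a rough sketch of that "spin up another process and talk to it" pattern in OCaml (the echo command here is just a stand-in for a helper written in some other language; it is not anything from the Mirage code base):

```ocaml
(* Minimal sketch: talk to another runtime by spawning it as a separate
   process and exchanging data over a pipe, rather than linking it in. *)
let () =
  let ic = Unix.open_process_in "echo hello-from-another-runtime" in
  let reply = input_line ic in
  ignore (Unix.close_process_in ic);
  print_endline reply
```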
And the industry has made tremendous progress in understanding how multi-language interoperability
should work, specifically through WebAssembly, for example, at the moment.
We have a substrate where modern browsers can run quite portable code.
But more important than the bytecode is the emerging understanding of what it means
to make function calls across languages. And all we have to do is take advantage of whatever those advances
are, and we can link multiple libraries for multiple languages together. So again, it's a
mirage, right? By using other people's advances, Mirage can benefit because all we need are libraries
to build these operating systems, nothing else. Everyone loves libraries. Everyone has them.
That's the only thing we need.
And standards for how they can talk to each other.
One of the things that I think is really important
about programming language design
is that building a good programming language is as much about what you leave out as about what you put in.
And having a set of abstractions
that smoothly work together,
language features that really click,
where it's really easy to use
other people's code, no matter which subset of the language features they tried to use,
and they'll still all hook together. It's hard to build a language that encourages that kind
of simplicity, that embodies that kind of simplicity. And if what you need is now languages
that need to kind of be fully interoperable with each other, there's a degree to which each language has to fully embrace the complexity of the other languages.
And it can get awkward fast.
I wonder if some of the simplicity that Mirage offers would get harder to maintain
in a context where you're trying to have lots and lots of different languages interacting with each other.
It definitely does because you're trying to get end-to-end guarantees.
So one of the big users of Mirage Unikernels is the Tezos proof-of-stake blockchain.
And Tezos is a complicated distributed system with lots of nodes and validators and security
keys flying around. So to build that as a unikernel involves a lot of OCaml code. It's a large OCaml code base, but there's also Rust code. There's been really interesting work on hooking together
the Rust type system, which is based around a borrowing model, so that there's a lifetime model for how long values persist,
and the OCaml model, which is based around garbage collection, where values are collected dynamically.
But this works because typically the Rust code is at the lowest levels of the system.
It's kind of at the runtime part of the system.
So as long as you have a clean layering where you're starting from a C runtime, then you're
moving into the Rust code, which is very unopinionated from a garbage collection perspective, but very opinionated
from a lifetime perspective, and then calling into the OCaml code, things work out pretty well.
We've made tremendous progress in building some really complicated unikernels from a very,
very complicated distributed system, but you have to just make sure you look at your entire language
stack and your dependency stack ahead of time, make sure you understand how they interoperate
at a high level, and then dive into turning it into the unikernel. So it's definitely not a magic wand that you can
just wave and expect the build systems to just work. Another example that we use Mirage for is
in Docker, which is a container management system. And if you've ever used Docker for Mac or Docker
for Windows, then every byte of every container that you're using in your desktop is going through
a Mirage OS translation layer. Because whenever you mount a file system on the Mac, for example,
something has to translate the semantics of your Mac file system, which is APFS or HFS,
into a Linux container, which is a similar looking file system, but actually completely
different under the hood. And so what we did was a very special Mirage build. Dave Scott, David Sheets, and Jeremy Yallop figured out that if you treat one end of a Mirage
compilation target as Linux and the other end as macOS, we can build translation proxies simply by
serializing network packets into the OCaml stack and then deserializing it on the other end and
turning it into socket calls. So now the Mac transparently reconstructs traffic coming out of a container and then emits them on your Mac desktop as normal
Mac networking calls. So a lot of the tricky difficulties of network bridging and firewalls
and all of that stuff just go away. So when you run a Linux container on the Mac, it goes through
Mirage OS and it looks just like a Mac application. When we deployed that in Docker, I think our support calls went down by about 99%. So anytime this software was
deployed in the enterprise, everyone's got some crazy firewall and antivirus software and things
that break some integration of a virtualization stack with your system. Today, Docker for Mac,
you just double click on it, you install it on Mac or Windows, and it's like a background daemon
that just runs in the system with minimal interruption. And that's the user experience we were going for. But it's only
possible because, again, we understood how to interface Go with OCaml, but made sure we did
it in exactly the right order. Then once you deploy it, it's incredibly robust in production.
But you just have to take the time to make sure you understand the lifetime of Go values,
the lifetime of OCaml values, and make sure they can interoperate correctly.
And this is another example of the flexibility of Mirage, right? It's not just an all-at-once operating system that needs to know everything before you run it on bare metal. Like here you are
integrating it as a very carefully designed shim between two operating systems running on the same
machine. That's right. So along the way, KC Sivaramakrishnan joined OCaml Labs to work on multicore parallelism. Hannes Mehnert from Robur and David Kaloper were on a beach in Morocco and they wrote us a TLS stack. And then they did this incredible
stunt where they decided they love Mirage and they'd never talked to me or any of the Mirage
team. And on this beach in Marrakesh, they wrote a complete SSL stack in the wake of the Heartbleed
attack. And then they put up what we called a Bitcoin pinata. And this Bitcoin pinata was in about 2015 or so, I think. They hid 10 Bitcoins inside a unikernel, put it on the internet,
and they left the private keys inside the unikernel. And they said to the internet,
if anyone can break into this unikernel and take those keys and trade those Bitcoin,
we can't deny the fact that this thing has been hacked and you can keep the money.
So back then, I think Bitcoin was worth not very much. But then during the course of the experiment, there was hundreds of thousands of
attacks against the system. And it got on Hacker News and all the social media networks. People
kept crashing the system with denial-of-service attacks. But then like a real pinata, it just bounced back and rebooted in 20 milliseconds, because that's how long a unikernel takes to reboot.
And it was back up again and no one managed to take the Bitcoin. In the end, I think we donated
it to charity because it was growing a bit much.
But it just goes to show how you can assemble all these things.
You can get a community who can then do what they want to do with it and then contribute
back to the whole.
So today, if you use a TLS stack in OCaml or indeed an HTTP stack, you're probably using
one of the Mirage libraries.
There's many, many alternatives, but for a long time, the Mirage libraries became the de facto community stacks that people used.
Right. And I would assume that Mirage in its various forms, maybe Mirage plus Xen
together, are responsible for most of the deployments of OCaml code onto people's
actual machines. How many machines do you think software that you've worked on has now
been installed on? It's a hard question to answer because we're deployed in products.
So there was an OCaml XenStore, which is the management daemon behind Xen, which I believe Amazon used for many years.
So that would cover quite a lot of machines in the cloud.
I can't say exactly how many, but a lot.
And then Docker for Mac and Docker for Windows, I think, was the second most popular developer tool behind Visual Studio Code.
So it's deployed on tens of millions of desktops, for sure.
But then, of course, in the community, you have people like Facebook who have written their front end for their Messenger application in a variant of OCaml known as ReasonML and compiled that to JavaScript.
So that's also, to some extent, deployed, but not deployed in the same way.
That's a good point.
That might be more desktops than all of the Docker desktops combined.
In fact, it kind of has to be.
It does. That would probably address a few billion desktops. But it's a website, right? It's not an application running on the other side.
But our plans right now are even bigger. I'm working on some climate change projects where we need to deploy millions of sensors around the world.
And of course, we're using Mirage to deal with the complicated logic of CO2 sensing and chemical sensing, and deploying it on RISC-V hardware that's quite embedded. So the Mirage journey is just continuing, but down different paths and different use cases. We have in Germany the Robur team deploying
all kinds of different unikernels for the German government. I think they have a contract to
build secure VPN tunnels and lightweight overlay networks. And all of these are unikernels that
are being deployed. So who knows how far it's going to go inside critical infrastructure on the internet in the coming years.
So a thing I've always found striking about your background is you've dug deeply into a bunch of
different areas. You've done a lot of different open source work over the years in various different forms. You've done lots of impactful academic research, and you've been involved in
a bunch of pretty major industrial projects. Can you tell us a bit about how you got
into this whole line of work in the first place? How did you get into computers and
into systems research? Where did this journey start? Well, I'm actually not a computer scientist.
I began my training as an engineer and I actually planned to get into electrical engineering. I was
fascinated by power systems and cars and planes and so on. But then when I was studying in London, I got working on a computer game, an online MUD,
where you could program this game. And it was programmed in a really interesting language
called LPC, which is kind of a pseudo-functional object-oriented language from the late 90s.
And I went to a party. It was known as a MUD meet. And I got drunk and I woke up the next day and I'd
been offered an internship at NASA to work on the Mars Polar Lander. And this was in California, an exotic land far away from grey and dreary London.
So I ended up that summer working on the various bits of infrastructure for helping the Mars Polar
Lander land. And when it finally landed, this was the first time that we had the technology to
live stream the photographs that were coming out of Mars. I was kind of set up, I would say, as the person who set up all the infrastructure for
supporting one in three people on the internet to access a website all at once, because the
world's attention was focused on this landing in 1999. So I rapidly learned how computers worked
and stuff and operating systems and things. And I set up all of these Solaris boxes.
And the first thing that happened was those boxes got hacked. So I put them up on the internet and obviously hackers love mars.nasa.gov
as a domain to control. And so they took them over and I then looked around for more secure
alternatives. And I found this operating system called OpenBSD. And what OpenBSD is, it's an
all-in-one operating system designed with reliability and correctness in mind. And it
used a variety of
security techniques. I wiped all of these expensive Solaris boxes, installed OpenBSD,
and then managed to get the system running stably again. And then OpenBSD was open source. So I
found a few bugs because when you're deploying something as large as that, you can't not find
some bugs, right? And it turns out that I could just send in some patches and they got interested
and they accepted my patches. And this is like some massive dopamine rush because when someone takes your code and incorporates it into this operating
system used by loads of other people, it's an incredible feeling. So I got more and more into
that development. And I ended up going to an OpenBSD hackathon. And these are regular semi-annual
events. And back then it was in Calgary in Canada, because US export restrictions prevented cryptographic code written in the US from being exported.
So I got to travel and go to Canada.
And then talking to Damien Miller, who's one of the core maintainers of SSH, it set me
on the path to thinking, well, how can you start rewriting systems in a more secure fashion?
And then I went back to Cambridge because the Mars Polar Lander crashed straight into
Mars at very high speed.
So all of the infrastructure we set up never actually got used.
Well, it got into CNN
and lots of people looked at our sad faces.
People got to watch the crash due to your hard work.
People got to watch the crash.
We had to wait for like two days
until we decided it had crashed.
So people stopped watching after about five minutes,
but we waited two days.
And then I had to find a new job
because I was so depressed
that all of our hard work had hit Mars at high speed.
And so I decided to go back to Cambridge and do a PhD.
And then I really started my training as a computer scientist.
So during the PhD, I did lots and lots of different projects.
But I started working in the Zen hypervisor.
I started using OCaml in functional programming more seriously in order to build the stacks
that I described earlier.
And then it became this wonderful journey where all of the code I've ever written has
pretty much been open source.
A lot of it's terrible, but it's been included in lots and lots of products.
It's really easy to move between industry and academia
and government jobs
because you're kind of taking your secret weapons with you
wherever you go.
So now it's not like I'm obsessed with OCaml.
It's just the most efficient thing for me to use
to solve any given problem
because I've just deployed it in so many contexts
that if I'm doing anything for building my website
or doing a bit of data processing,
it's just what I reach for.
It's a really fun thing to work with, even after all these years.
And you've talked some about why you think OCaml is a good fit for Mirage and what you're trying to do there.
But OCaml is not a tool that systems programmers reach for early.
How did you end up coming across it in the first place? Well, in Cambridge, OCaml is now taught to first-year students because,
first of all, it's kind of a reset button because most students would come with a background of
JavaScript or Python and they'd have partial knowledge. So we wanted to find something
that's a little bit obscure, but certainly not massively in the mainstream. Secondly,
it's the easiest way to teach the foundations of computer science. So the basics of data structures
and recursion and representations and all the beautiful
logics and proofs that follow from that.
So at Cambridge, there's a long tradition of using ML style languages from standard
ML to OCaml.
So I couldn't help but be exposed to it because of the university environment.
Secondly, it was also the most practical way to do systems programming in the early 2000s.
So there weren't really any other alternatives back then.
You could go for Java, which is very heavyweight. You could go for Perl, which was write-once. It still is to some extent.
Python and Ruby were still very much in their fledgling phases. There weren't many other
compiled languages. So today we have this wonderful spring of programming languages,
but we didn't back then. But languages have momentum as well. So this is a generational
concept to keep going back to. It's not like we're just avoiding other languages, but when you build up such a large code base of OCaml code, it just
gets easier and easier to build and advance it every single day. So it's almost at the tipping
point now where it's easier to extend OCaml with Rust-style features than it is to rewrite all of
our code in Rust, for example, or any other language that comes along. It's easier to go
do a machine proof using the Coq proof assistant
and extract OCaml than it is to do anything else. And so it's this reduction of friction that just
builds up over the years. I understand what you're saying, but I feel like what you're saying is also
on some level objectively false. Meaning you're saying like, well, you know, back in the nineties,
what systems programming languages were there other than OCaml? And I'm like, there was C.
And in fact, that's what everybody used, right? It is not the case that system programmers in general
in the 90s looked around and were like, oh yeah, we're definitely going to write all our systems
in OCaml. No, that's right. If I could go back in time, I would evangelize OCaml not now, but in the
late 90s, because I feel like I missed a window of innovation there. No one had heard of OCaml back
then. And it was just this incredibly productive tool to write Unix-like code. It was just better than writing in C. And this is me
emerging out of writing lots of C code for many, many years. And indeed, writing lots of PHP code
for websites and webmail stacks and so on. But OCaml went through a period of stagnation.
Because like any open source project, if it's not invested in, if it doesn't have a large body of
programmers, then it's really hard to sustain it over the years. So around 10 years into OCaml's life, which is roughly when I
was using it in about 2005, the rate of progress really stalled. And so at this point, we kind of
missed a window where we could have heavily evangelized this to more systems programmers
that didn't have the tools and the right development environment to make it easily possible.
So while we used it heavily at XenSource, it never got picked up by other developers within XenSource because of that lack of tooling.
So we talked some about your background in open source. Some of the work that you've done,
and in fact, that you and I have collaborated on over the years, has been about developing
the open source community around OCaml and helping in part, certainly not just us,
but helping in part to kind of combat some of that stagnation. And part of that was the creation of OCaml Labs. Can you tell us a little more about
where OCaml Labs came from? I can. So when we finished at XenSource, it got acquired by Citrix, and I stayed for a few years happily hacking on Xen within Citrix. Then I went back to academia, and I knew that I had this burning desire to build MirageOS, because everything was set. I had all the code from the previous startups. I had the problem. I had five years of funding. I had this
wonderful research fellowship to work on, but it was just me. And I knew that if I wanted to make
this as big as I wanted it to be, I needed help. And it was help on multiple fronts.
The first thing was that the OCaml development team was incredible. I remember having dinner
with Xavier Leroy in about 2009, and he just said that they would maintain OCaml forever,
but they were struggling with all of the bug reports coming in
and the fact that they didn't have any dedicated staff working on it.
But he said, you know, anyone can work on it,
but why isn't anyone doing it?
I got talking to you, Ron.
And we said, well, why don't we find someone that will help us do this?
And it was really hard to find anyone
who would actually work in the core compiler,
look at bug reports,
and build out tooling because these were all the things that we needed. In the end,
it came to a hard decision. If you can't find anyone else, then perhaps I should do it myself.
And the reason I was really motivated to do this myself was because I wanted this from MirageOS.
So anything I did to improve OCaml would directly leverage and improve MirageOS,
the project I'm really passionate about. So we funded OCaml Labs in Cambridge. And one of the
beautiful things about Cambridge University is that individual staff retain their intellectual
property. It's not owned by the university. And so this meant that working in open source became
really easy because anyone we hired at the university could just write code and there
wasn't any need for any legal agreements or anything with the university. We just released
it. So I'm really, really proud that what we started with, a seed in Cambridge, has now become a diaspora of people all around the world working
in different geographies and different environments, but continue to communicate and share their code
through the open source ecosystem. And I think Cambridge as an institution deserves an enormous
amount of credit for all of this, because this thing was messy and complicated and does not fit in an ordinary way into a kind of
simple notion of academic research. A lot of the work that needed to be done was work about
coordinating open source ecosystems and maintainership work. It's not the kind of
stuff that gets you tenure. Most institutions aren't willing to take it on. And Cambridge was,
and I think it was important to have an academic institution that was willing to do it because OCaml is, in many ways, a deeply academic language.
Its roots and much of the expertise just realistically resides in academic institutions.
There's an enormous amount of connection to various different kinds of real and legitimate research work. We saw lots of exciting things coming out
of Cambridge on that kind of research side that were secondary to this, and all of this other
real infrastructure that was created. We looked around and tried to find various
homes for OCaml Labs, and Cambridge was the place that was willing to do it. It was enormously important that we found an institution that was really willing to partner with us
effectively in doing this kind of work. Another thing that strikes me about the story
you're telling is the degree to which OCaml Labs acted as a kind of effective form of glue. Like a
lot of the work you're talking about, which is important advances in the state of the art for
OCaml, they're not all things that were done at OCaml Labs, right? Merlin was created by some
INRIA undergrads, if I remember correctly, but they were later working with and supported by
OCaml Labs. OCamlformat was just done as an internal Facebook project, and then Jane Street
adopted it and made a bunch of further changes. But it was OCaml Labs that provided the glue to
kind of take it and turn it into a maintained and general purpose piece of software and figure out how to kind of share between the various different contributors.
Dune's another example.
Dune was created at Jane Street for Jane Street's kind of narrow purposes.
And now there's been a really deep collaboration between engineers at Jane Street, including Jérémie Dimino, who wrote the first version of it and runs the team that manages it at
Jane Street and collaborates very closely with OCaml Labs.
And so both the kind of industrial side of that work and the open source side of that
work are well handled and handled by different parts of what is essentially one big team
that's working on
multiple aspects of the problem. That's right. The fundamental value that Cambridge brings is
training, mentoring, and graduation. So graduation is a really important part of Cambridge, where you
leave and you go do something else. And the same is true for INRIA and the universities in France,
where the Merlin developers came from. And I'm particularly proud of the number of people that
have learned and moved on from Cambridge to
other jobs in the ecosystem and succeeded. So Stephen Dolan and Leo White, both of whom have been on this podcast, started off their degrees in Cambridge, did their PhDs there, and have moved into Jane Street, and many other graduates have done similar as well. And it's crucial for the
longevity of a community to have this kind of easy flow of people across jobs because obviously
people's lives change, they can't just all stay working in a university. And Cambridge was extraordinarily
flexible in figuring out how to get people in. So David Allsopp, for example, who is one of the most prolific contributors to core OCaml, is also a countertenor singer in his spare time.
But when I say spare time, it's actually his career. So I had to convince him to come be a
developer here because he was working on OCaml in his spare time while also maintaining his singing career. He successfully juggled both
of those and became an incredible contributor and an incredible singer. But explaining to
Cambridge HR exactly why I was hiring a singer to work in my research group was a challenge,
but they didn't say no. And he's still at OCaml Labs and he's still one of the prime maintainers
many years on. One of the big and long running projects that OCaml Labs has taken on and really driven
is the work towards having a multi-core garbage collector for OCaml and a multi-core capable
runtime.
This is a long-running sore point about OCaml.
You mentioned one of the limitations in Mirage is that OCaml is not multi-core capable, in the sense that you can't run multiple OCaml threads that share the same heap.
This has been a thing that people have talked about for a very long time, and there's been some amount of work on and some discussion about how to get there for many years.
One question I have is, why has it taken so long?
Why has this been such a big and long-running project to add multicore to the language?
A really important part of research is understanding that 90% of what we do is fail. And whenever we started adding multi-core
parallelism to Camel, we were taking an existing ecosystem, an existing semantic for the language,
and just trying to extend it with the ability to run two things at the same time instead of one
thing. And the number of assumptions that break when you do two things at the same time instead of just one thing is incredible. So our first naive attempt was in 2013. We presented our
confident plan for exactly how multicore would go into OCaml, and it got okayed by Damien Doligez and Xavier Leroy. Then a couple of years on, we realized just how many edge cases
there were and the need for a better conceptual core for what it means to
be multi-core. So we went to a Caml Consortium meeting, which was where the industrial users of OCaml a few years ago would present their needs and requirements. And we presented our work to that team and they said, well, look, you can't add this to OCaml without having a memory model.
So without a memory model, which says, this is what happens when two threads
simultaneously access a single OCaml value. Without that definition, it's really hard to ascribe any meaning to multicore OCaml, because what does the program do whenever this situation
happens? So we then had to go off for a year and figure out new theorems. And we came up with
something called LDRF, local data race freedom, which was published at PLDI, a top-tier conference.
But it also crucially resulted, in addition to this nice new theorem, in a clean, well-defined semantics for multi-core parallelism in OCaml. So then we went back to the core development team and we said, hey, here is this clean memory model semantics. They went, yeah, great. Where's the rest
of it? But remember, there's only about two or three of us working on this while juggling many
other things. So we then went off and frantically started writing the garbage collector and making
sure that we could finish off the job. The garbage collector is more difficult than a normal single
threaded one because it has to deal with multiple cores simultaneously wanting to trigger garbage
collections. And you have to make sure that, irrespective of when the garbage collection is happening, the program still maintains type safety. So nothing can ever observably be violated by a garbage collection happening.
And we ended up with two separate schemes for garbage collection, and we couldn't decide
between them. We then had to write a full paper about this. We had to make sure that
we evaluated both sides. And we also had to do this against a backdrop where we could not tolerate
more than a few percent of a
performance hit for old OCaml code. So if you were building a new language, you could
just go ahead and build it and you could build the perfect parallel algorithm because you
have no compatibility to worry about. But meanwhile, we had the entire Coq proof assistant community that said, we're not going to use multicore for a few years, but if we compile our existing code with multicore OCaml, it shouldn't get any slower. And back then we had maybe a 10 or 20% performance hit. So a significant slowdown even for code that wasn't using multicore at all. And after a few years of work,
we got that down to a few percent. So it was almost indistinguishable from noise, because of all the various techniques that we put into the garbage collector and the compilation model to ensure that happened. This was, again, real research. So it got published
in ICFP. We then had to figure out how to present this to the core development team,
get consensus, and then move it forward. I think we have been working on multicore
incrementally since OCaml 4.02. So OCaml 4.02 was where we had the first branch of OCaml for multicore. We're now in OCaml 4.13, which has just been branched. And I think in every version since 4.08, we've put in
a significant chunk of work in order to get towards multicore parallelism. Most of these things are
invisible to the OCaml users. So you at Jane Street have been using different parts of the
multicore compiler that we have upstreamed into lots and lots of different versions of OCaml.
And we've done so in such a way that it totally respects
backwards compatibility.
Because if you don't get it just right,
then we'll end up with a split world
where the multicore OCaml compiler is a new language,
and it won't work with older existing OCaml code.
And that would be a disaster.
So the reason we're so careful in threading the needle
is that whenever OCaml 5.0 lands, it will compile almost every bit of existing code from the last 25 years with a minimal performance hit. It will then allow
you to add multi-core parallelism through this domains interface. And it has one of the best and cleanest memory models of any language. So our research paper on bounding data races in space and time showed that C++ and Java, the two kind of gold standards for their memory models, have disastrous issues, is the best way to put it. So that's the opening of our paper. And we show that with essentially no performance hit on x86, a 0.4% hit on ARM, and a 2% hit on PowerPC, we could make it all work.
So that's a pretty big result. It took a lot of
theoretical computer science, a lot of experimental evaluation, and a lot of implementation. All of
these had to happen simultaneously. It wouldn't have been possible without KC Sivaramakrishnan,
who's worked with me on this project for the last six years. And we've gotten two top-tier
papers out of it. So it's not been a great ratio of coding to papers, but the end result is something
we're very, very proud of. So the story you're telling highlights a lot of the ways in which
OCaml is legitimately an academic language, and that part of the way of moving things forward
and of convincing people to accept a new feature is actually going through the trouble of writing
serious academic papers to really outline the design and explain what the novel contributions
are. And there are some novel contributions. So from a kind of more ordinary workaday systems programmer perspective,
how should someone who is used to the parallelism story in Java think about the advances in OCaml?
How, from a pragmatic point of view, is the coming OCaml multi-core runtime going to be better?
It's only going to be better because it will not have any surprises.
So whenever you use multi-core parallelism in Java,
you have to know a lot of things.
You have to know about the memory model in Java.
You have to understand the atomics
and the various interfaces they expose.
There's different levels of things exposed
in different versions of the JVM.
In OCaml, potentially just because of the young age of multicore in OCaml, we think we just have a cleaner model that avoids a lot of the pitfalls that Java ran into.
Now, one of the interesting properties about programming languages
is that it's very hard to take back a semantic.
So if someone has written some code in it,
there's just a vast number of complaints if that changes,
because it can fail at runtime.
So just by waiting for this long and observing how all the different languages have built their systems, and then doing the research to thread that needle to find the least surprising memory model across all of the hardware deployed today, that's what we have in OCaml. So a Java programmer should find it the most boring experience to do multi-core parallelism in OCaml. They'll just use high-level libraries like Domainslib that give them all of the usual parallel programming libraries, and it'll just work. No surprises. Fast. Do you have like a pithy example of a
pitfall in multi-core Java that doesn't exist in multi-core OCaml? So there's something called a
data race. And when you have a data race, this means that two threads of parallel execution are
accessing the same memory at the same time. And at this point, the program has to decide
what the semantics are. So in C++, for example, when you have a data race, it results in undefined
behavior for the rest of the program. The program can do anything. Conventionally, demons could fly
out of your nose is the example of just what the compiler can do. In Java, you can have data races that are bounded in space but not in time. So the fact that you changed a value can mean that later on in the execution, because of the workings of the JVM, you get some kind of undefined behavior. So it's very hard to debug, because it's happening temporally across executions of multiple threads. In OCaml, we guarantee that the program is sequentially consistent in between data races. It's hard to explain any more without showing you fragments of code,
but conceptually, if there's a data race in OCaml code,
it will not spread in either space or time.
So in C++, if there's a data race,
it'll spread through the rest of the code base.
In Java, if there's a data race,
it'll spread through potentially multiple executions
of that bit of code in the future.
In OCaml, none of those things happen. The data race happens, some consequence exists in that particular part of the code, but it doesn't spread to the rest of the program. So if you're
debugging it, you can spot your data race because it happens in a very constrained part of the
application. And that modularity is obviously essential for any kind of semantic reasoning about the program, right? Because you can't be looking in your logging library for undefined behavior when you're working on a trading strategy or
something else. It's got to be in your face at the point where it happens. Yeah, it seems to me like the core
thing you're talking about is buggy code is easier to reason about. It's enormously important because
almost all code is buggy, like parts of every code base have bugs and problems. And this is why the classic
undefined behavior stance of traditional C and C++ compilers is so maddening because there's a
kind of amplification of error where you make some mistake where you step outside of the standard and
suddenly, you know, anything can happen. I've actually been seeing this happening with my son
who has a summer internship where he's off hacking out a bunch of C code.
And when you make a mistake in C code, it can be really hard to nail it down because the compiler can make all sorts of assumptions and push the mistakes into places where you totally wouldn't expect it.
It sounds like the same thing happens in the context of data races in C and C++ and to some degree in Java. And reducing that just makes
it more predictable and makes debugging easier. So I feel pretty convinced by this story.
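[A minimal sketch of the kind of data race being discussed, assuming OCaml 5's standard Domain and Atomic modules; the iteration counts and function names here are illustrative, not taken from the episode.]

(* Racy: two domains bump a plain ref with no synchronization. The final
   count may come out lower than 2_000_000, but the damage is bounded:
   the program stays type-safe and the rest of it behaves normally. *)
let racy_counter () =
  let counter = ref 0 in
  let bump () = for _ = 1 to 1_000_000 do incr counter done in
  let d1 = Domain.spawn bump and d2 = Domain.spawn bump in
  Domain.join d1; Domain.join d2;
  !counter

(* Data-race-free: the same loop with an atomic counter always yields
   exactly 2_000_000. *)
let atomic_counter () =
  let counter = Atomic.make 0 in
  let bump () = for _ = 1 to 1_000_000 do Atomic.incr counter done in
  let d1 = Domain.spawn bump and d2 = Domain.spawn bump in
  Domain.join d1; Domain.join d2;
  Atomic.get counter

let () =
  Printf.printf "racy: %d, atomic: %d\n" (racy_counter ()) (atomic_counter ())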
It's quite pleasant working in Multicore OCaml when it comes to debugging things because of
this property. So are you brave enough to venture a date by which a mere mortal who installs the
latest version of OCaml will be able to run two threads in parallel that access the same heap?
Well, I can't give you a date,
but I will give you...
Well, I can give you a date.
You can do that today.
So what I did a couple of weeks ago
was to merge the multi-core OCaml working tree that we use,
which is a set of patches against the latest stable OCaml,
into the mainline opam repository.
So this means with one line, you can switch from OCaml 4.12.0 to OCaml 4.12.0 plus domains.
And all the work that the Multicore OCaml team has been doing has been focused around
ecosystem compatibility.
You can just start with your existing projects and you can then start adding in domain support.
And if you're really, really experimental,
we have a future-looking branch which also adds something called an effect system
on top of this patch set.
This effect system is the ability to interpret
certain external events that happen
and just deal with them through
what are known as effect handlers.
So for example, if I'm writing to a blocking network socket,
instead of having to
then use async/await or Lwt or monadic-style concurrency, our effect system just lets another
part of the OCaml program deal with the blocking IO and then resume the thread of execution whenever
it's ready to happen again. So this is highly experimental, but it results in some of the most
pleasant and straight-line OCaml code I've ever written. It reminds me of writing code in the early 2000s when we just used Pthreads and Unix for everything.
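[A rough sketch of the direct-style concurrency that effect handlers enable, using OCaml 5's Effect module; this toy scheduler and the Yield effect are illustrative, not the actual runtime or Eio implementation.]

open Effect
open Effect.Deep

(* A made-up effect: a fiber asks to be suspended so another can run. *)
type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* A toy round-robin scheduler: run a list of thunks, interleaving them
   whenever they yield, with no monadic bind anywhere in user code. *)
let run fibers =
  let queue = Queue.create () in
  let enqueue k = Queue.push k queue in
  let dequeue () = if not (Queue.is_empty queue) then (Queue.pop queue) () in
  let spawn f =
    match_with f ()
      { retc = (fun () -> dequeue ());
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
              Some (fun (k : (a, _) continuation) ->
                (* Park the suspended fiber and give another one a turn. *)
                enqueue (fun () -> continue k ());
                dequeue ())
          | _ -> None) }
  in
  List.iter (fun f -> enqueue (fun () -> spawn f)) fibers;
  dequeue ()

let () =
  run
    [ (fun () -> print_endline "a1"; yield (); print_endline "a2");
      (fun () -> print_endline "b1"; yield (); print_endline "b2") ]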
All of these different variants and levels of the OCaml compiler are now available in opam. So depending on how close to mainline the features you want to test are, all of the trees are available for you to try out.
And the next thing we're doing is that we're working on OCaml 5.0. And this is hopefully going to be the release after 4.13, which contains the domains-only patch set. It will expose just
two extra modules that provide you with the ability to launch multiple threads of execution.
After six years of work, it's two modules. But those two modules obviously have enormous power
because you can then use them to spin up multiple threads without having to fork multiple processes and do lots of complicated serialization.
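[A minimal sketch of what that looks like, assuming OCaml 5's Domain module; the workload is made up for illustration.]

(* Spawn two domains that run in parallel on separate cores, then join
   them for their results: no fork, no serialization of data between
   processes. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

let () =
  let d1 = Domain.spawn (fun () -> fib 38)
  and d2 = Domain.spawn (fun () -> fib 38) in
  Printf.printf "fib 38 computed twice in parallel: %d and %d\n"
    (Domain.join d1) (Domain.join d2)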
And then our plan gets more experimental.
5.0 is solely focused on features that have been approved to go into core OCaml because they've gone through extensive peer review.
Then for 5.1, our plan is to propose the runtime parts of this effect system. This lets us not only express
parallelism, which is what you get in OCaml 5.0, but concurrency directly in the language. So the
ability to interleave multiple threads of control in a very natural way. This is original research
that we just published in PLDI this year on how we made the runtime part of the effect system as
flexible as possible. And again, without breaking any compatibility
with your existing tools.
So it uses GDB
and all of the familiar debugging tools you're used to.
And then later on at 5.2,
we're going to expose that effect system
into the core OCaml language
using something known as effect handlers
and typed effect handlers.
We're doing that in close collaboration
with Jane Street engineers as well.
So this roadmap is multiple years of work,
but the first step, OCaml 5.0,
we'll get into your hands as soon as we can.
But all the trees are in open source
and the way to speed it up is by giving it a try,
trying your applications against it
and giving us bug reports.
So that's the heart of open source
and how you get a concrete date.
Help us to help you.
Question well dodged.
By the way, just to highlight a little point you said there, you mentioned how the domains only version of it is meant to provide the basic parallelism.
And then on top of that, you want to add some notion of concurrency.
In some sense, once you add parallelism, there's some amount of now concurrent execution.
But I guess this reminds me of the old Solaris style.
You have some number of kernel-provided, truly parallel threads, and then you have some kind of micro-thread notion that operates inside of there that's lighter weight. And that's the split that's really being talked about here: you have something like one domain that you'd run per, say, physical CPU that you have,
and then you might have tens or hundreds or tens of thousands of little micro threads that are running inside each domain, and importantly, migratable. So you can take one of these and
pick them up and move them to a different core. So that's, I think, an important part of that model.
It's a really important point. Instead of calling them micro threads, we call them fibers. So these
are really lightweight data structures.
You can have millions of these in your heap.
Resuming them on a different core
is just a matter of writing some OCaml code.
The really nice thing with effect handlers
is that your schedulers,
the things that normally the operating system
would decide to do for you,
like thread scheduling,
are written in OCaml as well.
And so this means that you can write
application-specific logic
for things that conventionally
the kernel would take care of for you.
And the kernel doesn't really know how to do things optimally.
It knows how to do things to cause the least harm.
And so by this kind of domain specialization, your applications in OCaml can get really, really fast.
Now, this should be familiar to you, right?
Because this is the future of Mirage OS. The goal of the effects system is to internalize about a decade's worth of learnings about how to build portability libraries, how to build abstractions and device
drivers. And now we're having the time of our lives rebuilding all of these things in direct
style code using the effects system. So we have a new effects stack called EIO, which is pure direct-style code. Its performance is competitive with Rust and Go and so on. I think it's faster than Go by quite a long way, and it's competitive with Rust. And it uses all of the new features in operating systems: io_uring on Linux, Grand Central Dispatch on macOS and iOS, and the IOCP subsystem on Windows. And all of these things happen invisibly inside the IO
subsystem written in OCaml. But as a programmer, you just write normal straight line OCaml code
and the effects system takes care of all of that for you.
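[A rough sketch of that direct style, assuming the Eio library's Eio_main.run and Fiber APIs, whose exact shape may differ between releases.]

(* Two fibers run concurrently in plain, straight-line code; the Eio
   backend picks io_uring, Grand Central Dispatch, or IOCP underneath. *)
let () =
  Eio_main.run @@ fun env ->
  let stdout = Eio.Stdenv.stdout env in
  Eio.Fiber.both
    (fun () -> Eio.Flow.copy_string "hello from fiber one\n" stdout)
    (fun () -> Eio.Flow.copy_string "hello from fiber two\n" stdout)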
So it's a very, very exciting frontier
for what's coming in OCaml in the future.
And it makes MirageOS code even more Mirage-y
because it's just normal OCaml code that you write
and all of this stuff is being handled
for you in the background
through various effect handlers.
Well, I think that's a fantastic place to stop
as you kind of tie a little bow around connecting Mirage and the most recent work you've been doing in OCaml.
Anil, thank you so much for joining me. This has been a real pleasure.
Thanks, Ron. Fun as always.
All right. Cheers.
You'll find a complete transcript of the episode, along with links to some of the things that we discussed, including Mirage and some of Anil's other research, at signalsandthreads.com.
Thanks for joining us, and see you next time.