Signals and Threads - Building a functional email server with Dominick LoBraico
Episode Date: October 28, 2020
Despite a steady trickle of newcomers, email still reigns supreme as the chief communication mechanism for the Information Age. At Jane Street, it’s just as critical as anywhere, but there’s one difference: the system at the heart of our email infrastructure is homegrown. This week, Ron talks to Dominick LoBraico, an engineer working on Jane Street’s technology infrastructure, about how and why we built Mailcore, an email server written and configured in OCaml. They delve into questions around how best to represent the configuration of a complex system, when you should build your own and when you shouldn’t, and the benefits of bringing a code-focused approach to solving systems problems. You can find the transcript for this episode along with links to things we discussed on our website.
Transcript
Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I'm Ron Minsky.
All right, so it's my pleasure today to sit down and have a conversation with Dominick LoBraico about email.
In particular, we're going to talk about a system that Dominick architected and led the development of called Mailcore, which is Jane Street's own
homegrown mail server. And I think this is interesting on its own because email is an
interesting topic and the whole architecture behind it. But I think it's also a lens into
some interesting questions about software design and how you manage infrastructure,
some questions about how you make this choice of when you build your own thing and when you use
standard existing tools, and also some interesting questions about how programming languages play a role in systems design.
Hi, Ron.
Hey, Dilo.
So to get started, can you tell us a little bit about how email works?
Sure. Yeah.
So email is based on an old and venerable protocol on the internet called the Simple Mail Transfer Protocol, SMTP. And SMTP, you can kind of think of it as playing the role
that the postal service plays in delivering regular mail. It is a way for one server that
wants to deliver a message somewhere to hand that message off to another party who can get it to its
final destination, whether that is the eventual destination server itself or some intermediary
who can help you get a little bit closer. Email as we know it today came to fruition in the early days of the internet.
And the protocol itself is very simple. You basically have the actual body of the message
itself, which has its own separate format and specification. And then you have a set of
instructions for expressing who that message is destined for and who it's coming from. And so one server connects to another and it says, I've got a message. It's coming from so-and-so and it's meant to be delivered to so-and-so. And the receiving server can say, okay, I'll take it from here on out. Or it can say, nope, I don't know anything about that person. You have to find somebody else to deliver that to. Or it can reject it for any number of other reasons, like this looks like it has a virus or you're not allowed to connect to me or I'm not available for receiving mail right now.
And one thing that always strikes me about email is it's this kind of wondrous artifact from the early internet, which is a truly open social network.
There are lots of things that people talk about, like could we make existing social networks
better and more open and all of that. And email just is: from its initial design and through its
complete history, it has been this very open thing. And as you point out, the core protocols and
transports are relatively simple, although there is actually a surprising amount of complexity in the RFCs that tell you how to parse a particular email. The overall system is
pretty simple, but there's a lot of complexity in all of the different players who build systems
that actually manage and transfer email around and how they deal with the various problems that
happen like spam and people attacking systems via email and all of that. So the
foundations are relatively simple, but the emergent complexity of the system is actually pretty high.
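The basic exchange described above, where one server offers a message and the other either accepts responsibility for it or refuses, can be sketched in a few lines of Python. This is only an illustration and not how any real mail server is implemented: the reply codes are the standard SMTP ones, but the `Envelope` type, the `receive` function, and the mailbox and sender sets are hypothetical names made up for the sketch.

```python
from dataclasses import dataclass

# Standard SMTP reply codes (RFC 5321).
OK = 250               # requested action completed
UNAVAILABLE = 421      # service not available right now
NO_SUCH_MAILBOX = 550  # recipient is unknown here
REJECTED = 554         # transaction failed, e.g. a policy rejection


@dataclass(frozen=True)
class Envelope:
    sender: str     # who the message is coming from (MAIL FROM)
    recipient: str  # who the message is destined for (RCPT TO)
    body: str       # the message itself, which has its own separate format


# Hypothetical state of the receiving server, for illustration only.
KNOWN_MAILBOXES = {"alice@example.com"}
BLOCKED_SENDERS = {"spammer@example.net"}


def receive(envelope: Envelope, accepting_mail: bool = True) -> int:
    """Return the reply code a receiving server might give for one message."""
    if not accepting_mail:
        return UNAVAILABLE       # "I'm not available for receiving mail right now"
    if envelope.sender in BLOCKED_SENDERS:
        return REJECTED          # "you're not allowed to connect to me"
    if envelope.recipient not in KNOWN_MAILBOXES:
        return NO_SUCH_MAILBOX   # "I don't know anything about that person"
    return OK                    # the receiver takes it from here on out
```

A real server carries out this conversation as a sequence of text commands over a network connection; the sketch collapses it into a single decision to show how small the core idea is.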
Like with many protocols of the old internet, it was designed in a time where the world was
much simpler than it is today, especially the internet connected world. There were probably 50
institutions that had internet connections or ARPANET connections at the time. And you didn't
really have to worry that anybody was going to be spamming because barely anybody even knew what
email was in the first place. When you start and build a new thing, the early properties of the
thing that you build can often be really sticky and really matter in a way that's kind of hard to
predict. So this one early property of being open has stayed there. Email is a thing that anyone can
participate in. Organizations can kind of build their own infrastructure to connect to it. And through all the rather large
transformations that the email system has gone through, that openness remains as a core property.
This is the horrible thing about designing a new thing. When you want to design something new, you have to make a bunch of choices. And clearly you shouldn't worry about them that much, because probably the thing you build is going to fail and isn't going to work out. And even if it does, you're going to learn more about the problem later. And so you shouldn't worry too much about the early decisions. But also, you don't know which of the early decisions are going to turn out to be very hard to change, and those you'll be stuck with till the end of time.
And in fact, you know, the big players in email today, obviously Google and Gmail are a really large percentage of the email sending and receiving on the internet,
but they are still wrestling with some of those early decisions and some of that openness that
was architected in as they try to figure out how they can make email more secure and how they can
protect their users and rein in some of the malicious actors on the internet. And that's
just a hard thing to do while trying to maintain the existing openness that email has, but it cuts
both ways, I guess.
That openness in the end has a lot of value.
Absolutely, yeah.
So the story here is about how you ended up building the system called Mailcore.
What did email at Jane Street look like when you first ran into the problem?
So you might think that there's really not much special about the way Jane Street uses email compared to any other company, and largely that's true. I think we have a few special requirements by dint of the fact that we are in a regulated industry. So we have some
requirements around logging for compliance purposes, every message that is sent or received
by somebody at Jane Street. But other than that, our email system looks pretty similar or has looked
in the past pretty similar to the way an email system at any organization might look. And the rough summary is we have some mail gateways that sit on the outside
of our network for receiving email from foreign servers, you know, from external parties. And then
we have some mail server or set of servers inside of our network that handle all of the complicated
business logic around what to do with those messages. So in some cases, it's as simple as receive the message and deliver it on to the mailbox of a user if we are the intended recipient.
In other cases, it means applying filtering for things like spam and viruses and other things that we
might want to extract from messages before we deliver them. Another is expansion for mailing lists.
So if you send an email to some group at Jane Street, you want to
be able to expand that group name to the actual list of recipient mailboxes to make sure that it
actually ends up in the inboxes of the recipients who it's destined for. And then there's this extra
compliance implication of making sure that we're logging all the right messages with all of the
right metadata. And at the time that I started, the mail infrastructure here was all based on an
open source mail server that has its own config language and is pretty widely used on the internet
at large. And we had about 400 or 500 lines of configuration in the most complex case, I think, for this system to get it to do all of these different things that we wanted it to be able to do.
Great. So that sounds like a reasonable approach in terms of how to build oneself a mail system. What problems did we run into with it?
Yeah. So the biggest problem here at the end of the day was the complexity required for configuring this system to do all of the things that we needed it to do. So I said 400 or 500 lines of configuration. That probably doesn't sound like a huge number. But when it's in a kind of bespoke configuration language that's unlike the configuration of any other system, and unlike any programming language that a developer or engineer at Jane Street would be familiar with, the complexity of 400 or 500 lines in a foreign language is pretty large and can be a little bit imposing to deal with. And in particular, we had some scary
near misses where we realized that we had done the wrong thing in terms of archiving some email for
compliance purposes that we were supposed to archive. And luckily, in each of those cases,
there were mitigating factors such that it didn't end up being a big deal. But the kind of near miss
gave us a little bit of a scare because we went and looked at the configuration and wanted to
understand how we had gotten ourselves into this position. And it was harder than it felt like it should be
to understand what had gone wrong and how to fix it. It's maybe also worth mentioning that the
problem of logging all of your messages for compliance purposes may sound easy, but it's
made more complicated by the fact that Jane Street is a company that operates in lots of different
regulatory regimes and has actually different rules for some of the different places it operates. So even the sort of seemingly simple, let's just write everything down
is more complicated than it might appear at first. That's right. Yeah. We have different
requirements in terms of what has to be written down and what kinds of metadata we need to store
and where the extra copies need to be physically located around the world and things like that,
which are reasonable sounding when you think about the human aspects of it, when you reason about it. Okay, yeah, you need a copy for this and a copy for that. But actually
implementing the rules in practice ends up being pretty complicated. So one of the things that
motivated you to try and do something new was this kind of near-miss situation of things almost going
horribly astray. Were there any other reasons that you wanted to try something different?
As I said, one aspect of it was certainly this realization that the complexity of the system had gotten to a point where we just actually were scared to make changes
to it. Another came from the fact that it required this kind of specialized knowledge. At the time, we were much smaller than we are now. But you know, even today, we have a team made up primarily of generalists, people who are able to work on a lot of different kinds of problems and have a kind of general background across an area of technology. And understanding the configuration for this particular open source mail server is not something that you just have as part of a general knowledge.
You know, it really required specialized understanding and background more so than the general skills required to administer an email system or understand the concepts behind email.
You really needed just to know the particular weird semantics and dark corners of this particular
language. And the idea that we needed to sort of build a team or have a team to specifically
understand and be comfortable working with this just didn't feel like a good use of our people
resources. There are a lot of other problems that we need to be solving, and we'd much rather be
able to take as general an approach to them as we can.
Can you give an example of the way in which the config language was hard to reason about?
I think this is an example of a pretty common pattern that you see in a lot of systems that
are intended to be highly flexible and configurable.
They start with a relatively simple core that handles the basic functionality.
And over time, as they try to add more features to the system, they add more and more knobs that you can turn, more and more configuration parameters or elements in the configuration language to make it possible to express all of those different things you might want to be able to express.
And in this particular case, the configuration language is a bespoke domain-specific language developed just for this system. It kind of resembles in some places the kind of old-school .ini format of
having like a key and then an equal sign and then a value and sections separated with kind of headers
and brackets and things like that. But then when you look a little bit closer, you realize it has
all this extra power layered on top. So in particular, it had support for these kind of advanced macros that look a little bit like
function calls, where you can call a macro with some set of arguments, and it expands to something
else. And there are these different phases of expansion of these configuration elements where
you can do this kind of metaprogramming, where you can have macros that produce macros that then get
expanded to some
resulting values. And then on top of that, the set of fields that are required in the configuration
and the interaction between those things is not made very clear and it's not really very consistent.
So for example, you might have a section that defines the way that you can route a message,
the way that you can decide where a particular incoming message should go, whether
you're going to send it to a mailbox or relay it to some other server. And you can define multiple
routers. And the semantics in terms of which router is going to get selected for a given message are
not made explicit by the configuration language. And there are a bunch of other examples like this,
where there are some set of elements that you define and the semantics for how the
system chooses which of those to apply in a given case are not explicit and clear from the
configuration superficially speaking. You just have to know. You have to go and read the documentation
and understand how it is that these things interact with each other.
Right. And the rules for picking which particular rule fires in a particular case,
I assume those rules are not simple themselves.
They're not simple. And in some cases, for good reason. I mean, the system,
it's worth saying, is highly, highly flexible. And it is the case that it could do all of the
things that we wanted it to do at the time. But ultimately, the way in which you needed to kind
of contort yourself to understand how it was going to do that and how to fit those different pieces
together required an expert-level knowledge of the semantics of the particular system.
So you had a clear problem in front of you. What approach did you decide to follow to address it?
Yeah, so ultimately what we decided to do was the ostensibly crazy sounding thing of writing our own
email server. In particular, we wrote a new email server in OCaml, the functional programming
language that we use here at Jane Street.
And crucially, and maybe the most interesting part of this is that the system was also configured
in OCaml.
The real problem that we had come to here was we were happy with the core functionality
of the old system that we were using, but the configuration language was what we felt
like was really limiting us.
And we came to this fundamental realization that ultimately, the role of an
email server, you can think of as a function, you can think of it as kind of a black box that
implements a function that takes a message and outputs one or more resulting messages. And that
black box is responsible for making all of the decisions about how to transform those messages
and how to route those messages to further servers or to inboxes. And at the end of the day, you can kind of encapsulate everything
in a function that looks roughly like that. And OCaml, like I said, is a functional language,
as you know, and it really lends itself to writing functions in this way, composable units
that you can kind of stitch together to implement some bit of functionality that ultimately takes some inputs and generates some outputs without any side effects.
And that was what we realized we needed. And so we started down that path.
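To make the "email server as a function" idea concrete, here is a minimal sketch in Python rather than OCaml, purely for illustration. Nothing here is Mailcore's actual code or API: the `Message` type, the rule names, the mailing list table, and the archive address are all assumptions invented for the example. Each rule takes one message and returns zero or more messages without side effects, and `compose` stitches rules together into one larger function.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    body: str


# A rule maps one message to zero or more resulting messages, with no side effects.
Rule = Callable[[Message], List[Message]]

# Hypothetical data, for illustration only.
MAILING_LISTS = {"team@example.com": ["ann@example.com", "raj@example.com"]}
ARCHIVE = "archive@example.com"


def expand_lists(msg: Message) -> List[Message]:
    """Replace a message to a list address with one copy per list member."""
    members = MAILING_LISTS.get(msg.recipient)
    if members is None:
        return [msg]
    return [replace(msg, recipient=member) for member in members]


def archive_copy(msg: Message) -> List[Message]:
    """Keep the message and add a copy addressed to the archive mailbox."""
    return [msg, replace(msg, recipient=ARCHIVE)]


def compose(*rules: Rule) -> Rule:
    """Run the rules left to right, feeding every output of one into the next."""
    def run(msg: Message) -> List[Message]:
        msgs = [msg]
        for rule in rules:
            msgs = [out for m in msgs for out in rule(m)]
        return msgs
    return run


# The whole "server" is then a single function from a message to messages.
handle = compose(archive_copy, expand_lists)
```

With these assumptions, a message sent to `team@example.com` comes out as three messages: one for each list member and one archive copy. Because every piece is an ordinary function, each rule can be read, tested, and reordered on its own.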
That simple pivot in the design, it lets you bypass all of this complexity of this custom
language. And you just get to pick a really well thought through, well engineered abstraction in
the middle of OCaml, which is the function, and use the ordinary tools for software composition that you have there for building the abstractions
that you want. And then this kind of gets you out of this problem of having to think about
a weird, complicated special case that comes up for mail and for nothing else.
Exactly. We already had OCaml developers and we already had a lot of people who understood the
semantics of OCaml and the way in which the various language features might interact with
each other. So we didn't now have to go out and find a bunch of people who understood this
esoteric configuration language. We could just find people who knew OCaml. So one thing that
strikes me about the story is in some sense, the story sounds very familiar, which is the thing
that you're describing about this mail server configuration language actually sounds an enormous amount like the story around,
say, build systems, make. Make has a relatively simple core domain-specific language, which
instead of talking about things you do with mail, it talks about rules for building things with
dependencies and targets and all of that. And that language is indeed insufficient for doing
big and complicated things. So people have built complicated macro systems.
In fact, there's a macro system inside of make so that you can write make rules that generate
make rules that generate make rules. And that's kind of horrible. And no one's super happy about
that as a way of doing complex builds. But there's another way that people sometimes use of getting
out of this problem, which is not always to create their own build system, although plenty of people do that, including us.
Us twice.
Embarrassingly.
That's right.
But another approach is what you might call the config gen approach, which is to say, okay, there's a simple core configuration language, and there's a bunch of complicated stuff on top, which is about increasing the generality of the language. Let's forget all of that terrible stuff and then write code in another language where we have better abstractions and better tools, and have it just generate things in this kind of simple core calculus that's exposed by the underlying config language. And then we get the best of both worlds. We get to write our configurations in a nice high-level language that we understand well and isn't a special-purpose skill, and we get to use the core engine that has been built and maintained by other people, so we don't have to reimplement it. So why wasn't that the path that you chose with Mailcore?
I think there are three reasons. I think two are good reasons and one is a bad
reason. I'll start with the good reasons. The first is we really at the time were not happy
with the primitives that the system that we were using provided to us. So the configuration language was complicated even in its simplest form. And it's not like we had some nice
primitives that we could work with where we just needed to generate those and we could do anything
with those and everything else was built on top of it. We would have had to generate complex macros
and some of the config elements I was talking about before. And we didn't feel like we would
be saving ourselves very much by generating those versus writing them by hand. In fact, we still would have needed to
understand it just as well. It's not like we could have limited our understanding to a subset of the
language and just implemented everything we needed using that. That was the first reason.
The second reason is that we did want some runtime dynamism. We did want the ability,
in some cases, to actually change behavior based on
other things out in the environment, other things out in the world. And the configuration that we
would have had to generate to do that, we would have been back in the exact same position that
we were in before. So we ended up feeling like it would have been better, we'd be happier implementing
those more dynamic features in a language that we are much more familiar with and more comfortable
with, rather than trying to implement those via config generation into some lower level language.
The third reason, and the kind of worst reason, like I said, is ultimately at the time, I think config generation was a much less popular, much less widely used technique at Jane Street. And I think we probably didn't consider it as seriously as we should have at the time because the tooling and the kind of prior art and other examples of it internally just weren't widespread enough for it to be top of mind as a possible solution.
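For readers unfamiliar with the config generation approach discussed above, here is a minimal Python sketch of the idea: the configuration is described as ordinary data and functions in a general-purpose language, and a small program renders it into the target system's simple key/value format. The target syntax here is an invented .ini-like format, not the syntax of the mail server Jane Street was using, and `render_router`, `generate_config`, and the region and transport names are hypothetical.

```python
from typing import Dict, List


def render_router(name: str, domains: List[str], transport: str) -> str:
    """Render one section of an invented .ini-like target format."""
    lines = [
        f"[{name}]",
        f"domains = {' : '.join(domains)}",
        f"transport = {transport}",
    ]
    return "\n".join(lines)


def generate_config(regions: Dict[str, List[str]]) -> str:
    """Generate one router section per region, in a stable (sorted) order."""
    sections = [
        render_router(f"route_{region}", domains, f"smtp_{region}")
        for region, domains in sorted(regions.items())
    ]
    return "\n\n".join(sections) + "\n"
```

The appeal is that loops, functions, and tests live in the generator, while the deployed system only ever sees flat, simple configuration. As the conversation notes, this only pays off if the target language has a simple enough core to generate into.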
So here's another alternative idea of how you might've gotten yourself out of the problem.
It sounds like you looked at the config language and said, wow, this is incredibly hard
to reason about. It's hard to understand. Let's move to a different language. There's
another way you could respond to the problem of, wow, this thing is really hard to understand,
which is you could have approached it by trying to test it much better. You say from the outside,
step back, what does a mail system look like? A mail server looks like a big bundle of functions,
or maybe one big function that takes in an email and decides what emails need to be emitted out of
the other end. You can imagine taking that view
as an approach to testing it,
which is that you could build some framework
around the system
and you could make a bunch of assertions
where it's like, oh yeah, if we put this email in,
we expect these emails to go out.
And that's another way to build confidence
that the system behaves in the way that you expect.
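As a rough illustration of this kind of outside-in testing, the sketch below treats the mail system as an opaque function and checks a table of "this email in, these emails out" expectations against it. It is a Python sketch under stated assumptions: `check`, `stand_in_system`, and the addresses are hypothetical, and a real harness would have to feed messages to a running server and capture its output rather than call a function.

```python
from typing import Callable, List, Tuple

Email = Tuple[str, str]                  # (recipient, body)
System = Callable[[Email], List[Email]]  # the mail system, seen from the outside
Case = Tuple[Email, List[Email]]         # (email put in, emails expected out)


def check(system: System, cases: List[Case]) -> List[str]:
    """Return a description of every case where the output is not as expected."""
    failures = []
    for incoming, expected in cases:
        actual = sorted(system(incoming))
        if actual != sorted(expected):
            failures.append(f"{incoming}: expected {sorted(expected)}, got {actual}")
    return failures


def stand_in_system(incoming: Email) -> List[Email]:
    """Hypothetical stand-in for the real server: expands one mailing list."""
    recipient, body = incoming
    if recipient == "team@example.com":
        return [("ann@example.com", body), ("raj@example.com", body)]
    return [incoming]


CASES: List[Case] = [
    (("team@example.com", "hi"), [("ann@example.com", "hi"), ("raj@example.com", "hi")]),
    (("ann@example.com", "hi"), [("ann@example.com", "hi")]),
]
```

Calling `check(stand_in_system, CASES)` returns an empty list when every expectation holds. Note that a failure only says which input misbehaved, not which part of the configuration caused it.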
Even if the underlying config language is kind of a disaster, you can do a nice job of the testing framework on the outside
to not completely nail down, but to get yourself a lot of confidence about the way in which the
system behaves. We did consider that at the time. And I think the reason that we didn't feel like
that was sufficient was primarily that while we could have tested the full end-to-end system that
way, the units of configuration are not composable enough for
us to be able to test smaller subsets of it. And so you might be able to say, yep, this didn't do
what I expected it to do. This broke. But that doesn't necessarily help you figure out why it
broke or what it was that changed, especially in the face of these confusing semantics that I
talked about before. And then beyond that, we would really have been fighting kind of an uphill
battle in the sense that this is not a software system that was designed to be testable in this way or a configuration language that was
designed to be testable in this way. And so we ultimately would have had to build a lot of our
own tools and a whole harness to run this system within to be able to even get there and then get
these suboptimal results. There's this general problem. If the basic system isn't composable,
that's a problem that's hard to get around. One thing we did consider is moving to a different open source system or just another
mail server implementation. This isn't the only one in the world. There are others. And the reason
we ended up ruling that out is we went and looked around and sort of looked at the most common
other mail servers out there. And we saw basically two variants, like two different potential alternatives that we could have looked at. One was a class of very popular and widely
used systems that look pretty similar to the system that we were already using in terms of
how they were configured and the kind of complexity and sort of system specific knowledge required
to work with them. It didn't feel like there was enough justification to migrate to some other
system just to understand a whole new set of semantics
and a whole new set of complexities. And then the other flavor of system that we came across
was a much newer, much less widely used, much less popular in the world at large set of systems that
were implemented in ways similar to how we eventually architected Mailcore. A small core
implemented in some language and a very flexible configuration
based on a common programming language, something like Python or Lua or something like that.
And there are a handful of those around.
I think there were two reasons that we didn't go down that path. One is that none of them were
widely used and baked in enough for us to feel confident that they were the right choice.
You know, there wasn't kind of an obvious one that was a front runner that we could just say, ah, yes,
everyone's using that. It must be good. It must be well tested and used in production, so to speak.
And the other reason is if we were going to switch to a configuration language that was
an actual programming language, we would be much happier using the language that we use for almost
everything else where we have great tooling and a lot of experienced engineers around who are
already familiar with the language.
It just didn't seem like switching to Python was going to be a net win for us in the long term.
You would have had to have gotten a lot of benefit from the engineering that had gone into the other system to compensate for the fact that you have to switch languages.
There's the language and the tooling, which is a big deal.
Exactly.
Okay. So you had an architecture in mind, you had an approach to take.
How did it go? What
were the problems as you ran down this path? Initially, things moved really quickly and
went really well. We were able to implement the core SMTP protocol pretty quickly. Like I said,
it's a relatively simple protocol. So that went relatively smoothly. And then we started down the
path of writing the configuration, writing this pile of OCaml that was meant to replicate the functionality
that we had in the old system. And this was kind of an interesting experience because we found
pretty quickly cases where the old system had either non-deterministic behavior or was just
doing the wrong thing in some case that we hadn't noticed in production or hadn't really bitten us
yet, but could have. And so there was this exercise of reverse engineering many years of configuration changes
and people slapping things in to fix issues or to add functionality and trying to figure out what
the intent behind those changes was so that we could then reproduce the intent in a new system.
And I think we eventually got to the point where we felt pretty confident that we had addressed
most of the existing functionality. But then we were faced with a new problem,
which is how do you build enough confidence in a completely new system that's never been run in production anywhere before, and that has a completely rewritten configuration, to want to move the entire firm's communications over to it? It's not something you want to do overnight.
It's maybe worth highlighting,
email is absolutely critical to Jane Street,
sometimes in a way that's incredibly important for short periods of time. Like if you turn off traders' email,
there are things that go wrong right quick and a trading business needs to be able to respond to
information quickly. So problems in the communication systems are incredibly critical.
And we're a global team, you know, we're spread across three continents and a lot of the way that we make sure that we're keeping things consistent and that
we're keeping in touch between regions is email. And so it's not like you can say, oh, well, we'll
do it overnight. The Hong Kong office isn't going to be happy about that. And similarly, you know,
you don't necessarily want to just do a big bang migration over a weekend and hope that Monday
morning goes smoothly. It's something that's kind of fraught with peril. So we started thinking about how to build this confidence, you know, what we could do to test the new system enough that we
would feel ready to actually make that flip. And we came back to something that we talked about a
little bit earlier in this conversation, which is this idea of testing the end-to-end behavior
of the system and kind of demonstrating that it was doing what you expected it to do in
all cases. But we didn't really have a reference implementation that we could use in OCaml.
We weren't sure how to produce something like that. And so we looked back at our existing
configuration and we had this thing that had been working for years or at least working
enough that we thought it was working. And so we thought about how we could leverage that. And what we ended up doing is setting up basically what we called a shadow instance of our new OCaml based email
server, Mailcore, and running it in parallel with the existing system. And so for each message
that came in to our walls, we would fork off a second copy of the message and send it to the
new system in addition to sending the original message to the old system. And then we set up basically some endpoints sitting on the
other end, on the output side of the old system and of our new system to just keep track of the
output that was generated by each, you know, what messages got generated, what the transformations
that have been applied were, where the messages were going to be directed. And then we just diffed
those. We just set up essentially a streaming diff of all the messages coming through
both. And we made a lot of noise to ourselves each time we saw a case where the new system and the
old system didn't behave the same way. And we found like five or 10 different cases where there
were just entire classes of mail where we were doing a slightly wrong thing. And in a bunch
of those cases, I think the majority of those cases, it was actually the old system that was doing the
wrong thing and not the new system that was doing the wrong thing. But it still made us feel sort of
warm and fuzzy to know that for the vast majority of email, the behavior was the same. And so we ran
that for a long time, for months. And then once we'd built up enough confidence, we started cutting
users over one by one. We started basically at that stage where we were forking off a copy of the message.
We sort of added some logic to decide which primary server a given user's mail was supposed
to be going through and sent our own mail through the new system for a little while
and kind of ramped it up that way until all mail was going through the new system.
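The shadow setup and gradual cutover described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the harness Jane Street built: `shadow_diff`, `route_primary`, and the two toy servers are hypothetical, and the "bug" in the old server is invented to show what a reported difference looks like.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple

Email = Tuple[str, str]                  # (recipient, body)
Server = Callable[[Email], List[Email]]  # incoming message -> outgoing messages
Diff = Tuple[Email, List[Email], List[Email]]


def shadow_diff(old: Server, new: Server, incoming: Iterable[Email]) -> Iterator[Diff]:
    """Feed every message to both servers and yield the cases where they disagree."""
    for msg in incoming:
        old_out = sorted(old(msg))
        new_out = sorted(new(msg))
        if old_out != new_out:
            yield msg, old_out, new_out


def route_primary(msg: Email, migrated: Set[str], old: Server, new: Server) -> List[Email]:
    """During cutover, only migrated users' mail is really handled by the new server."""
    recipient, _ = msg
    return new(msg) if recipient in migrated else old(msg)


# Toy servers for illustration. The old one has an invented bug: it only
# archives mail addressed to example.com, while the new one archives everything.
def old_server(msg: Email) -> List[Email]:
    recipient, body = msg
    out = [msg]
    if recipient.endswith("@example.com"):
        out.append(("archive@example.com", body))
    return out


def new_server(msg: Email) -> List[Email]:
    _, body = msg
    return [msg, ("archive@example.com", body)]
```

Running `shadow_diff` over a stream containing mail for both domains reports only the `example.org` message, which is the kind of whole-class discrepancy the diffing surfaced. Growing the `migrated` set one user at a time mirrors the ramp-up until all mail flows through the new system.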
One of the things that strikes me about this story is the thing that, you know,
ye standard software engineer thinks about as the problem to be solved is a fairly small part
of the story that you're talking about solving. The writing of the software that actually does
the thing, that's like a little bit of work. And then there's a significant chunk of work,
which is writing the config, which is again, a programming task. And then there's a bunch
of just careful operational thinking about how the overall system works.
And writing more software to set up this harness and the monitoring and the kind of
diffing and all of that stuff as well. The other notable thing to me is a big part of the work here
was essentially wrestling knowledge out of the old system into the new system. Whereas when I came to
Jane Street, a naive young person out of grad school and thought about writing software, I thought, oh, software is where like someone has an idea of a thing
that they want to happen.
And then you write software that makes that thing happen.
It's like, no, a lot of the time software is replacing some old thing and there's no
idea about what should happen or rather no human understands specifically what needs
to be done.
There's just some old system that encodes all of this knowledge in it
in a way that maybe no individual human ever knew all of it, but a bunch of people over time slowly
added knowledge to this weirdly encoded knowledge base. And then you as a software engineer had to
figure out how to wrestle it out of that. I remember running into this many years ago with
us replacing our early version of our order engines, which were the systems that connected
to other brokers and exchanges and routed our orders there. And I got to this problem after
working on trading systems. And trading systems was like a really smart person had written a
thoughtful spec about how this thing was supposed to behave. I was like, oh yeah, okay, I can write
to the spec. That was relatively easy. Whereas we had some order engine, which had buried inside of
it knowledge about how Bear Stearns' internal infrastructure worked, which is like, again, not anything that anybody internally really knew explicitly.
And it took a long time to claw that knowledge out.
And it sounds like you ran into more or less the same problem here.
Absolutely.
One of the problems I think one always runs into with the decision of whether should we use some external thing or should we build something on our own, is the question of like, how deeply do you
mis-underestimate the size of the problem? How did that part go? How hard did you think it was
going to be and how hard did it turn out to be? If I'm completely honest, I don't really remember
what our estimate was at the time. This is now probably five or six years ago when we originally
started on this effort. And I don't remember what we thought would happen. I think it definitely
took longer than we expected because I think that's a
rule basically for almost everything that I've ever been involved in. But I think maybe the
interesting point to highlight is we probably were closer than we expected to be in terms of
the implementation of the core system. But the implementation of the actual configuration and
the migration to the new system, I think,
exceeded the estimate that we would have made at the time. I think it just took a lot longer
to build that confidence and to get to the point where we really did feel ready to flip to the new
system. I think it took probably on the order of a year total from start to finish or start to
some version of finished. And then, of course, it's been a long tail of improvements and changes and
extensions since then. Right. I guess one of the funny questions there is what's the alternative?
I guess one alternative was not doing anything new and just kind of suffering with the system
as it was and kind of incrementally moving along. And there, I think that time estimate
matters a lot. But if the alternative is switching to some other system that you hope will be better,
it sounds like most of the work that you had to do, the stuff that took a long time, was stuff you would have had to do anyway.
I think that's right.
Yeah.
Might have taken longer the other way. Yeah.
And it would have been harder to find people with the right expertise and knowledge to work on it as well in some sense.
I think if we had switched to some other system with some other arcane configuration language, we would have had to learn all about that first and then start down those other paths.
And at least in this case, we could say, okay, you know, you and you and you, you know, some
OCaml engineers already working at Jane Street. Here's a new domain to apply your existing
knowledge and sort of expertise to.
So once the project landed and we started using it as our primary
mail system, what were the benefits that we got as an organization from this work? How is Jane Street's setup now better for all of this change?
There are a lot of reasons. I think the very first one is maybe obvious from the way I described the
migration, but we now had all this infrastructure around making changes. First of all, now we could
implement tests. Now we could implement sort of our normal inline tests that we use for units of OCaml
code and the configuration was composable,
so we could reuse bits of it across different instances of our email server internally and
across different use cases where we needed to run separate mail servers for some reason or another.
But more importantly, we now had this system that we could use to gain confidence in changes that
were going to be far reaching in the environment. So we could run a new version of MailCore and an
old version of MailCore next to each other. And we could diff their behavior
in the same way that we had diffed the old system versus MailCore. And we use that to great effect,
we still use that. We also as part of this implemented a nice OCaml library for working
with SMTP, both as a server and as a client, because obviously, we needed to implement this
core functionality as part of the construction of this system. And we found a bunch more use cases for that internally, you know, other cases
where it's useful to be able to stand up a small email server and take some automated action or
write down a copy of a message or something like that. The biggest and most important impact that
the system had, though, in the end was it really lowered the barrier to
making changes to the system. So for a long time, we had kind of trained ourselves to not make
changes, to not make improvements, to not really touch it when dealing with the old system, because
we were just ultimately scared that we were going to break something and that we didn't have good
tools to confirm or to give us confidence that we weren't breaking something. But with MailCore, we now had this system that looked very familiar.
It looked like just about any other software system at Jane Street.
It lived in our normal code-reviewed repository and was built with our normal build tools
and had all of our normal OCaml-specific tooling and functionality.
And that means that the set of people who felt like they could propose or make changes went way up. So, you know, if somebody in the cybersecurity team here wanted to implement a new kind of scanner for a particular kind of malicious attachment or something like that, they could just easily go and write a feature, write some OCaml code to integrate that scanner or to implement that check. And it didn't require, you know, finding the one
person wearing a cape and pointed hat that happened to know the particular dark corners of the old
system's configuration language. It just really, you know, required the same kind of knowledge
that we expect any of our software engineers to have. Yeah. And the cybersecurity example is not
a random one in the sense that I think one of the big wins from all of this change to our mail, which I think wasn't really contemplated as
much when we made the decision to start on MailCore, was that it gave a lot more power
to people thinking about security to put all sorts of remediations in place.
It's hard to overstate how important email is as an attack vector and how much work you
have to do to protect yourself.
And the ability to just widen
out the set of people who could make those changes and to be able to kind of accelerate
the level of work there, I think had a very powerful effect on improving our cybersecurity
protections.
It absolutely did.
Yeah.
And I think that is one example, like you said, but there are other cases too, where
not only from a security perspective, but from a kind of functionality and sort of productivity
perspective, we were able to make changes that we wouldn't have even contemplated in the original system.
So, you know, if we wanted to, for example, do things like add support for new kinds of mailing
lists, you know, mailing lists that had different behavior or that sent messages to somewhere else
besides somebody's inbox, because we wanted to get something out of
email or something like that, or a mailing list that we wanted to kind of note was deprecated so
that it would kind of alert the sender that we're not using this mailing list anymore. These are
like little things, little enhancements that smooth over paper cuts, but the flexibility to
be able to throw in a 50-line feature to implement something like that just opened our eyes to the
set of customizations that we could make to the way that we work with email. Yeah. And simplifying
and improving and removing paper cuts in the primary communication mechanism of a 1300 person
company is a surprisingly powerful thing, right? When you step back and think about it. So the
individual things seem small, but the value to the organization is quite large. And I think people are actually just kind of on a person-to-person basis,
incredibly grateful for the work that's gone into email because it's so much better than it used to
be in a bunch of ways that I think really affect the quality of people's lives here.
A great example of something that at the time seemed small, but that has been really widely
used and that a lot of people have gotten a lot
of value out of is we implemented something internally that we call a relay list, which is
like a regular mailing list. So a mailing list is normally we think of it as sort of an address that
contains some set of members. And when you send mail to that address, it goes to each of the
members of the list. So we might have a distribution list that's for everybody at Jane Street so that we can send out announcements firm-wide. Well,
a relay list is a special kind of list where instead of including human user email addresses
as the members, it includes hosts and ports. So some internal host name and a port on that machine.
And what happens is when you send an email to a relay list,
MailCore actually relays that message on to a mail server listening on that host and port.
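To make the distinction concrete, here is a small OCaml sketch of a regular list versus a relay list. The type and function names are invented for illustration and are not MailCore's real API; the only facts taken from the description above are that regular members are email addresses and relay members are a host and a port.

```ocaml
(* Hypothetical model of the two kinds of list member described above. *)
type endpoint = { host : string; port : int }

type member =
  | Mailbox of string   (* a person's email address *)
  | Relay of endpoint   (* a mail server listening on some host and port *)

type mailing_list = { address : string; members : member list }

(* What the server should do with a message for one member. *)
type action =
  | Deliver_to_inbox of string
  | Forward_over_smtp of endpoint

let action_for = function
  | Mailbox addr -> Deliver_to_inbox addr
  | Relay endpoint -> Forward_over_smtp endpoint

(* Expand a recipient address into delivery actions. An address that is not
   a known list is treated as an ordinary mailbox. *)
let expand lists recipient =
  match List.find_opt (fun l -> l.address = recipient) lists with
  | Some l -> List.map action_for l.members
  | None -> [ Deliver_to_inbox recipient ]
```

In this sketch, the team that owns the relay target only has to run something that speaks SMTP on that host and port; nothing in the central list expansion changes when they change what their server does.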
You pair that with a small library for easily standing up a little mail server. And we've now
given the power to automate various workflows related to email to everybody around the firm
without requiring them to have any special privileges or any access to actually make
changes to this core super critical piece of infrastructure. We've kind of
federated that ability out to everyone and people have made use of it. They've implemented little
enhancements to their own workflows for their specific teams around support rotations and
little tools for improving their monitoring and all sorts of things like that, that they just
wouldn't have been able to do or that we maybe wouldn't have wanted them to do if it was going to be in the core system that we're using to handle
everyone's important email. And part of, I guess, what you're doing here is, again, leveraging the
openness of the email architecture, where you can just have routing between different hosts that are
implemented differently and are doing different things. But of course, in this case, they're not
implemented completely differently. We get to reuse the same libraries that you built for MailCore.
Those now get to be leveraged in all sorts of places. Exactly. So one of the things you were
pointing out there is the fact that you were now able, in writing the configuration, to leverage
the standard software tools that we used for building all sorts of software. Can you say a
bit more about how that played out and what were the valuable pieces of those software-oriented
workflows? I think this is a pattern that you see a lot around Jane Street.
We're a place that really highly values automation.
And so we'd much rather take a problem that maybe is traditionally viewed
as an administration or an operational problem
and turn it into a software problem if we can,
because we do get to benefit from the kind of work that's being done
around the firm to
improve our ability to work with software. And what do I mean when I say that? I think there
are a lot of very concrete, specific technical things that I mean. So things like editor
integration, you know, the syntax highlighting, the integration with the build system, tools that
help us view the type definition for a given value, tools that help us jump to the definition
of some bit of code so that we can kind of move around the source code repository, tools for
writing automated tests and for kind of demonstrating that the behavior of some
system hasn't changed over time, all different kinds of things like that. But the other thing
that I think is important about taking this kind of software approach to what are traditionally
viewed as more systems-y or more administration problems is, I think, kind of a cultural one. I think people
treat code differently from the way they treat other things. Like there's some switch in our
brains where if we're messing with a config file, then we're much more willing to copy and paste
some stanza or to, you know, hack something together and just throw it into a repo without
a good commit message, or maybe to not even put it in a repo in the first place. Whereas when
we're dealing with code, the expectations just change. There's this general shared understanding
that code should be reviewed and code should be tested. And there should be a description for why
you're making a particular change. And we work with it in a different way. And we hold ourselves
to a different standard when working with code.
And we refactor code.
How often do you refactor configuration?
And I think ultimately the thing that we were most excited about here was being able to leverage that kind of cultural shift.
Just this inclination to work with this pile of stuff in a different way, even if it was, you know, for a system where normally it wouldn't be handled that way.
So part of it is about the tools of software and part of it is about the culture of software.
That's right.
So why do you think the culture of software and the culture of configuration are as different
as they are? Sort of step back, thinking about it from an abstract perspective, it doesn't
feel very different. Configuration languages are languages, essentially very restrictive
programming languages.
Why should they develop such different cultures? Yeah, it's a good question. I'm not really sure.
I can speculate a little bit. I think one potential reason is that the tools are just not as available to you. So sort of the tools breed the culture in some sense with software.
The fact that you have all of these nice tools for writing tests and for doing code review and for working with code definitely encourage you to do good things.
You know, it's much easier to refactor some code if you have good tools for helping you refactor it.
And with configuration, you often don't have those things because in many cases it's a bespoke language for a particular system.
And it's just not worth the effort of going and building all of that tooling specifically for the system.
I think that's probably a big part of it. I think another part of it is that in many cases,
we store configurations separate from where we store source code. A common pattern is you build
some core functionality, some basic system that just handles the most common kernel of operations
that you need to handle. And then we have many instances of that system, each with their own configuration. And the result of that is you end up with configuration
kind of strewn all about managed by different teams, maybe if the system is being run by
different groups or something like that. And in general, you don't get the same kind of consistency
that you might get out of the way that we would approach making changes to the core functionality.
And I think one of the other big takeaways with MailCore was we actually just store the configuration right next
to the core functionality, the configuration lives in the same repo right next to all of the other
code. And when we roll it out, we deploy everything all as one big bundle. And we don't have this
problem of like, oh, well, the configuration lives over here. And the code for the implementation of
the core functionality lives over here. And we get some tooling that works well over there and some tooling that works well over there. And it's,
you know, it's kind of annoying to interoperate between them or something like that. And I think
that that plays a role in a lot of cases as well. At least the second problem you described, the one
about where you store the config versus where you store the code, that at least is a thing that,
you know, thinking is enough to make it so. Like if you just have different ideas about how you
should store a configuration, you can adopt that one.
Where your previous point about the culture
depends on the tools being there,
that's a much harder problem to fix.
And just even from Jane Street's own history,
I think of Jane Street as having a very good
and well-developed culture around testing,
but it didn't always, right?
The tools for testing used to be much worse
and the practices around testing were much worse.
And I think the thing you described is exactly right, that the culture was able to be established only in concert with
building the tools. Like people decided testing was important. We spent more time doing it.
People got frustrated about how hard it was. They spent more time building tools to make it easier.
And then when it got to be really easy, which is kind of how I think of it now,
that culture gets really widely spread. You mentioned this refactoring thing,
which is another thing that struck me,
where one of the practices I think is very common in code
is, as you said, refactoring. You talked about refactoring configs.
We have a fairly strong approach
of trying to avoid repeating things in code
because if you cut and paste things,
it's a super easy way to introduce bugs.
Like, you know, you cut and paste it
and then make just the changes you need to make it right. But it's so easy to miss something.
But OCaml has incredibly good, very lightweight tools for essentially making very simple templates,
typically in the form of functions, so that you can just figure out what are the parts that really
need to differ and avoid any excess duplication. And it doesn't even make your code necessarily shorter,
but it does very often make it cleaner and less likely to be buggy.
And the tendency to do that depends critically
on having a system in which you operate
that's friendly to that kind of refactoring.
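A minimal sketch of what "templates in the form of functions" can look like for configuration, written in plain OCaml. The `rule` record, the `team_list` function, and the `example.com` addresses are all hypothetical and only stand in for whatever a real configuration would contain.

```ocaml
(* Hypothetical config types; not MailCore's real configuration API. *)
type rule = {
  list_address : string;
  members : string list;
  max_size_mb : int;
  notify_sender_deprecated : bool;
}

(* The "template": everything shared lives in one place, and only the parts
   that really differ between lists are arguments. *)
let team_list ?(max_size_mb = 25) ?(deprecated = false) ~name members =
  {
    list_address = name ^ "@example.com";
    members;
    max_size_mb;
    notify_sender_deprecated = deprecated;
  }

(* Each entry states only how it differs from the default. *)
let config =
  [
    team_list ~name:"infra" [ "alice"; "bob" ];
    team_list ~name:"research" ~max_size_mb:100 [ "carol" ];
    team_list ~name:"old-project" ~deprecated:true [ "dave" ];
  ]
```

Compared with three copy-pasted stanzas, a change to the shared part (say, the domain) is made once in `team_list`, and the type checker rejects any entry that passes the wrong kind of value.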
And if you're in like some random config language
that was never really designed as a programming language,
which has no kind of core principles
on how it's organized,
that stuff is just not going to go so well.
That's right. Yeah. And I think this kind of speaks to the
specific functionality of OCaml, or the specific capabilities of OCaml, that make it such a
good language for a wide array of problems, but, you know, including this one. And I think you
highlighted one. Another that I think is pretty important
for the config management case
is it's really nice to have checking at compile time
for unused values,
for things that you specified somewhere
but then you never did anything with,
because a really common mistake
in a config language that lets you do this
is to go and define some value somewhere
and then forget to list it in the place
where you meant to list it to say, "oh, and this thing now." And having in OCaml the ability to kind of move
things around and reorganize the config and get alerted by the compiler if we forgot to make
use of the value or if we left some stale bit of code around helps us keep the implementation
as lean as we can and make sure that we just
prevent a wide class of mistakes. Yeah, I think that's incredibly important. It's a simple
decision, but the fact that we make fairly aggressive choices about turning on warnings
in the OCaml compiler, including that one, and not just warnings, but turn them to error. So
you cannot even compile your code when you have an unused variable. That can be annoying in some
contexts, but it's so incredibly useful and it catches so many bugs.
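Here is a small sketch of the "defined it but forgot to list it" mistake and how the unused-variable check catches it. The `route` type and the addresses are made up for the example; the warning number and compiler flags are standard OCaml, but the exact set of warnings Jane Street enables is not stated in this conversation.

```ocaml
(* Hypothetical config fragment. To treat unused variables as errors one
   could compile with, for example:
     ocamlc -w +26+27 -warn-error +26+27 config.ml *)
type route = { address : string; members : string list }

let routes () =
  let support =
    { address = "support@example.com"; members = [ "alice"; "bob" ] }
  in
  let oncall = { address = "oncall@example.com"; members = [ "carol" ] } in
  (* If [oncall] were accidentally left out of the list below, the compiler
     would report warning 26 (unused variable oncall). With warnings turned
     into errors the build fails instead of silently dropping the route. *)
  [ support; oncall ]
```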
I was talking with a guy who works in the tools and compilers team
who had previously worked at various other big tech companies.
And he was talking about various like fancy techniques
that are out there for like machine learning, blah, blah, blah,
for catching common bugs.
And he was like, yeah, this seems interesting.
But honestly, the fact that we have things like
you know, automatic detection of unused variables just smashes a lot of bugs this stuff would catch
anyway. And so it's not clear it's worth the complexity.
Pattern match exhaustivity is
another one that's like that, where the fact that you can know that you matched on all of the
possible values of this type just eliminates a whole class of bugs that you might easily make
in other languages that don't have that.
For anyone who's thinking about whether a language like
OCaml is interesting, and you want to understand why people like it, pattern matching and the
exhaustivity check on pattern matching is the single best feature. And it continues to mystify
me that more programming languages have not picked it up. Like, you don't have to take all
of the decisions, but, like, that one is so good.
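For readers who have not seen the exhaustivity check, a minimal sketch follows. The `verdict` and `disposition` types and the 0.9 threshold are invented for illustration; the behaviour of the compiler warning is standard OCaml.

```ocaml
(* Hypothetical classification of an incoming message. *)
type verdict =
  | Clean
  | Spam of float                   (* score between 0 and 1 *)
  | Malicious_attachment of string  (* name of the offending file *)

type disposition = Deliver | Quarantine | Reject

let dispose = function
  | Clean -> Deliver
  | Spam score -> if score > 0.9 then Reject else Quarantine
  | Malicious_attachment _ -> Reject
  (* Adding a new constructor to [verdict], say [Phishing], without adding a
     case here makes the compiler emit warning 8 (non-exhaustive pattern
     matching) and show an example of an unhandled value. *)
```

The useful property is that the warning fires at every `match` on `verdict` in the code base, so adding a case to the type yields a list of the places that need a decision.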
So you talked a bunch about what was good about MailCore, what the advantages are,
but going and building your own thing isn't all sunshine and roses. Like what are the downsides of having built our own homegrown mail server? There are plenty. One obvious one that stands
out to me is I've been referring to SMTP as this simple protocol. And the core of it is a simple protocol.
You know, there really are very few things
you need to implement to support the functionality
that you would expect of a basic mail server.
But as we alluded to at the very beginning,
when we were talking about the openness of email
and the reality of the modern internet
and all of the things that you have to consider
that weren't considered when it was originally designed,
there have been many extensions to SMTP
and many extensions to the kind of surrounding mail ecosystem to add on an extra
level of security or extra functionality. And MailCore has to implement all of those if we want
to get that functionality. We can't rely on some open source community or some vendor maintaining
the mail server that we're using and just adding functionality as new specifications become approved or go into wide use. That, I think,
means that this is a kind of a never-ending project in some sense.
How about from a security perspective? I can imagine that thing cutting both ways, which is to
say we have a lot more power to decide exactly how it works. At the same time, I imagine there are
like rookie email mistakes that someone
implementing a mail server can get wrong. And a mail server that's existed for 20 odd years
has had the opportunity to fix some of those. And we get to make those mistakes from scratch.
How much of a role does that play? I actually expected more issues of this type, but we've seen
fewer than I would have thought. And we have taken a good hard look at it
and considered that angle. I think a big relevant detail of the way the popular open source
mail servers are written is that they're mostly written in C. And so a lot of the security issues
that they have run into over the years have been of the normal C memory-unsafe style of security flaw
that many, many, many systems have been bitten by. And writing our
system in OCaml rules out that whole class of things, or at least limits them to, you know, bugs
in the OCaml compiler or in external libraries that we link in or something like that.
So that's a big plus. The other thing that we get out of this is because, you know,
we've written our own thing, that means that it's going to be a lot
less common on the internet. There aren't that many other people using it. It's a lot less
interesting to find a vulnerability in our mail server versus in some popular mail server that's
on, you know, 50 million servers around the internet. And so I think we get a little bit
of security by obscurity from that fact. Right. And the buffer overrun story you talk about, mail servers written in C, is really no joke.
Well, I think Microsoft in the last little bit came out with a study that something like
70% of their vulnerabilities were buffer overruns for things that were written in C and C++.
And I think it just highlights to me the importance, again, of programming languages in systems
design.
Problems at the programming language layer
are incredibly hard to solve at higher levels, right?
If you use a safe programming language
like Java or OCaml or Rust,
then there's a whole class of bugs that just go away.
And you can do things to try
and smash the bug count above that.
And do address randomization
and all sorts of fuzzing testing and all that.
And you can do that and it's effective, but it's an enormous amount of work and it doesn't get you to anywhere near as good of a situation as you would have been if you just use a safe language
to begin with. So the kind of ongoing train wreck of people building internet facing software in C
and C++ and other unsafe languages, like, it continues to amaze me. It seems like a really
serious mistake, just from a security perspective, all other aspects of software engineering and
language design aside. Totally. Yeah. We get a big win out of just not having to think about that
and being able to focus our energies on other things. So we've spent a lot of time talking
very positively about email, but at the same time,
email is terrible, right?
Like we all live in a world where we have way too much email.
Certainly I live in a world where I have too much email.
And email is, I think, kind of clearly the best collaboration tool I have used, the best
communication tool I've used.
And there are lots of things like Slack and whatever that I like and I think
are useful in various contexts, but, you know, you can pry email from my cold dead hands. At the same
time, oh Lord, I wish it was better. And I'm curious, you've spent a lot of time thinking
about email and about how email works at Jane Street and not just the technical, but also the
kind of organizational and human concerns. How do you wish email was better? There are kind of two trains of thought here that I want to cover.
One is, how do I wish email was better as a protocol and as a citizen on the internet?
And the other is, how do I wish email was better at Jane Street?
Or what changes do I think that we need to make?
I'll start with the first one.
I think the biggest thing that the world at large, the email world at large, has wrestled
with for the past, I don't know,
20 years probably, and continues to wrestle with is a consequence of this openness that we've
talked about a few times in the architecture and specification of the way email works. And
the consequence is essentially that email is really difficult to authenticate. It's really
difficult to know that in the kind
of core SMTP specification that a message was actually sent by the person who claims to have
sent it. So this is where we get things like spoofing and phishing and other kinds of malicious
impersonation, things that at their most mundane just result in more spam, more junk mail for you
to clean up. But at their worst, this is how you
get things like people pretending to be you and asking your bank to wire all your money to some
offshore untraceable account or something like that. So it's a huge problem. And it's kind of
a fundamental problem in the way that email is designed. And lots of people have made attempts
at improving it, adding extensions to the
email specification and new protocols and things like that. To fix this, there are things like SPF,
the Sender Policy Framework, or DKIM, DomainKeys Identified Mail. Both of these are just
attempts to further lock down and authenticate email, whether it is, you know, authenticate that
the person who sent it is who they claim they are, or authenticating that like the actual contents of the message are
the same contents that the original sender intended to send to you. So these help a lot.
And they definitely make a big difference. But one of the issues that crops up with
both of these things is that they require participation by both the senders and the
recipients. So, you know, the sender has to be configured to authenticate and say, yes,
this email was sent by me. But the recipient also has to be configured to check for it.
You know, it's kind of like the equivalent of, you know, it's one thing for me to carry around
my driver's license and, you know, have a nice picture on it and my name and my license number
and all that. But if you don't ask me to see it and you don't look at it and make sure that it looks like it's a real driver's
license and that it was actually issued by the state and all that kind of stuff, then it doesn't
really do anybody any good. It doesn't actually demonstrate any identity or validity. So this
ends up being a big problem because if you're using a big provider, you know, somebody like Gmail or Microsoft 365, Google or Microsoft are going to be highly incentivized to build in a lot of good
tooling and implement all of these things and do as much as they can to help you authenticate the
mail that you're sending and check that the mail that you're receiving was also authenticated.
But if you're trying to run your own mail server, or even your own mail client that doesn't do some of the things that Gmail or Office 365 would do, and you're trying
to keep up with all these things as new improvements crop up and things like that,
it's just really, really difficult. And it's kind of a continuing problem. You know, once you've
authenticated that the sender of the message is who they said they were,
now you have this whole separate problem, which is, well, that's great.
But what if other people aren't authenticating that mail claiming to come from you was actually sent by you?
It's great if you check that the mail that I sent you actually came from me.
But if my bank isn't checking, then it's not doing me any good.
And so this ends up being kind of a pretty hard problem to solve in a uniform and global way.
You know, there's progress being made and, you know, continued improvements to some of these things and new ideas cropping up for how to make this better.
But it's just a really hard problem.
And a lot of it stems from all those nice things that we talked about with SMTP and all of its openness.
And so it's just kind of a double-edged sword.
On the Jane Street side of things, kind of ironically, I think the biggest problem that
we have with email is we send too much of it. I think email is great and I feel the same as you.
I think it's an awesome tool and it's a really, really effective way for a lot of
kinds of communication. But I think it's too easy to send an email and it's too easy to send an email
to a large number of people.
And it's too hard to remove yourself
from a list of recipients in some cases.
So we have all these mailing lists internally
that we use for organizing ourselves
and making sure that people who want to follow along
with different kinds of discussions can follow along.
But we don't have enough tooling to make it easy
for people to understand the sheer impact
that consuming email can have on your productivity, your ability to focus, your ability to do anything
besides read and respond to email. So this is something that we are actually focusing on
within our team right now, which is what kinds of information can we put in front of people?
What kinds of tools can we build for people to either get things out of email, you know, to move things that don't actually belong
in email into other systems? You know, you probably don't want your monitoring system to
be primarily alerting you via email. That's just not the place, you know, you don't need everybody
a week later to see that you got close to running out of memory on some server at some point. You
know, that's just something that's a transient fact about the world that you kind of don't want to deal with ever again once it's
resolved. But we're working on a lot of tooling to make it easier for people to get those things
out of email and into other systems, and also for people to kind of wrangle their inboxes,
better understand what it is that's coming into their inbox and where it's coming from and why
they're receiving it and how much of it they're getting so that they can make better decisions
about what they should and shouldn't be getting. I literally ran into this issue this morning in
that yesterday, shockingly, for the first time in about a year, I got myself down to inbox zero,
which is a mythical stage that one almost never gets to. And so, you know, as your inbox fills
up again, you're like, oh, what is this stuff? And can I please turn off the stuff that's irrelevant?
And there's emails I looked at.
I'm like, how do I even know why I am on this mailing list? And how do I unsubscribe from it in a clean way?
And it's all way more complicated than it feels like it should be.
And then filters, which feel like they should be a good answer to this problem, are actually
a surprisingly bad answer in a few different ways.
One way is the filter language:
like, Google's filter language in Gmail is surprisingly primitive. I can't say, like,
I don't want to receive emails that I received only because I was on this list,
but if there's some other reason that I should have received it, I still want to receive it.
And expressing that is surprisingly hard. And the other thing about filters in email that is
difficult: some people at Jane Street have taken a kind of radical, extreme view of email where they, like, block
everything and then whitelist the things that they want to see. And that means it can be very hard to
know whether the email that you've sent to someone has actually gotten through or has just been
filtered out by their system. But yeah, I the maybe the most important thing you said was this one about the cost issue that somehow giving some way of making people who
send emails feel the cost of sending it to those people right if you're going to write that email
to 1500 people uh it's probably worth spending an extra five minutes editing it to make it as
short and concise and direct as possible whereas if you're sending it just to your buddy who sits
down the row okay that's fine you know send it in whatever form you want. But I think at times,
it can be easy to forget the impact that sending an email that takes just an extra 30 seconds to
read multiplied out over 1,500 people can have. That's just a big cost. And so we're working on
ways of making that better. How? What are your ideas for making that better? I'm fascinated.
I think the biggest one is putting that information in front of you when you send an email. So because
we have mailing lists, it's easy to just say, oh, I'm going to send this message to everybody
at Janestreet.com and forget that everybody at Janestreet.com is a mailing list that contains
everybody at Janestreet.com and just how many people that is and just what the cost of sending
an email to that wide of an audience is. So we're working on ways to put that information in front of people at the
moment when they're writing the email so that they can at least make a more informed decision
so they don't sort of forget the impact that their message might have. I've seen some of the reverse
problem. There are lists that sometimes people really want to be bothered by, like they want to
lurk on.
And I kind of want the opposite thing there.
I want to say like, yes, there's a lot of people on here and you shouldn't worry about
them.
They've signed onto this fire hose, but I don't want to be slowed down.
If it's too much for them, they should sign off.
One of these at Jane Street is the mailing list called compiler dev.
And it turns out a lot of people like lurking on compiler dev because compiler questions
are interesting and people like to kind of pick through them.
And we have this problem. We have this organized
so that compiler dev is actually the merger of two lists: compiler dev actual, the people really on the team, and compiler dev also, for other people who just kind of want to hang on.
And then people, taking seriously what you said, will email compiler dev
actual instead because they're like, well, I don't want to email all of those people.
And we have to go and manually redirect and be like, no, no, no, you should normally worry about
the safety of your coworkers. But in this one case, people have decided that they really want
this and have asked us to make it so that all the emails go here. So please redirect.
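A rough sketch of the recipient-cost idea discussed above, written in OCaml since that is the language Mailcore is built in. Everything here is an assumption made for illustration: the `recipient` type, the `opt_in` flag marking firehose lists such as compiler dev also, the list and person names, and the cost formula itself. It is not Jane Street's tooling.

```ocaml
(* Hypothetical model: a recipient is a person or a mailing list, and a
   list may contain other lists. [opt_in] marks "firehose" lists whose
   members have explicitly asked to receive everything. *)
type recipient =
  | Person of string
  | Mailing_list of { name : string; opt_in : bool; members : recipient list }

module S = Set.Make (String)

(* Expand recipients into the set of distinct people. When [skip_opt_in]
   is true, people reached only through opt-in lists are left out, so the
   sender is not discouraged from writing to them. *)
let people ~skip_opt_in recipients =
  let rec go acc = function
    | Person p -> S.add p acc
    | Mailing_list { opt_in = true; _ } when skip_opt_in -> acc
    | Mailing_list { members; _ } -> List.fold_left go acc members
  in
  List.fold_left go S.empty recipients

(* Aggregate reading time in person-minutes: distinct readers times the
   seconds each one spends on the message. *)
let reading_cost_minutes ~seconds_to_read recipients =
  let n = S.cardinal (people ~skip_opt_in:true recipients) in
  float_of_int (n * seconds_to_read) /. 60.

(* Illustrative data only. *)
let compiler_dev =
  Mailing_list
    { name = "compiler-dev";
      opt_in = false;
      members =
        [ Mailing_list
            { name = "compiler-dev-actual";
              opt_in = false;
              members = [ Person "alice"; Person "bob" ] };
          Mailing_list
            { name = "compiler-dev-also";
              opt_in = true;
              members = [ Person "carol"; Person "dave"; Person "alice" ] } ] }

let everybody = List.init 1500 (fun i -> Person (Printf.sprintf "person%d" i))
```

Under these assumptions, a message that takes 30 seconds to read and goes to 1,500 distinct people costs 750 person-minutes, about twelve and a half hours of collective attention, while a message to the hypothetical `compiler_dev` list is charged only for the two people on the actual team.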
Right. I totally agree. And I think I'm probably infamous internally for the amount of dispatching
between mailing lists that I do because I am like militant about making sure that a message
has gone to the right mailing list. Even if you send a message directly to me, I might redirect
you to a mailing list that contains only me just so that like if I ever decide to stop working on
that thing, somebody else will start receiving your emails and you won't just cache that you
should always send emails to me. So yeah, it's a problem that definitely cuts both ways. I think the other thing that I
wanted to highlight that you reminded me of is when we were talking about filters, another big
problem that we run into is a lot of the filtering technology, in addition to not being flexible
enough to express some of the things that we want to express, is not really built to be used by groups. Whereas in practice, we organize ourselves in groups in many cases. And so if you have a team
of people that are on some support rotation, for example, there's really not a lot of value in each
of them independently coming to their own conclusions about what kinds of emails they
need to see first and what kinds of emails they can kind of just skim later on. And so
we'd really like a better way to build tooling that allows for sharing of some of
this stuff so that we can kind of implement it once and people can sign on to use the
same well-developed general set of rules that somebody else on their team has decided on.
And we've actually built some tooling to this effect.
We've actually started doing the maybe predictable thing of generating some of our filters in OCaml and allowing for better sharing of those OCaml filters and code reviewing them
and doing all the things that we do with everything. And that has helped a lot. But we're
looking for ways to kind of expand some of that functionality to, you know, support more
things and add support for just some more expressiveness to the tune of the things that
you were talking about before. And presumably, even in that group-oriented environment, you also want a composability
story.
You'd like some ways of sharing among a team a set of decisions about how to handle emails
and also allow customizations.
You could both want to use the trade support email filters and then also some friend of
yours who came up with a good set of filters for some particular case.
You want to be able to mix that in and have some way of having the semantics at the end
of that be something that you can reason about. Yeah. And there's an obvious scary
thing that can happen here, which is a kind of smaller version of the scary thing that we were
worried about when we were rolling out Mailcore initially, which is once you start sharing filters,
now you've given somebody the ability to black hole all of your email. And it does happen from
time to time with some of this tooling where,
you know, somebody new to the team is like, oh, I'm going to add a filter. I'm going to, you know,
try to add support for this new thing that I ran into. And they accidentally confuse the rules in
some way or end up with a filter that says send everything to the archive. And it takes a little
while for somebody to notice sometimes. I think this highlights why wanting to have this kind of
more complex shareable system
quite naturally goes along with wanting to have things like code review and testing,
because suddenly you've taken what had been a very low impact thing of like, you're just
mucking with your own filters, to a thing where you might black hole all the email for
the entire trade support team, which is now a critical firm risk issue.
Like now no one who's supposed to support the trading sees any of the things they're
supposed to be able to see.
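To make the discussion of shared, composable filters concrete, here is a minimal OCaml sketch. It is not the tooling described in the conversation: the `message`, `action`, and `rule` types, the addresses, and the `archives_everything` check are all hypothetical, chosen only to show how rule sets might compose and how a test could catch a rule that black holes everything.

```ocaml
(* Hypothetical message and filter types, for illustration only. *)
type message = { sender : string; to_ : string list; subject : string }

type action = Inbox | Label of string | Archive

(* A rule either decides what to do with a message or declines (None). *)
type rule = message -> action option

(* Try rules in order; the first rule that decides wins. *)
let first_match (rules : rule list) : rule =
 fun msg -> List.find_map (fun rule -> rule msg) rules

(* Run a rule set, falling back to the inbox when no rule decides. *)
let run rules msg = Option.value (first_match rules msg) ~default:Inbox

(* True when the message reached [me] only via [list]: I am not addressed
   directly and no other list I am on is among the recipients. *)
let only_via ~me ~my_lists list msg =
  List.mem list msg.to_
  && not (List.mem me msg.to_)
  && not (List.exists (fun l -> l <> list && List.mem l msg.to_) my_lists)

(* A rule set shared by a team. *)
let trade_support_rules : rule list =
  [ (fun m ->
      if m.sender = "monitoring@example.com" then Some (Label "alerts")
      else None) ]

(* Personal customizations, consulted before the shared rules. *)
let my_rules : rule list =
  [ (fun m ->
      if only_via ~me:"me@example.com"
           ~my_lists:[ "trade-support@example.com"; "social@example.com" ]
           "social@example.com" m
      then Some Archive
      else None) ]

let combined = my_rules @ trade_support_rules

(* A check one might run in a test or during code review: a rule set must
   not archive every message in a sample corpus. *)
let archives_everything rules corpus =
  corpus <> [] && List.for_all (fun m -> run rules m = Archive) corpus

let corpus =
  [ { sender = "monitoring@example.com";
      to_ = [ "trade-support@example.com" ];
      subject = "disk nearly full" };
    { sender = "bob@example.com";
      to_ = [ "social@example.com" ];
      subject = "lunch" };
    { sender = "bob@example.com";
      to_ = [ "social@example.com"; "me@example.com" ];
      subject = "lunch, you in?" } ]
```

Because a rule set is just a list and the first deciding rule wins, the semantics of a combination can be read off its order: here personal rules take precedence over the team's. A careless rule such as `fun _ -> Some Archive` makes `archives_everything` return true on this corpus, which is the kind of mistake a test can flag before it reaches a whole team.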
So maybe to close it out.
So email's, you know, a big long-term system, which has been around for a long time and
changed a lot over the years.
And every now and then you hear people coming up with new things that are going to kill
email and replace email.
Projects like Google Wave, which was this grand new thing that was going to replace
email.
And then instead the wave crashed and that was the end of that. I'm wondering, are you optimistic about the future
of email? I am optimistic about it. I think that email in its flexibility and openness is something
that it would be really hard to replace with any of these other systems. And I think there's a
reason why it's had the staying power that it's had. You know, it's been around for 40 years. And
while it's changed around the edges, and while we've had to adapt to some of the developments
on the internet, at the end of the day, the core functionality has stayed basically the same that
entire time. And I think the fact that it is so open and so flexible and makes it so easy for you
to build your own things on top of it
means that it's got a long, bright future. And I certainly think that it's far from
being outmoded at this point. Well, thank you very much for joining me. This has been a real pleasure.
Thanks, Ron.
You can find a full transcript of the episode, along with more information about some other
topics we discussed, including a link to a talk that Dilo gave about MailCore, and also links to some of our mail handling libraries
at signalsandthreads.com. Thanks for joining us, and see you next week.