Signals and Threads - Building a functional email server with Dominick LoBraico

Episode Date: October 28, 2020

Despite a steady trickle of newcomers, email still reigns supreme as the chief communication mechanism for the Information Age. At Jane Street, it’s just as critical as anywhere, but there’s one difference: the system at the heart of our email infrastructure is homegrown. This week, Ron talks to Dominick LoBraico, an engineer working on Jane Street’s technology infrastructure, about how and why we built Mailcore, an email server written and configured in OCaml. They delve into questions around how best to represent the configuration of a complex system, when you should build your own and when you shouldn’t, and the benefits of bringing a code-focused approach to solving systems problems. You can find the transcript for this episode along with links to things we discussed on our website.

Transcript
Starting point is 00:00:00 Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I'm Ron Minsky. All right, so it's my pleasure today to sit down and have a conversation with Dominick LoBraico about email. In particular, we're going to talk about a system that Dominick architected and led the development of called MailCore, which is Jane Street's own homegrown mail server. And I think this is interesting on its own because email is an interesting topic and the whole architecture behind it. But I think it's also a lens into some interesting questions about software design and how you manage infrastructure, some questions about how you make this choice of when you build your own thing and when you use standard existing tools, and also some interesting questions about how programming languages play a role in systems design.
Starting point is 00:00:48 Hi, Ron. Hey, Dilo. So to get started, can you tell us a little bit about how email works? Sure. Yeah. So email is based on an old and venerable protocol on the internet called the Simple Mail Transfer Protocol, SMTP. And SMTP, you can kind of think of it as playing the role that the postal service plays in delivering regular mail. It is a way for one server that wants to deliver a message somewhere to hand that message off to another party who can get it to its final destination, whether that is the eventual destination server itself or some intermediary
Starting point is 00:01:23 who can help you get a little bit closer. Email itself came into fruition, as we know it today, in the early days of the internet. And the protocol itself is very simple. You basically have the actual body of the message itself, which has its own separate format and specification. And then you have a set of instructions for expressing who that message is destined for and who it's coming from. And so one server connects to another and it says, I've got a message. It's coming from so-and-so and it's meant to be delivered to so-and-so. And the receiving server can say, okay, I'll take it from here on out. Or it can say, nope, I don't know anything about that person. You have to find somebody else to deliver that to or reject it for any number of other reasons, like this looks like it has a virus or you're not allowed to connect to me or I'm not available for receiving mail right now. And one thing that always strikes me about email is it's this kind of wondrous artifact from the early internet, which is a truly open
Starting point is 00:02:24 social network. There's lots of things that people talk about, about could we make existing social networks better and more open and all of that. And just email just is from its initial design and its complete history has been this very open thing. And as you point out, the core protocols and transports are relatively simple, although there is actually a surprising amount of complexity in the RFCs that tell you how to parse a particular email. The overall system is pretty simple, but there's a lot of complexity in all of the different players who build systems that actually manage and transfer email around and how they deal with the various problems that happen like spam and people attacking systems via email and all of that. So the
Starting point is 00:03:05 foundations are relatively simple, but the emergent complexity of the system is actually pretty high. Like with many protocols of the old internet, it was designed in a time where the world was much simpler than it is today, especially the internet connected world. There were probably 50 institutions that had internet connections or ARPANET connections at the time. And you didn't really have to worry that anybody was going to be spamming because barely anybody even knew what email was in the first place. When you start and build a new thing, the early properties of the thing that you build can often be really sticky and really matter in a way that's kind of hard to predict. So this one early property of being open has stayed there. Email is a thing that anyone can
Starting point is 00:03:42 participate in. Organizations can kind of build their own infrastructure to connect to it. And through all the rather large transformations that the email system has gone through, that openness remains as a core property. This is the horrible thing about designing to build a new thing. Like when you want to design something new, you have to make a bunch of choices. And clearly you shouldn't worry about them that much because probably the thing you build is going to fail and isn't going to work out. And even if it does, you're going to learn more about the problem later. And so you shouldn't worry too much about the early decisions. But also, some of the early decisions, you don't know which ones are going to turn out to be very hard to
Starting point is 00:04:13 change. And you'll be stuck with till the end of time. And in fact, you know, the big players in email today, you know, obviously, Google and Gmail are a really large percentage of the email sending and receiving on the internet, but they are still wrestling with some of those early decisions and some of that openness that was architected in as they try to figure out how they can make email more secure and how they can protect their users and rein in some of the malicious actors on the internet. And that's just a hard thing to do while trying to maintain the existing openness that email has, but it cuts both ways, I guess.
Starting point is 00:04:46 That openness in the end has a lot of value. Absolutely, yeah. So the story here is about how you ended up building the system called MailCore. What did email at Jane Street look like when you first ran into the problem? So you might think that there's really not much special about the way Jane Street uses email compared to any other company, and largely that's true. I think we have a few special requirements by dint of the fact that we are in a regulated industry. So we have some requirements around logging for compliance purposes, every message that is sent or received by somebody at Jane Street. But other than that, our email system looks pretty similar or has looked in the past pretty similar to the way an email system at any organization might look. And the rough summary is we have some mail gateways that sit on the outside
Starting point is 00:05:30 of our network for receiving email from foreign servers, you know, from external parties. And then we have some mail server or set of servers inside of our network that handle all of the complicated business logic around what to do with those messages. So in some cases, it's as simple as receive the message and deliver it on to the mailbox of a user if we are the intended recipient. In other cases, it is apply filtering for things like spam and viruses and other things that we might want to extract from messages before we deliver them to users. In other cases, it's expansion for mailing lists. So if you send an email to some group at Jane Street, you want to be able to expand that group name to the actual list of recipient mailboxes to make sure that it actually ends up in the inboxes of the recipients who it's destined for. And then this extra
Starting point is 00:06:15 compliance implication of making sure that we're logging all the right messages with all of the right metadata. And at the time that I started, the mail infrastructure here was all based on an open source mail server that has its own config language and is pretty widely used on the internet at large. And we had about four or 500 lines of configuration in the most complex case, I think, for this system to get it to do all of these different things that we wanted it to be able to do. Great. So that sounds like a reasonable approach in terms of how to build oneself a mail system. What problems did we run into with it? Yeah. So the biggest problem here at the end of the day was the complexity required for configuring this system to do all of the things
Starting point is 00:07:00 that we needed it to do. So I said four or 500 lines of configuration that probably doesn't sound like a huge number. But when it's in a kind of bespoke configuration language, that's unlike the configuration of any other system. And unlike any programming language that a developer or engineer at Jane Street would be familiar with, the complexity of four or 500 lines in a foreign language is pretty large and can be a little bit imposing to deal with. And in particular, we had some scary near misses where we realized that we had done the wrong thing in terms of archiving some email for compliance purposes that we were supposed to archive. And luckily, in each of those cases, there were mitigating factors such that it didn't end up being a big deal. But the kind of near miss gave us a little bit of a scare because we went and looked at the configuration and wanted to
Starting point is 00:07:42 understand how we had gotten ourselves into this position. And it was harder than it felt like it should be to understand what had gone wrong and how to fix it. It's maybe also worth mentioning that the problem of logging all of your messages for compliance purposes may sound easy, but it's made more complicated by the fact that Jane Street is a company that operates in lots of different regulatory regimes and has actually different rules for some of the different places it operates. So even the sort of seemingly simple, let's just write everything down is more complicated than it might appear at first. That's right. Yeah. We have different requirements in terms of what has to be written down and what kinds of metadata we need to store and where the extra copies need to be physically located around the world and things like that,
Starting point is 00:08:20 which are reasonable sounding when you think about the human aspects of it, when you reason about it. Okay, yeah, you need a copy for this and a copy for that. But actually implementing the rules in practice ends up being pretty complicated. So one of the things that motivated you to try and do something new was this kind of near-miss situation of things almost going horribly astray. Were there any other reasons that you wanted to try something different? As I said, one aspect of it was certainly this realization that the complexity of the system had gotten to a point where we just actually were scared to make changes to it. Another came from the fact that it required this kind of specialized knowledge. We have a team at the time, we were much smaller than we are now. But you know, even today, we have a team made up primarily of generalists, people who are able to work on a lot of different kinds of
Starting point is 00:09:02 problems and have a kind of general background across an area of technology and understanding the configuration for this particular open source mail server is not something that you just have as part of a general knowledge. You know, it really required specialized understanding and background more so than the general skills required to administer an email system or understand the concepts behind email. You really needed just to know the particular weird semantics and dark corners of this particular language. And the idea that we needed to sort of build a team or have a team to specifically understand and be comfortable working with this just didn't feel like a good use of our people resources. There are a lot of other problems that we need to be solving, and we'd much rather be able to take as general approach to them as we can. Can you give an example of the way in which the config language was hard to reason about?
Starting point is 00:09:52 I think this is an example of a pretty common pattern that you see in a lot of systems that are intended to be highly flexible and configurable. They start with a relatively simple core that handles the basic functionality. And over time, as they try to add more features to the system, they add more and more knobs that you can turn, more and more configuration parameters or elements in the configuration language to make it possible to express all of those different things you might want to be able to express. And in this particular case, the configuration language is a bespoke domain-specific language developed just for this system. It kind of resembles in some places the kind of old-school .ini format of having like a key and then an equal sign and then a value and sections separated with kind of headers and brackets and things like that. But then when you look a little bit closer, you realize it has all this extra power layered on top. So in particular, it had support for these kind of advanced macros that look a little bit like
Starting point is 00:10:49 function calls, where you can call a macro with some set of arguments, and it expands to something else. And there are these different phases of expansion of these configuration elements where you can do this kind of metaprogramming, where you can have macros that produce macros that then get expanded to some resulting values. And then on top of that, the set of fields that are required in the configuration and the interaction between those things is not made very clear and it's not really very consistent. So for example, you might have a section that defines the way that you can route a message, the way that you can decide where a particular incoming message should go, whether
Starting point is 00:11:25 you're going to send it to a mailbox or relay it to some other server. And you can define multiple routers. And the semantics in terms of which router is going to get selected for a given message are not made explicit by the configuration language. And there are a bunch of other examples like this, where there are some set of elements that you define and the semantics for how the system chooses which of those to apply in a given case are not explicit and clear from the configuration superficially speaking. You just have to know. You have to go and read the documentation and understand how it is that these things interact with each other. Right. And the rules for picking which particular rule fires in a particular case,
Starting point is 00:12:04 I assume those rules are not simple themselves. They're not simple. And in some cases, for good reason. I mean, the system, it's worth saying, is highly, highly flexible. And it is the case that it could do all of the things that we wanted it to do at the time. But ultimately, the way in which you needed to kind of contort yourself to understand how it was going to do that and how to fit those different pieces together required a expert level knowledge of the semantics of the particular system. So you had a clear problem in front of you. What approach did you decide to follow to address it? Yeah, so ultimately what we decided to do was the ostensibly crazy sounding thing of writing our own
Starting point is 00:12:39 email server. In particular, we wrote a new email server in OCaml, the functional programming language that we use here at Jane Street. And crucially, and maybe the most interesting part of this is that the system was also configured in OCaml. The real problem that we had come to here was we were happy with the core functionality of the old system that we were using, but the configuration language was what we felt like was really limiting us. And we came to this fundamental realization that ultimately, the role of an
Starting point is 00:13:06 email server, you can think of as a function, you can think of it as kind of a black box that implements a function that takes a message and outputs one or more resulting messages. And that black box is responsible for making all of the decisions about how to transform those messages and how to route those messages to further servers or to inboxes. And at the end of the day, you can kind of encapsulate everything in a function that looks roughly like that. And OCaml, like I said, is a functional language, as you know, and it really lends itself to writing functions in this way, composable units that you can kind of stitch together to implement some bit of functionality that ultimately takes some inputs and generates some outputs without any side effects. And that was what we realized we needed. And so we started down that path.
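The "server as a function" view described above can be sketched in a few lines of OCaml. This is only an illustration under stated assumptions: the message record, the stage names, the addresses, and the `>=>` operator are all invented here and are not taken from MailCore. It shows how small, side-effect-free stages, each mapping one message to zero or more messages, might compose into a single function.

```ocaml
(* Hypothetical types: none of these names come from MailCore itself. *)
type message = {
  sender : string;
  recipients : string list;
  subject : string;
  body : string;
}

(* A processing stage maps one message to zero or more messages. *)
type stage = message -> message list

(* Compose two stages: feed every output of [first] into [second]. *)
let ( >=> ) (first : stage) (second : stage) : stage =
 fun msg -> List.concat_map second (first msg)

(* Drop anything flagged as spam (a toy check on the subject line). *)
let drop_spam : stage =
 fun msg ->
  if String.length msg.subject >= 6 && String.sub msg.subject 0 6 = "[SPAM]"
  then []
  else [ msg ]

(* Expand a made-up group address into its member mailboxes. *)
let expand_lists : stage =
 fun msg ->
  let expand = function
    | "team@example.com" -> [ "alice@example.com"; "bob@example.com" ]
    | addr -> [ addr ]
  in
  [ { msg with recipients = List.concat_map expand msg.recipients } ]

(* Emit an extra copy addressed to a made-up compliance archive. *)
let archive_copy : stage =
 fun msg -> [ msg; { msg with recipients = [ "archive@example.com" ] } ]

(* The whole "server" is just the composition of its stages. *)
let process : stage = drop_spam >=> expand_lists >=> archive_copy
```

Because each stage is an ordinary function, it can be exercised on its own: calling `expand_lists` on a message addressed to the made-up group returns one message addressed to the two member mailboxes, without involving any network code.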
Starting point is 00:13:53 That simple pivot in the design, it lets you bypass all of this complexity of this custom language. And you just get to pick a really well thought through, well engineered abstraction in the middle of OCaml, which is the function, and use the ordinary tools for software composition that you have there for building the abstractions that you want. And then this kind of gets you out of this problem of having to think about a weird, complicated special case that comes up for mail and for nothing else. Exactly. We already had OCaml developers and we already had a lot of people who understood the semantics of OCaml and the way in which the various language features might interact with each other. So we didn't now have to go out and find a bunch of people who understood this
Starting point is 00:14:28 esoteric configuration language. We could just find people who knew OCaml. So one thing that strikes me about the story is in some sense, the story sounds very familiar, which is the thing that you're describing about this mail server configuration language actually sounds an enormous amount like the story around, say, build systems, make. Make has a relatively simple core domain-specific language, which instead of talking about things you do with mail, it talks about rules for building things with dependencies and targets and all of that. And that language is indeed insufficient for doing big and complicated things. So people have built complicated macro systems. In fact, there's a macro system inside of make so that you can write make rules that generate
Starting point is 00:15:09 make rules that generate make rules. And that's kind of horrible. And no one's super happy about that as a way of doing complex builds. But there's another way that people sometimes use of getting out of this problem, which is not always create their own build system, although plenty of people do that, including us. Us twice. Embarrassingly. That's right. But another approach is what you might call the config gen approach, which is to say,
Starting point is 00:15:32 okay, there's a simple core configuration language, and there's a bunch of complicated stuff on top, which is about increasing the generality of language. Let's forget all of that terrible stuff and then write code in another language where we have better abstractions and better tools and have it just generate things in this kind of simple core calculus that's exposed by the underlying configgen language and then we get the best of both worlds
Starting point is 00:15:55 we get to write our configurations in a nice high-level language that we understand well and isn't a special purpose skill and we get to use the core engine that has been built and maintained by other people, then we don't have to reimplement it. So why wasn't that the path that you chose with MailCore? I think there's three reasons. I think two are good reasons and one is a bad reason. I'll start with the good reasons. The first is we really at the time were not happy with the primitives that the system that we were using provided to us. So the configuration language was complicated even in its simplest form. And it's not like we had some nice primitives that we could work with where we just needed to generate those and we could do anything with those and everything else was built on top of it. We would have had to generate complex macros
Starting point is 00:16:38 and some of the config elements I was talking about before. And we didn't feel like we would be saving ourselves very much by generating those versus writing them by hand. In fact, we still would have needed to understand it just as well. It's not like we could have limited our understanding to a subset of the language and just implemented everything we needed using that. That was the first reason. The second reason is that we did want some runtime dynamism. We did want the ability, in some cases, to actually change behavior based on other things out in the environment, other things out in the world. And the configuration that we would have had to generate to do that, we would have been back in the exact same position that
Starting point is 00:17:14 we were in before. So we ended up feeling like it would have been better, we'd be happier implementing those more dynamic features in a language that we are much more familiar with and more comfortable with, rather than trying to implement those via config generation into some lower level language. The third reason and the kind of worst reason, like I said, is ultimately at the time, I think config generation was a much less popular, much less widely used technique at Jane Street. And I think we probably didn't consider it as seriously as we should have at the time because the tooling and the kind of prior art and other examples of it internally just wasn't widespread enough for it to be top of mind as a possible solution. So here's another alternative idea of how you might've gotten yourself out of the problem.
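For readers unfamiliar with the config-generation approach discussed above, here is a minimal sketch of the idea, assuming an invented .ini-style target format. The `route` type, the section and key names, and the hosts are all hypothetical; the real mail server's syntax is not shown in this conversation. The point is only that the high-level description lives in typed OCaml values and the low-level config text is derived from it.

```ocaml
(* Hypothetical high-level description of a routing rule. *)
type action =
  | Deliver_local
  | Relay_to of string

type route = {
  name : string;
  domain : string;
  action : action;
}

(* Render one route as a section in an invented .ini-style syntax. *)
let render_route (r : route) : string =
  let action_lines =
    match r.action with
    | Deliver_local -> [ "action = deliver" ]
    | Relay_to host -> [ "action = relay"; "host = " ^ host ]
  in
  String.concat "\n"
    (Printf.sprintf "[route %s]" r.name
     :: ("domain = " ^ r.domain)
     :: action_lines)

(* The generated config is just the rendered sections, in order. *)
let render_config (routes : route list) : string =
  String.concat "\n\n" (List.map render_route routes)

(* Print a two-route example config to standard output. *)
let () =
  print_endline
    (render_config
       [ { name = "internal"; domain = "example.com"; action = Deliver_local };
         { name = "partner"; domain = "example.org";
           action = Relay_to "relay.example.net" } ])
```

As the conversation notes, this only pays off when the target language has simple primitives to generate; if the output still needs complex macros, the generator's author has to understand them just as well as someone writing the config by hand.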
Starting point is 00:17:56 It sounds like if you looked at the config language and said, wow, this is incredibly hard to reason about. It's hard to understand. Let's move to a different language. There's another way you could respond to the problem of, wow, this thing is really hard to understand, which is you could have approached it by trying to test it much better. You say from the outside, step back, what does a mail system look like? A mail server looks like a big bundle of functions, or maybe one big function that takes in an email and decides what emails need to be emitted out of the other end. You can imagine taking that view as an approach to testing it,
Starting point is 00:18:27 which is that you could build some framework around the system and you could make a bunch of assertions where it's like, oh yeah, if we put this email in, we expect these emails to go out. And that's another way to build confidence that the system behaves in the way that you expect. Even if the underlying config language
Starting point is 00:18:40 is kind of a disaster, you can do a nice job of the testing framework on the outside to not completely nail down, but to get yourself a lot of confidence about the way in which the system behaves. We did consider that at the time. And I think the reason that we didn't feel like that was sufficient was primarily that while we could have tested the full end-to-end system that way, the units of configuration are not composable enough for us to be able to test smaller subsets of it. And so you might be able to say, yep, this didn't do
Starting point is 00:19:10 what I expected it to do. This broke. But that doesn't necessarily help you figure out why it broke or what it was that changed, especially in the face of these confusing semantics that I talked about before. And then beyond that, we would really have been fighting kind of an uphill battle in the sense that this is not a software system that was designed to be testable in this way or a configuration language that was designed to be testable in this way. And so we ultimately would have had to build a lot of our own tools and a whole harness to run this system within to be able to even get there and then get these suboptimal results. There's this general problem. If the basic system isn't composable, that's a problem that's hard to get around. One thing we did consider is moving to a different open source system or just another
Starting point is 00:19:49 mail server implementation. This isn't the only one in the world. There are others. And the reason we ended up ruling that out is we went and looked around and sort of looked at the most common other mail servers out there. And we saw basically two variants, like two different potential alternatives that we could have looked at. One was a class of very popular and widely used systems that look pretty similar to the system that we were already using in terms of how they were configured and the kind of complexity and sort of system specific knowledge required to work with them. It didn't feel like there was enough justification to migrate to some other system just to understand a whole new set of semantics and a whole new set of complexities. And then the other flavor of system that we came across
Starting point is 00:20:30 was a much newer, much less widely used, much less popular in the world at large set of systems that were implemented in ways similar to how we eventually architected MailCore. A small core implemented in some language and a very flexible configuration based on a common programming language, something like Python or Lua or something like that. And there are a handful of those around. I think that the two reasons that we didn't go down that path, one is none of them were widely used and baked in enough for us to feel confident that they were the right choice. You know, there wasn't kind of an obvious one that was a front runner that we could just say, ah, yes,
Starting point is 00:21:06 everyone's using that. It must be good. It must be well tested and used in production, so to speak. And the other reason is if we were going to switch to a configuration language that was an actual programming language, we would be much happier using the language that we use for almost everything else where we have great tooling and a lot of experienced engineers around who are already familiar with the language. It just didn't seem like switching to Python was going to be a net win for us in the long term. You would have had to have gotten a lot of benefit from the engineering that had gone into the other system to compensate for the fact that you have to switch languages. There's the language and the tooling, which is a big deal.
Starting point is 00:21:39 Exactly. Okay. So you had an architecture in mind, you had an approach to take. How did it go? What were the problems as you ran down this path? Initially, things moved really quickly and went really well. We were able to implement core SMTP protocol pretty quickly. Like I said, it's a relatively simple protocol. So that went relatively smoothly. And then we started down the path of writing the configuration, writing this pile of OCaml that was meant to replicate the functionality that we had in the old system. And this was kind of an interesting experience because we found
Starting point is 00:22:10 pretty quickly cases where the old system had either non-deterministic behavior or was just doing the wrong thing in some case that we hadn't noticed in production or hadn't really bitten us yet, but could have. And sort of the exercise of reverse engineering many years of configuration changes and people slapping things in to fix issues or to add functionality and trying to figure out what the intent behind those changes was so that we could then reproduce the intent in a new system. And I think we eventually got to the point where we felt pretty confident that we had addressed most of the existing functionality. But then we were faced with a new problem, which is how do you build enough confidence
Starting point is 00:22:47 in a completely new system that's never been run in production anywhere before that has a completely rewritten configuration enough to want to move the entire firm's communications over to it? It's not something you want to do overnight. It's maybe worth highlighting, email is absolutely critical to Jane Street,
Starting point is 00:23:06 sometimes in a way that's incredibly important for short periods of time. Like if you turn off traders' email, there are things that go wrong right quick and a trading business needs to be able to respond to information quickly. So problems in the communication systems are incredibly critical. And we're a global team, you know, we're spread across three continents and a lot of the way that we make sure that we're keeping things consistent and that we're keeping in touch between regions is email. And so it's not like you can say, oh, well, we'll do it overnight. The Hong Kong office isn't going to be happy about that. And similarly, you know, you don't necessarily want to just do a big bang migration over a weekend and hope that Monday morning goes smoothly. It's something that's kind of fraught with peril. So we started thinking about how to build this confidence, you know, what we could do to test the new system enough that we
Starting point is 00:23:50 would feel ready to actually make that flip. And we came back to something that we talked about a little bit earlier in this conversation, which is this idea of testing the end-to-end behavior of the system and kind of demonstrating that it was doing what you expected it to do in all cases. But we didn't really have a reference implementation that we could use in OCaml. We weren't sure how to produce something like that. And so we looked back at our existing configuration and we had this thing that had been working for years or at least working enough that we thought it was working. And so we thought about how we could leverage that. And what we ended up doing is setting up basically what we called a shadow instance of our new OCaml-based email server, MailCore, and running it in parallel with the existing system. And so for each message
Starting point is 00:24:37 that came in to our walls, we would fork off a second copy of the message and send it to the new system in addition to sending the original message to the old system. And then we set up basically some endpoints sitting on the other end, on the output side of the old system and of our new system to just keep track of the output that was generated by each, you know, what messages got generated, what the transformations that have been applied were, where the messages were going to be directed. And then we just diffed those. We just set up essentially a streaming diff of all the messages coming through both. And we made a lot of noise to ourselves each time we saw a case where the new system and the old system didn't behave the same way. And we found like five or 10 different cases where there
Starting point is 00:25:18 was just like entire classes of mail where we were doing like a slightly wrong thing. And in a bunch of those cases, I think the majority of those cases, it was actually the old system that was doing the wrong thing and not the new system that was doing the wrong thing. But it still made us feel sort of warm and fuzzy to know that for the vast majority of email, the behavior was the same. And so we ran that for a long time, for months. And then once we'd built up enough confidence, we started cutting users over one by one. We started basically at that stage where we were forking off a copy of the message. We sort of added some logic to decide which primary server a given user's mail was supposed to be going through and sent our own mail through the new system for a little while
Starting point is 00:25:56 and kind of ramped it up that way until all mail was going through the new system. One of the things that strikes me about this story is the thing that, you know, ye standard software engineer thinks about as the problem to be solved is a fairly small part of the story that you're talking about solving. The writing of the software that actually does the thing, that's like a little bit of work. And then there's a significant chunk of work, which is writing the config, which is again, a programming task. And then there's a bunch of just careful operational thinking about how the overall system works. And writing more software to set up this harness and the monitoring and the kind of
Starting point is 00:26:30 diffing and all of that stuff as well. The other notable thing to me is a big part of the work here was essentially wrestling knowledge out of the old system into the new system. Whereas when I came to Jane Street, a naive young person out of grad school and thought about writing software, I thought, oh, software is where like someone has an idea of a thing that they want to happen. And then you write software that makes that thing happen. It's like, no, a lot of the time software is replacing some old thing and there's no idea about what should happen or rather no human understands specifically what needs to be done.
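To make the shadow-instance comparison described above concrete, here is a minimal OCaml sketch of a diffing harness. It is not MailCore's code: the `output` record, the `old_system` and `new_system` functions, and the example addresses are all hypothetical stand-ins, and the sketch assumes recipient order does not matter when comparing results.

```ocaml
(* Hypothetical summary of what a mail server emitted for one input message. *)
type output = {
  recipients : string list;  (* where the message would be delivered *)
  subject : string;          (* subject after any rewriting *)
}

type verdict =
  | Same
  | Differ of { old_out : output; new_out : output }

(* Assumption: delivery order is irrelevant, so sort before comparing. *)
let normalize o =
  { o with recipients = List.sort_uniq String.compare o.recipients }

let compare_outputs ~old_out ~new_out =
  if normalize old_out = normalize new_out then Same
  else Differ { old_out; new_out }

(* Feed every message to both systems and collect the ids that disagree. *)
let shadow_diff ~old_system ~new_system messages =
  List.filter_map
    (fun (id, msg) ->
      match
        compare_outputs ~old_out:(old_system msg) ~new_out:(new_system msg)
      with
      | Same -> None
      | Differ _ -> Some id)
    messages

(* Toy stand-ins for the two servers; they disagree only on "weird". *)
let old_system msg =
  { recipients = [ "a@example.com"; "b@example.com" ]; subject = msg }

let new_system msg =
  if msg = "weird" then { recipients = [ "a@example.com" ]; subject = msg }
  else { recipients = [ "b@example.com"; "a@example.com" ]; subject = msg }

let mismatches =
  shadow_diff ~old_system ~new_system [ (1, "hello"); (2, "weird") ]
```

A mismatch only says the two systems disagree. As the conversation notes, working out which side is wrong is still a human job.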
Starting point is 00:27:01 There's just some old system that encodes all of this knowledge in it in a way that maybe no individual human ever knew all of it, but a bunch of people over time slowly added knowledge to this weirdly encoded knowledge base. And then you as a software engineer had to figure out how to wrestle it out of that. I remember running into this many years ago with us replacing our early version of our order engines, which were the systems that connected to other brokers and exchanges and routed our orders there. And I got to this problem after working on trading systems. And trading systems was like a really smart person had written a thoughtful spec about how this thing was supposed to behave. I was like, oh yeah, okay, I can write
Starting point is 00:27:38 to the spec. That was relatively easy. Whereas we had some order engine, which had buried inside of it knowledge about how Bear Stearns' internal infrastructure worked, which is like, again, not anything that anybody internally really knew explicitly. And it took a long time to claw that knowledge out. And it sounds like you ran into more or less the same problem here. Absolutely. One of the problems I think one always runs into with the decision of whether should we use some external thing or should we build something on our own, is the question of like, how deeply do you mis-underestimate the size of the problem? How did that part go? How hard did you think it was going to be and how hard did it turn out to be? If I'm completely honest, I don't really remember
Starting point is 00:28:16 what our estimate was at the time. This is now probably five or six years ago when we originally started on this effort. And I don't remember what we thought would happen. I think it definitely took longer than we expected because I think that's a rule basically for almost everything that I've ever been involved in. But I think maybe the interesting point to highlight is we probably were closer than we expected to be in terms of the implementation of the core system. But the implementation of the actual configuration and the migration to the new system, I think, exceeded the estimate that we would have made at the time. I think it just took a lot longer
Starting point is 00:28:49 to build that confidence and to get to the point where we really did feel ready to flip to the new system. I think it took probably on the order of a year total from start to finish or start to some version of finished. And then, of course, it's been a long tail of improvements and changes and extensions since then. Right. I guess one of the funny questions there is what's the alternative? I guess one alternative was not doing anything new and just kind of suffering with the system as it was and kind of incrementally moving along. And there, I think that time estimate matters a lot. But if the alternative is switching to some other system that you hope will be better, it sounds like most of the work that you had to do, the stuff that took a long time, was stuff you would have had to do anyway.
Starting point is 00:29:32 I think that's right. Yeah. Might have taken longer the other way. Yeah. And it would have been harder to find people with the right expertise and knowledge to work on it as well in some sense. I think if we had switched to some other system with some other arcane configuration language, we would have had to learn all about that first and then start down those other paths. And at least in this case, we could say, okay, you know, you and you and you, you know, some OCaml engineers already working at Jane Street. Here's a new domain to apply your existing knowledge and sort of expertise to. So once the project landed and we started using as our primary
Starting point is 00:30:01 mail system, what were the benefits that we got as an organization from this work? How is Jane Street's setup now better for all of this change? There are a lot of reasons. I think the very first one is maybe obvious from the way I described the migration, but we now had all this infrastructure around making changes. First of all, now we could implement tests. Now we could implement sort of our normal inline tests to use for units of OCaml code and the configuration was composable, so we could reuse bits of it across different instances of our email server internally and across different use cases where we needed to run separate mail servers for some reason or another. But more importantly, we now had this system that we could use to gain confidence in changes that
Starting point is 00:30:40 were going to be far reaching in the environment. So we could run a new version of MailCore and an old version of MailCore next to each other. And we could diff their behavior in the same way that we had diffed the old system versus MailCore. And we used that to great effect, and we still use it. We also as part of this implemented a nice OCaml library for working with SMTP, both as a server and as a client, because obviously, we needed to implement this core functionality as part of the construction of this system. And we found a bunch more use cases for that internally, you know, other cases where it's useful to be able to stand up a small email server and take some automated action or write down a copy of a message or something like that. The biggest and most important impact that
Starting point is 00:31:21 the system had, though, in the end was it really lowered the barrier to making changes to the system. So for a long time, we had kind of trained ourselves to not make changes, to not make improvements, to not really touch it when dealing with the old system, because we were just ultimately scared that we were going to break something and that we didn't have good tools to confirm or to give us confidence that we weren't breaking something. But with MailCore, we now had this system that looked very familiar. It looked like just about any other software system at Jane Street. It lived in our normal code-reviewed repository and was built with our normal build tools and had all of our normal OCaml-specific tooling and functionality.
Starting point is 00:32:00 And that means that the set of people who felt like they could propose or make changes went way up. So, you know, if somebody in the cybersecurity team here wanted to implement a new kind of scanner for a particular kind of malicious attachment or something like that, they could just easily go and write a feature, write some OCaml code to integrate that scanner or to implement that check. And it didn't require, you know, finding the one person wearing a cape and pointed hat that happened to know the particular dark corners of the old system's configuration language. It just really, you know, required the same kind of knowledge that we expect any of our software engineers to have. Yeah. And the cybersecurity example is not a random one in the sense that I think one of the big wins from all of this change to our mail, which I think wasn't really contemplated as much when we made the decision to get started on MailCore, was that it gave a lot more power to people thinking about security to put all sorts of remediations in place. It's hard to overstate how important email is as an attack vector and how much work you
Starting point is 00:33:01 have to do to protect yourself. And the ability to just widen out the set of people who could make those changes and to be able to kind of accelerate the level of work there, I think had a very powerful effect on improving our cybersecurity protections. It absolutely did. Yeah. And I think that is one example, like you said, but there are other cases too, where
Starting point is 00:33:18 not only from a security perspective, but from a kind of functionality and sort of productivity perspective, we were able to make changes that we wouldn't have even contemplated in the original system. So, you know, if we wanted to, for example, do things like add support for new kinds of mailing lists, you know, mailing lists that had different behavior or that sent messages to somewhere else besides somebody's inbox, because we wanted to get something out of email or something like that, or a mailing list that we wanted to kind of note was deprecated so that it would kind of alert the sender that we're not using this mailing list anymore. These are
Starting point is 00:33:54 like little things, little enhancements that smooth over paper cuts, but the flexibility to be able to throw in a 50-line feature to implement something like that just opened our eyes to the set of customizations that we could make to the way that we work with email. Yeah. And simplifying and improving and removing paper cuts in the primary communication mechanism of a 1,300-person company is a surprisingly powerful thing, right? When you step back and think about it. So the individual things seem small, but the value to the organization is quite large. And I think people are actually just kind of on a person-to-person basis, incredibly grateful for the work that's gone into email because it's so much better than it used to be in a bunch of ways that I think really affect the quality of people's lives here.
Starting point is 00:34:37 A great example of something that at the time seemed small, but that has been really widely used and that a lot of people have gotten a lot of value out of is we implemented something internally that we call a relay list, which is like a regular mailing list. So a mailing list is normally we think of it as sort of an address that contains some set of members. And when you send mail to that address, it goes to each of the members of the list. So we might have a distribution list that's for everybody at Jane Street so that we can send out announcements firm-wide. Well, a relay list is a special kind of list where instead of including human user email addresses as the members, it includes hosts and ports. So some internal host name and a port on that machine.
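One way to picture the difference between the two kinds of list is as a variant type. The sketch below is illustrative only and is not MailCore's representation: the type names, the `expand` function, and the host, port, and address values are all made up. It assumes that delivering to a host-and-port member means forwarding the message to a mail server listening there.

```ocaml
(* A list member is either an ordinary mailbox or a host and port
   where some small mail server is assumed to be listening. *)
type member =
  | Mailbox of string
  | Relay of { host : string; port : int }

type mailing_list = { address : string; members : member list }

(* What the server should do for each member of a list. *)
type action =
  | Deliver_to of string
  | Forward_to of string * int

let expand (l : mailing_list) : action list =
  List.map
    (function
      | Mailbox addr -> Deliver_to addr
      | Relay { host; port } -> Forward_to (host, port))
    l.members

(* Hypothetical relay list for a team's tooling. *)
let oncall =
  { address = "oncall-tools@example.com";
    members = [ Relay { host = "tools-host.internal"; port = 2525 } ] }

let actions = expand oncall
```

Because `member` is a closed variant, a third kind of member added later would force `expand` to be updated before the code compiles cleanly.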
Starting point is 00:35:21 And what happens is when you send an email to a relay list, MailCore actually relays that message on to a mail server listening on that host and port. You pair that with a small library for easily standing up a little mail server. And we've now given the power to automate various workflows related to email to everybody around the firm without requiring them to have any special privileges or any access to actually make changes to this core super critical piece of infrastructure. We've kind of federated that ability out to everyone and people have made use of it. They've implemented little enhancements to their own workflows for their specific teams around support rotations and
Starting point is 00:35:57 little tools for improving their monitoring and all sorts of things like that, that they just wouldn't have been able to do or that we maybe wouldn't have wanted them to do if it was going to be in the core system that we're using to handle everyone's important email. And part of, I guess, what you're doing here is, again, leveraging the openness of the email architecture, where you can just have routing between different hosts that are implemented differently and are doing different things. But of course, in this case, they're not implemented completely differently. We get to reuse the same libraries that you built for MailCore. Those now get to be leveraged in all sorts of places. Exactly. So one of the things you were pointing out there is the fact that you were now able, in writing the configuration, to leverage
Starting point is 00:36:33 the standard software tools that we used for building all sorts of software. Can you say a bit more about how that played out and what were the valuable pieces of those software-oriented workflows? I think this is a pattern that you see a lot around Jane Street. We're a place that really highly values automation. And so we'd much rather take a problem that maybe is traditionally viewed as an administration or an operational problem and turn it into a software problem if we can, because we do get to benefit from the kind of work that's being done
Starting point is 00:37:04 around the firm to improve our ability to work with software. And what do I mean when I say that? I think there are a lot of very concrete, specific technical things that I mean. So things like editor integration, you know, the syntax highlighting, the integration with the build system, tools that help us view the type definition for a given value, tools that help us jump to the definition of some bit of code so that we can kind of move around the source code repository, tools for writing automated tests and for kind of demonstrating that the behavior of some system hasn't changed over time, all different kinds of things like that. But the other thing
Starting point is 00:37:39 that I think is important about taking this kind of software approach to what are traditionally viewed as more systems-y or more administration problems is, I think, kind of a cultural one. I think people treat code differently from the way they treat other things. Like there's some switch in our brains where if we're messing with a config file, then we're much more willing to copy and paste some stanza or to, you know, hack something together and just throw it into a repo without a good commit message, or maybe to not even put it in a repo in the first place. Whereas when we're dealing with code, the expectations just change. There's this general shared understanding that code should be reviewed and code should be tested. And there should be a description for why
Starting point is 00:38:18 you're making a particular change. And we work with it in a different way. And we hold ourselves to a different standard when working with code. And we refactor code. How often do you refactor configuration? And I think ultimately the thing that we were most excited about here was being able to leverage that kind of cultural shift. Just this inclination to work with this pile of stuff in a different way, even if it was, you know, for a system where normally it wouldn't be handled that way. So part of it is about the tools of software and part of it is about the culture of software. That's right.
Starting point is 00:38:50 So why do you think the culture of software and the culture of configuration are as different as they are? Sort of step back, thinking about it from an abstract perspective, it doesn't feel very different. Configuration languages are languages, essentially very restrictive programming languages. Why should they develop such different cultures? Yeah, it's a good question. I'm not really sure. I can speculate a little bit. I think one potential reason is that the tools are just not as available to you. So sort of the tools breed the culture in some sense with software. The fact that you have all of these nice tools for writing tests and for doing code review and for working with code definitely encourage you to do good things. You know, it's much easier to refactor some code if you have good tools for helping you refactor it.
Starting point is 00:39:33 And with configuration, you often don't have those things because in many cases it's a bespoke language for a particular system. And it's just not worth the effort of going and building all of that tooling specifically for the system. I think that's probably a big part of it. I think another part of it is that in many cases, we store configurations separate from where we store source code. A common pattern is you build some core functionality, some basic system that just handles the most common kernel of operations that you need to handle. And then we have many instances of that system, each with their own configuration. And the result of that is you end up with configuration kind of strewn all about managed by different teams, maybe if the system is being run by different groups or something like that. And in general, you don't get the same kind of consistency
Starting point is 00:40:18 that you might get out of the way that we would approach making changes to the core functionality. And I think one of the other big takeaways with MailCore was we actually just store the configuration right next to the core functionality, the configuration lives in the same repo right next to all of the other code. And when we roll it out, we deploy everything all as one big bundle. And we don't have this problem of like, oh, well, the configuration lives over here. And the code for the implementation of the core functionality lives over here. And we get some tooling that works well over there and some tooling that works well over there. And it's, you know, it's kind of annoying to interoperate between them or something like that. And I think that that plays a role in a lot of cases as well. At least the second problem you described, the one
Starting point is 00:40:55 about where you store the config versus where you store the code, that at least is a thing that, you know, thinking is enough to make it so. Like if you just have different ideas about how you should store a configuration, you can adopt that one. Where your previous point about the culture depends on the tools being there, that's a much harder problem to fix. And just even from Jane Street's own history, I think of Jane Street as having a very good
Starting point is 00:41:15 and well-developed culture around testing, but it didn't always, right? The tools for testing used to be much worse, and the practices around testing were much worse. And I think the thing you described is exactly right, that the culture was able to be established only in concert with building the tools. Like people decided testing was important. We spent more time doing it. People got frustrated about how hard it was. They spent more time building tools to make it easier. And then when it got to be really easy, which is kind of how I think of it now,
Starting point is 00:41:42 that culture gets really widely spread. You mentioned this refactoring thing, which is another thing that struck me, where one of the practices I think is very common in code is, as you said, refactoring, and you talked about refactoring configs. We have a fairly strong approach of trying to avoid repeating things in code because if you cut and paste things, it's a super easy way to introduce bugs.
Starting point is 00:42:03 Like, you know, you cut and paste it and then make just the changes you need to make it right. But it's so easy to miss something. But OCaml has incredibly good, very lightweight tools for essentially making very simple templates, typically in the form of functions, so that you can just figure out what are the parts that really need to differ and avoid any excess duplication. And it doesn't even make your code necessarily shorter, but it does very often make it cleaner and less likely to be buggy. And the tendency to do that depends critically on having a system in which you operate
Starting point is 00:42:34 that's friendly to that kind of refactoring. And if you're in like some random config language that was never really designed as a programming language, which has no kind of core principles on how it's organized, that stuff is just not going to go so well. That's right. Yeah. And I think this kind of speaks to the specific functionality of OCaml, or the specific capabilities of OCaml, that make it such a good language for a wide array of problems, but, you know, including this one. And I think you
Starting point is 00:43:02 highlighted one. Another that I think is pretty important for the config management case is it's really nice to have checking at compile time for unused values, for things that you specified somewhere but then you never did anything with, because a really common mistake in a config language that lets you do this
Starting point is 00:43:20 is to go and define some value somewhere and then forget to list it in the place where you meant to list it to say, oh, and this thing now. And having in OCaml the ability to kind of move things around and reorganize the config and get alerted by the compiler if we forgot to make use of the value or if we left some stale bit of code around helps us keep the implementation as lean as we can and make sure that we just prevent a wide class of mistakes. Yeah, I think that's incredibly important. It's a simple decision, but the fact that we make fairly aggressive choices about turning on warnings
Starting point is 00:43:54 in the OCaml compiler, including that one, and not just warnings, but turn them to error. So you cannot even compile your code when you have an unused variable. That can be annoying in some contexts, but it's so incredibly useful and it catches so many bugs. I was talking with a guy who works in the tools and compilers team who had previously worked at various other big tech companies. And he was talking about various like fancy techniques that are out there for like machine learning, blah, blah, blah, for catching common bugs.
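The two ideas above, small functions acting as config templates and the compiler flagging values that were defined but never used, can be sketched in a few lines of OCaml. Everything here is hypothetical (the `rule` record, the `team_list` helper, the names and addresses) and is not taken from MailCore's actual configuration.

```ocaml
(* Hypothetical shape of one mailing-list rule in a config. *)
type rule = { list_name : string; members : string list; deprecated : bool }

(* The function is the template: only the parts that differ are arguments,
   so nothing needs to be copied and pasted between lists. *)
let team_list ?(deprecated = false) ~team members =
  { list_name = team ^ "@example.com"; members; deprecated }

let all_rules =
  let infra = team_list ~team:"infra" [ "alice"; "bob" ] in
  let old_ops = team_list ~deprecated:true ~team:"ops" [ "carol" ] in
  (* If [old_ops] were left out of the list below, the compiler would report
     it as an unused variable; with warnings promoted to errors, as described
     in the conversation, the config would not build. *)
  [ infra; old_ops ]
```

The forgotten-entry mistake described above becomes a build failure rather than a silent gap in the deployed configuration.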
Starting point is 00:44:20 And he was like, yeah, this seems interesting. But honestly, the fact that we have things like, you know, automatic detection of unused variables just smashes a lot of bugs this stuff would catch anyway. And so it's not clear it's worth the complexity. Pattern match exhaustivity is another one that's like that, where the fact that you can know that you matched on all of the possible values of this type just eliminates a whole class of bugs that you might easily make in other languages that don't have that. For anyone who's thinking about whether a language like OCaml is interesting, you want to understand why people like it, pattern matching and the
Starting point is 00:44:52 exhaustivity check on pattern matching is the single best feature. And it continues to mystify me that more programming languages have not picked it up. Like, you don't have to take all of the decisions, but that one is so good. So you talked a bunch about what was good about MailCore, what the advantages are, but going and building your own thing isn't all sunshine and roses. Like what are the downsides of having built our own homegrown mail server? There are plenty. One obvious one that stands out to me is I've been referring to SMTP as this simple protocol. And the core of it is a simple protocol. You know, there really are very few things you need to implement to support the functionality
Starting point is 00:45:30 that you would expect of a basic mail server. But as we alluded to at the very beginning, when we were talking about the openness of email and the reality of the modern internet and all of the things that you have to consider that weren't considered when it was originally designed, there have been many extensions to SMTP and many extensions to the kind of surrounding mail ecosystem to add on an extra
Starting point is 00:45:50 level of security or extra functionality. And MailCore has to implement all of those if we want to get that functionality. We can't rely on some open source community or some vendor maintaining the mail server that we're using and just adding functionality as new specifications become approved or go into wide use. That, I think, means that this is a kind of a never-ending project in some sense. How about from a security perspective? I can imagine that thing cutting both ways, which is to say we have a lot more power to decide exactly how it works. At the same time, I imagine there are like rookie email mistakes that someone implementing a mail server can get wrong. And a mail server that's existed for 20 odd years
Starting point is 00:46:30 has had the opportunity to fix some of those. And we get to make those mistakes from scratch. How much of a role does that play? I actually expected more issues of this type, but we've seen fewer than I would have thought. And we have taken a good hard look at it and considered that angle. I think a big relevant detail of the way the popular open source mail servers are written is that they're mostly written in C. And so a lot of the security issues that they have run into over the years have been of the normal C memory-unsafe style of security flaw that many, many, many systems have been bitten by. And writing our system in OCaml rules out that whole class of things, or at least limits them to, you know, bugs
Starting point is 00:47:11 in the OCaml compiler or in external libraries that we link in or something like that. So that's a big plus. The other thing that we get out of this is, because we implemented, you know, we've written our own thing, that means that it's going to be a lot less common on the internet. There aren't that many other people using it. It's a lot less interesting to find a vulnerability in our mail server versus in some popular mail server that's on, you know, 50 million servers around the internet. And so I think we get a little bit of security by obscurity from that fact. Right. And the buffer overrun story you talk about, mail servers written in C, is really no joke. Well, I think Microsoft in the last little bit came out with a study that something like
Starting point is 00:47:51 70% of their vulnerabilities were buffer overruns for things that were written in C and C++. And I think it just highlights to me the importance, again, of programming languages in systems design. Problems at the programming language layer are incredibly hard to solve at higher levels, right? If you use a safe programming language like Java or OCaml or Rust, then there's a whole class of bugs that just go away.
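Memory safety is one class of bug the language removes. The pattern-match exhaustivity check praised a little earlier is another, and it is easy to show in miniature. The sketch below is purely illustrative: the `disposition` type and its constructors are invented for this example and are not part of MailCore.

```ocaml
(* Hypothetical outcomes for an incoming message. *)
type disposition =
  | Accept
  | Quarantine of string  (* reason *)
  | Reject of int         (* SMTP reply code *)

(* Every constructor is handled. If a new constructor were added to
   [disposition] later, the compiler would flag this match as
   non-exhaustive until a case for it was written. *)
let describe = function
  | Accept -> "accepted"
  | Quarantine reason -> "quarantined: " ^ reason
  | Reject code -> "rejected with " ^ string_of_int code
```

Avoiding a catch-all `_` case is what keeps the check useful, since a wildcard would silently absorb any constructor added in the future.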
Starting point is 00:48:17 And you can do things to try and smash the bug count above that, and do address randomization and all sorts of fuzz testing and all that. And you can do that and it's effective, but it's an enormous amount of work and it doesn't get you to anywhere near as good of a situation as you would have been if you just use a safe language to begin with. So the kind of ongoing train wreck of people building internet-facing software in C and C++ and other unsafe languages, like, it continues to amaze me. It seems like a really
Starting point is 00:48:46 serious mistake, just from a security perspective, all other aspects of software engineering and language design aside. Totally. Yeah. We get a big win out of just not having to think about that and being able to focus our energies on other things. So we've spent a lot of time talking very positively about email, but at the same time, email is terrible, right? Like we all live in a world where we have way too much email. Certainly I live in a world where I have too much email. And email is, I think, kind of clearly the best collaboration tool I have used, the best
Starting point is 00:49:18 communication tool I've used. And there are lots of things like Slack and whatever that I like and I think are useful in various contexts, but, you know, you can pry email from my cold dead hands. At the same time, oh Lord, I wish it was better. And I'm curious, you've spent a lot of time thinking about email and about how email works at Jane Street and not just the technical, but also the kind of organizational and human concerns. How do you wish email was better? There are kind of two trains of thought here that I want to cover. One is, how do I wish email was better as a protocol and as a citizen on the internet? And the other is, how do I wish email was better at Jane Street?
Starting point is 00:49:56 Or what changes do I think that we need to make? I'll start with the first one. I think the biggest thing that the world at large, the email world at large, has wrestled with for the past, I don't know, 20 years probably, and continues to wrestle with is a consequence of this openness that we've talked about a few times in the architecture and specification of the way email works. And the consequence is essentially that email is really difficult to authenticate. It's really difficult to know that in the kind
Starting point is 00:50:25 of core SMTP specification that a message was actually sent by the person who claims to have sent it. So this is where we get things like spoofing and phishing and other kinds of malicious impersonation, things that at their most mundane just result in more spam, more junk mail for you to clean up. But at their worst, this is how you get things like people pretending to be you and asking your bank to wire all your money to some offshore untraceable account or something like that. So it's a huge problem. And it's kind of a fundamental problem in the way that email is designed. And lots of people have made attempts at improving it, adding extensions to the
Starting point is 00:51:06 email specification and new protocols and things like that. To fix this, there are things like SPF, the Sender Policy Framework, or DKIM, DomainKeys Identified Mail. Both of these are just attempts to further lock down and authenticate email, whether it is, you know, authenticating that the person who sent it is who they claim they are, or authenticating that the actual contents of the message are the same contents that the original sender intended to send to you. So these help a lot. And they definitely make a big difference. But one of the issues that crops up with both of these things is that they require participation by both the senders and the recipients. So, you know, the sender has to be configured to authenticate and say, yes,
Starting point is 00:51:50 this email was sent by me. But the recipient also has to be configured to check for it. You know, it's kind of like the equivalent of, you know, it's one thing for me to carry around my driver's license and, you know, have a nice picture on it and my name and my license number and all that. But if you don't ask me to see it and you don't look at it and make sure that it looks like it's a real driver's license and that it was actually issued by the state and all that kind of stuff, then it doesn't really do anybody any good. It doesn't actually demonstrate any identity or validity. So this ends up being a big problem because if you're using a big provider, you know, somebody like Gmail or Microsoft 365, Google or Microsoft are going to be highly incentivized to build in a lot of good tooling and implement all of these things and do as much as they can to help you authenticate the
Starting point is 00:52:36 mail that you're sending and check that the mail that you're receiving was also authenticated. But if you're trying to run your own mail server, or even your own mail client that doesn't do some of the things that Gmail or Office 365 would do, and you're trying to keep up with all these things as new improvements crop up and things like that, it's just really, really difficult. And it's kind of a continuing problem. You know, once you've authenticated that the sender of the message is who they said they were, now you have this whole separate problem, which is, well, that's great. But what if other people aren't authenticating that mail claiming to come from you was actually sent by you? It's great if you check that the mail that I sent you actually came from me.
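To make the sender-and-recipient split concrete, here is a minimal Python sketch of the receiving side of an SPF check. It is an illustration only, not Mailcore's code: the function name `check_spf` is hypothetical, the record is passed in as a string rather than looked up in DNS, and only the `ip4`, `ip6`, and `all` mechanisms are handled (a real SPF record can also use `include`, `a`, `mx`, and others, which this sketch skips).

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, sender_ip: str) -> str:
    """Evaluate a simplified SPF record against the connecting IP address.

    Assumption: only ip4, ip6, and all are understood; every other term
    is ignored in this sketch.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # the domain publishes no usable SPF policy
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # a mechanism with no qualifier means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and the record has no "all" term


# The sending domain publishes the policy (documentation-range addresses)...
policy = "v=spf1 ip4:192.0.2.0/24 -all"
# ...but it only helps if the receiving server actually evaluates it.
print(check_spf(policy, "192.0.2.10"))
print(check_spf(policy, "198.51.100.7"))
```

The policy string is the sender's half of the bargain; calling something like `check_spf` and acting on a `fail` is the recipient's half. A receiver that never runs the check gets no protection from the published record.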
Starting point is 00:53:17 But if my bank isn't checking, then it's not doing me any good. And so this ends up being kind of a pretty hard problem to solve in a uniform and global way. You know, there's progress being made and, you know, continued improvements to some of these things and new ideas cropping up for how to make this better. But it's just a really hard problem. And a lot of it stems from all those nice things that we talked about with SMTP and all of its openness. And so it's just kind of a double-edged sword. On the Jane Street side of things, kind of ironically, I think the biggest problem that we have with email is we send too much of it. I think email is great and I feel the same as you. I think it's an awesome tool and it's a really, really effective way for a lot of
Starting point is 00:54:00 kinds of communication. But I think it's too easy to send an email and it's too easy to send an email to a large number of people. And it's too hard to remove yourself from a list of recipients in some cases. So we have all these mailing lists internally that we use for organizing ourselves and making sure that people who want to follow along with different kinds of discussions can follow along.
Starting point is 00:54:20 But we don't have enough tooling to make it easy for people to understand the sheer impact that consuming email can have on your productivity, your ability to focus, your ability to do anything besides read and respond to email. So this is something that we are actually focusing on within our team right now, which is what kinds of information can we put in front of people? What kinds of tools can we build for people to either get things out of email, you know, to move things that don't actually belong in email into other systems? You know, you probably don't want your monitoring system to be primarily alerting you via email. That's just not the place, you know, you don't need everybody
Starting point is 00:54:56 a week later to see that you got close to running out of memory on some server at some point. You know, that's just something that's a transient fact about the world that you kind of don't want to deal with ever again once it's resolved. But we're working on a lot of tooling to make it easier for people to get those things out of email and into other systems, and also for people to kind of wrangle their inboxes, better understand what it is that's coming into their inbox and where it's coming from and why they're receiving it and how much of it they're getting so that they can make better decisions about what they should and shouldn't be getting. I literally ran into this issue this morning in that yesterday, shockingly, for the first time in about a year, I got myself down to inbox zero,
Starting point is 00:55:36 which is a mythical state that one almost never gets to. And so, you know, as your inbox fills up again, you're like, oh, what is this stuff? And can I please turn off the stuff that's irrelevant? And there's emails I looked at. I'm like, how do I even know why I am on this mailing list? And how do I unsubscribe from it in a clean way? And it's all way more complicated than it feels like it should be. And then filters, which feel like they should be a good answer to this problem, are actually a surprisingly bad answer in a few different ways. One way is the filter language is
Starting point is 00:56:05 like Google's filter language in Gmail is surprisingly primitive. I can't say, like, I want to not receive emails that I received only because I was on this list, but if there's some other reason that I should have received it, I still want to receive it. And expressing that is surprisingly hard. And the other thing about filters in email that is difficult: some people at Jane Street have taken a kind of radical, extreme view of email where they, like, block everything and then whitelist the things that they want to see. And that means it can be very hard to know whether the email that you've sent to someone has actually gotten through or has just been filtered out by their system. But yeah, maybe the most important thing you said was this one about the cost issue, that somehow giving some way of making people who
Starting point is 00:56:49 send emails feel the cost of sending it to those people. Right. If you're going to write that email to 1,500 people, it's probably worth spending an extra five minutes editing it to make it as short and concise and direct as possible. Whereas if you're sending it just to your buddy who sits down the row, okay, that's fine, you know, send it in whatever form you want. But I think at times, it can be easy to forget the impact that sending an email that takes just an extra 30 seconds to read multiplied out over 1,500 people can have. That's just a big cost. And so we're working on ways of making that better. How? What are your ideas for making that better? I'm fascinated. I think the biggest one is putting that information in front of you when you send an email. So because
Starting point is 00:57:28 we have mailing lists, it's easy to just say, oh, I'm going to send this message to everybody at janestreet.com and forget that everybody at janestreet.com is a mailing list that contains everybody at janestreet.com and just how many people that is and just what the cost of sending an email to that wide of an audience is. So we're working on ways to put that information in front of people at the moment when they're writing the email so that they can at least make a more informed decision so they don't sort of forget the impact that their message might have. I've seen some of the reverse problem. There are lists that sometimes people really want to be bothered by, like they want to lurk on.
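The arithmetic behind that cost is easy to sketch: 30 seconds times 1,500 readers is 45,000 seconds, or 12.5 hours of collective reading time. The Python below is a rough illustration of what "putting that information in front of you" could compute, assuming a hypothetical in-memory table of list memberships. The names `expand` and `reader_hours` and the example addresses are made up for this sketch and are not the tooling described in the conversation.

```python
def expand(name, lists, seen=None):
    """Return the set of people an address reaches, expanding nested lists."""
    seen = set() if seen is None else seen
    if name not in lists:
        return {name}  # a plain person, not a mailing list
    if name in seen:
        return set()  # already expanded; also guards against list cycles
    seen.add(name)
    people = set()
    for member in lists[name]:
        people |= expand(member, lists, seen)
    return people


def reader_hours(recipients, lists, seconds_to_read):
    """Count distinct readers and the total hours they spend reading."""
    people = set()
    for address in recipients:
        people |= expand(address, lists)
    return len(people), len(people) * seconds_to_read / 3600


# Hypothetical membership table: one big list of 1,500 people.
lists = {"everybody": [f"person{i}" for i in range(1500)]}
count, hours = reader_hours(["everybody"], lists, seconds_to_read=30)
print(f"{count} readers, {hours} hours of total reading time")
```

Showing the sender "1500 readers, 12.5 hours" at compose time is one way to surface the cost before the message goes out.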
Starting point is 00:58:05 And I kind of want the opposite thing there. I want to say like, yes, there's a lot of people on here and you shouldn't worry about them. They've signed onto this fire hose, but I don't want to be slowed down. If it's too much for them, they should sign off. One of these at Jane Street is the mailing list called compiler dev. And it turns out a lot of people like lurking on compiler dev because compiler questions are interesting and people like to kind of pick through them.
Starting point is 00:58:27 And we have this problem where people will, we have this organized. So compiler dev is actually the merger of two lists, compiler dev actual, the people really on the team and compiler dev also for other people who just kind of want to hang on. And then people will, taking seriously what you said, will email compiler dev actual instead because they're like, well, I don't want to email all of those people. And we have to go and manually redirect and be like, no, no, no, you should normally worry about the safety of your coworkers. But in this one case, people have decided that they really want this and have asked us to make it so that all the emails go here. So please redirect. Right. I totally agree. And I think I'm probably infamous internally for the amount of dispatching
Starting point is 00:59:05 between mailing lists that I do because I am, like, militant about making sure that a message has gone to the right mailing list. Even if you send a message directly to me, I might redirect you to a mailing list that contains only me just so that, like, if I ever decide to stop working on that thing, somebody else will start receiving your emails and you won't just cache that you should always send emails to me. So yeah, it's a problem that definitely cuts both ways. I think the other thing that I wanted to highlight that you reminded me of is when we were talking about filters, another big problem that we run into is a lot of the filtering technology, in addition to not being flexible enough to express some of the things that we want to express, is not really built to be used by groups. Whereas in practice, we organize ourselves in groups in many cases. And so if you have a team
Starting point is 00:59:50 of people that are on some support rotation, for example, there's really not a lot of value in each of them independently coming to their own conclusions about what kinds of emails they need to see first and what kinds of emails they can kind of just skim later on. And so we'd really like a better way to build tooling that allows for sharing of some of this stuff so that we can kind of implement it once and people can sign on to use the same well-developed general set of rules that somebody else on their team has decided on. And we've actually built some tooling to this effect. We've actually started doing the maybe predictable thing of generating some of our filters in OCaml and allowing for better sharing of those OCaml filters and code reviewing them
Starting point is 01:00:29 and doing all the things that we do with everything. And that has helped a lot. But we're looking for ways to kind of expand some of that functionality to, you know, support more things and add support for just some more expressiveness to the tune of the things that you were talking about before. And presumably, even in that group-oriented environment, you also want a composability story. You'd like some ways of sharing among a team a set of decisions about how to handle emails and also allow customizations. You could both want to use the trade support email filters and then also some friend of
Starting point is 01:00:58 yours who came up with a good set of filters for some particular case. You want to be able to mix that in and have some way of having the semantics at the end of that be something that you can reason about. Yeah. And there's an obvious scary thing that can happen here, which is a kind of smaller version of the scary thing that we were worried about when we were rolling out Mailcore initially, which is once you start sharing filters, now you've given somebody the ability to black hole all of your email. And it does happen from time to time with some of this tooling where, you know, somebody new to the team is like, oh, I'm going to add a filter. I'm going to, you know,
Starting point is 01:01:29 try to add support for this new thing that I ran into. And they accidentally confuse the rules in some way or end up with a filter that says send everything to the archive. And it takes a little while for somebody to notice sometimes. I think this highlights why wanting to have this kind of more complex shareable system quite naturally goes along with wanting to have things like code review and testing, because suddenly you've taken what had been a very low impact thing of like, you're just mucking with your own filters, to a thing where you might black hole all the email for the entire trade support team, which is now a critical firm risk issue.
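As a rough picture of what shared, composable, testable filters might look like, here is a small Python sketch. It is not the OCaml tooling mentioned above, whose design the conversation does not spell out: the `Message` type, the rule helpers, the first-match-wins semantics, and every address are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Message:
    sender: str
    recipients: Tuple[str, ...]
    subject: str


# A rule returns an action name ("inbox", "archive", ...) or None to abstain.
Rule = Callable[[Message], Optional[str]]


def sent_to(address: str, action: str) -> Rule:
    """Apply `action` to mail addressed to `address`."""
    return lambda m: action if address in m.recipients else None


def subject_contains(text: str, action: str) -> Rule:
    """Apply `action` when the subject contains `text`, ignoring case."""
    return lambda m: action if text.lower() in m.subject.lower() else None


def combine(*rules: Rule) -> Rule:
    """Compose rules; the first rule that returns an action wins."""
    def combined(m: Message) -> Optional[str]:
        for rule in rules:
            action = rule(m)
            if action is not None:
                return action
        return None
    return combined


def classify(rule: Rule, m: Message, default: str = "inbox") -> str:
    """Run a rule, falling back to `default` when every rule abstains."""
    action = rule(m)
    return default if action is None else action


# Shared team rules (hypothetical addresses), reviewed once and reused.
team_rules = combine(
    subject_contains("urgent", "inbox"),
    sent_to("alerts@example.com", "archive"),
)
# A personal layer mixed in after the team layer.
my_rules = combine(team_rules, sent_to("social@example.com", "archive"))

# Messages that must always reach the inbox: a regression check that
# catches a rule that would send everything to the archive.
must_see = [
    Message("desk@example.com", ("support@example.com",), "Fill question"),
    Message("ops@example.com", ("alerts@example.com",), "URGENT: disk full"),
]
assert all(classify(my_rules, m) == "inbox" for m in must_see)
```

Because the combined result is "first matching rule wins", the outcome of mixing a team layer with a personal layer stays easy to reason about, and a check like the final assertion can run in review before a new rule reaches everyone who shares the filters.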
Starting point is 01:02:00 Like now no one who's supposed to support the trading sees any of the things they're supposed to be able to see. So maybe to close it out. So email's, you know, a big long-term system, which has been around for a long time and changed a lot over the years. And every now and then you hear people coming up with new things that are going to kill email and replace email. Projects like Google Wave, which was this grand new thing that was going to replace
Starting point is 01:02:23 email. And then instead the wave crashed and that was the end of that. I'm wondering, are you optimistic about the future of email? I am optimistic about it. I think that email in its flexibility and openness is something that it would be really hard to replace with any of these other systems. And I think there's a reason why it's had the staying power that it's had. You know, it's been around for 40 years. And while it's changed around the edges, and while we've had to adapt to some of the developments on the internet, at the end of the day, the core functionality has stayed basically the same that entire time. And I think the fact that it is so open and so flexible and makes it so easy for you
Starting point is 01:03:03 to build your own things on top of it means that it's got a long, bright future. And I certainly think that it's far from being outmoded at this point. Well, thank you very much for joining me. This has been a real pleasure. Thanks, Ron. You can find a full transcript of the episode, along with more information about some other topics we discussed, including a link to a talk that Dilo gave about MailCore, and also links to some of our mail handling libraries at signalsandthreads.com. Thanks for joining us, and see you next week.
