Signals and Threads - Swapping the Engine Out of a Moving Race Car with Ella Ehrlich

Episode Date: September 12, 2022

Ella Ehrlich has been a developer at Jane Street for close to a decade. During much of that time, she’s worked on Gord, one of Jane Street’s oldest and most critical systems, which is responsible for normalizing and distributing the firm’s trading data. Ella and Ron talk about how to grow and modernize a legacy system without compromising uptime, why game developers are the “musicians of software,” and some of the work Jane Street has done to try to hire a more diverse set of software engineers. You can find the transcript for this episode on our website.

Some links to topics that came up in the discussion:
- EG, the League of Legends team that Ella is a huge fan of.
- Apache Kafka, the message bus that Gord migrated to.
- Some of the various sources of symbology you have to deal with when normalizing trading data. (Really, there are too many sources to list here!)
- A list of Jane Street’s recruiting Programs and Events, including INSIGHT, which focuses on women, and IN FOCUS, which focuses on historically underrepresented ethnic or racial minorities.

Transcript
Starting point is 00:00:00 Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I'm Ron Minsky. It is my pleasure to introduce Ella Erlich, who is a software engineer here at Jane Street and who's a real Jane Street lifer. She's been here for about a decade. And we're going to talk today about the task of engineering legacy systems and a particular
Starting point is 00:00:25 system that Ella has worked on for a long time called Gord, which has been here for about five years longer than she has. And we're going to talk about some of the challenges and interesting work that comes out of all of that. But to start with, Ella, can you tell us a little bit about how you came here? Hi. Yeah. I actually took an internship at Jane Street after my junior summer. I had no interest in finance. I did not think I was going to come here full time. I thought I wanted to do game design. And I was deciding on my internship between Jane Street and going to EA Games.
Starting point is 00:00:54 And a combination of two things happened. One is I was like, well, I'm definitely going to be in the Bay Area after I graduate from college. So let's spend a summer in New York and I'll go to some Broadway shows. I'll eat at some good restaurants, like this change thing, whatever. And I came and I had such a great summer and I learned so much. And I felt like I grew so much during that summer that I decided, you know what? This seems like a cool place to start my career. And, you know, maybe in a couple of years, I'll go out and do some game design stuff.
Starting point is 00:01:19 But like, that's fine. I'll go back to school and get a master's. And then, you know, 10 years later. Do you still spend real time on games these days? I spend a lot of time playing games and thinking about games and watching games. I'm a big pro League of Legends fan, go EG. But it's now I think a thing that I've learned that I'm very happy that is my hobby. And I think the more I've learned about the gaming industry, the more I'm like, Jane Street is the right job for me. It is the place I'm very, very happy to spend the past decade of my career on, a place I continue to be happy to come
Starting point is 00:01:49 to work every day. And I think that gaming is a great hobby, but I wouldn't want to do that full time because, you know, I think to some extent, based on talking to friends who are in game development and in that space, to some extent, I think game developers are a little bit the musicians of software. Most places, software companies, it's like a nice cushy job and game development is such a competitive environment where there are so many companies and they work long hours. The amount of growth and investment that Jane Street has been willing to put into me and the people in Jane Street have been willing to put into me, I think it is harder to find that in the gaming industry, again, based on the experience of talking to my friends who are
Starting point is 00:02:21 in that space. Yeah, hopefully people in the gaming world aren't too angry at us. I've heard similar things. I feel like being a game developer is a little bit like being a violinist. It's a hard thing and lots of people love it and want to do it. And so it can be very competitive. That being said, they make amazing things.
Starting point is 00:02:35 And I am so grateful that so many people are passionate about that because I benefit from it. It is indeed very impressive. Okay, so let's talk a little bit more about your actual work life here at Jane Street. I mentioned the system Gord before. Can you tell us a little more about Gord and what it does? Yeah. So Gord collects the firm's trading data, normalizes it, and then distributes it as a service within the firm so that we can do things like calculate our positions, upload our trades to clearing firms, settle our trades, and make sure that the money and stuff actually change hands at some point in the future. So you just said a bunch of words that are probably unfamiliar to people. Just to explain one of them, what do you mean when you say positions?
Starting point is 00:03:20 Yeah. So the idea is, right, suppose that you buy 100 shares of Apple. You can think of it now, instead of having some number of dollars in your bank account, you have that 100 shares. And it's important to know that because you need to be able to understand, like, the value of all the stuff you're holding, which means you need to know all the stuff you're holding. If you buy 100 shares of Apple, you'd say your position is 100 shares. And if you sell 20, your position is 80. So it's just tracking all of the stuff that we're holding that's, like, not just the money.
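The position arithmetic described here (buy 100 shares, sell 20, position is 80) is just a running signed sum per instrument. Here is a minimal illustrative sketch in Python; the names and structure are invented for the example, not taken from any Jane Street system:

```python
from collections import defaultdict

# Minimal sketch: a position is the running sum of signed trade quantities.
# Buys add to the position, sells subtract from it.
positions = defaultdict(int)

def book_fill(symbol: str, signed_qty: int) -> int:
    """Apply one fill and return the updated position."""
    positions[symbol] += signed_qty
    return positions[symbol]

book_fill("AAPL", 100)  # buy 100 shares: position is now 100
book_fill("AAPL", -20)  # sell 20 shares: position is now 80
```

Real position-keeping also has to handle accounts, corrections, and the other message types discussed later, but the core bookkeeping is this simple running total.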
Starting point is 00:03:44 Right. And maybe a thing that's not obvious to someone who isn't in this world is there are many different things that you need to do with this data. So you might in real time at a trading desk want to know what your positions are, what your risks are, how much money you have made, all of that as you're going throughout the day. At the same time, you might want to do firm-oriented worst-case risk analysis based on the same data. Also, you need to make regulatory decisions. It turns out, for example, a common one is that when we sell something,
Starting point is 00:04:11 a weird thing that maybe people don't realize is sometimes you sell things you have. Sometimes you sell things that you've merely borrowed. And that latter thing is called a short sale. And you have to know what your positions are to know whether a given sale is a regular sale or a short sale. And there are regulatory marking requirements. So your moment-to-moment real-time interactions with the exchange are driven by and affected by this data because you have to change what you say to the exchange based on positions. And then there's all this often end-of-the-day reporting, uploading, synchronizing your information with various other players in the market,
Starting point is 00:04:41 and all of these things are driven off of this one central source of data. Yes. Okay, so that's what it does. From an engineering perspective, what are the requirements out of that system? So it has a bunch of interesting requirements that other systems might not have. So one of the ones that obviously matters is performance. We are very actively trading all of the time. And it's very important for Gord to be performant
Starting point is 00:05:05 to its peaks because when trading is the busiest, that is when it is most important for us to have accurate information to make good decisions. And so we need to make sure that we are able to perform to, you know, two or three X what we've seen on the most recent busiest day ever, because the next time it comes along, our busiest days tend to blow our previous busiest day out of the water by large margins rather than being an occasional like, you know, oh, we get a little bit bigger and a little bit bigger and a little bit bigger over time. It tends to be the case that busy days are significantly busier than we've seen in the past. And so we need to be always thinking about how much overhead do we have in the system and can we keep up with something much busier than we've currently seen. And what's like the rough multiplier between a regular day and like the busiest day of a year? I think that it could be as much as, you know, 5x is pretty reasonable. Trying to think back to some historical ones. Part of it is, one of the
Starting point is 00:05:59 things that's interesting is over the last five years, our standard days have become what our busiest day ever was when I started at the firm or something like that, even probably more than that. And so the most recent busiest day ever we saw was about 2x, but also previous comparison points have done 5x or sometimes more than that. And so we are always waiting with bated breath for what the next multiplier will be, but we try and keep in mind at least 2x and probably more than that as a regular thing. 2x more than the worst thing that you've ever seen is something you want to be able to tolerate gracefully. Yes.
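The rule of thumb here, always being able to absorb at least 2x the busiest day ever seen, is easy to express as a mechanical check. The following is a hypothetical sketch; the function name, numbers, and threshold are illustrative only:

```python
# Illustrative headroom check: flag when the tested peak throughput falls
# below a safety multiple of the busiest rate ever observed. The 2x default
# echoes the "at least 2x the worst day" rule of thumb from the discussion.
def has_headroom(tested_peak: float, busiest_observed: float, multiplier: float = 2.0) -> bool:
    return tested_peak >= multiplier * busiest_observed

print(has_headroom(50_000, 20_000))  # True: a 2x record day would still fit
print(has_headroom(50_000, 30_000))  # False: time to shard or optimize
```

A check like this is only useful if you actually know your tested peak, which is why the performance testing discussed later in the episode matters.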
Starting point is 00:06:33 And maybe a thing to just keep in mind is that all of these services are back-ended off of Gord, which means if Gord is down, the firm isn't trading. Yeah. And in addition to that, or I guess as a result of that, we have very hard uptime guarantees. As you said, if Gord goes down, the entire firm stops trading automatically because we can't make good decisions if we have no idea what trading we've done. So we work incredibly hard to make sure that does not happen basically ever. The system is designed to be able to fail in pieces in constrained ways that will hopefully avoid, you know, any sort of catastrophic failure. An individual exchange might have problems, but the system is designed
Starting point is 00:07:10 to make it so that if that exchange were to have problems, it would not affect the rest of our trading. In addition to that, whenever we're rolling the system, we have extensive and somewhat arduous testing procedures that we go through to ensure that we haven't made any of these sort of fatal flaws that might cause a catastrophic outage. We also always run, in addition to the sort of real production version of the system called the primary, a standby, which we also consider production and which is processing all the same activity and, you know, keeping up with everything in real time, but is in an entirely different data center and, you know, as fully separate as we can make
Starting point is 00:07:46 them so that if we were to have a network partition and lose a data center or just lose a box, which with the number of actual physical machines we have happens with a surprising frequency, we could gracefully fail over as quickly as possible to continue trading and be ready to go and keep things up and running. So when I think about Gord, one of the things I think makes it challenging is the system is old, right? As I said, like you've been here about a decade. It's been here for five years longer than that. And it's, you know, to say kind of frankly,
Starting point is 00:08:12 it was written when we were less good at writing software and knew less what we were doing. And it's kind of a sign of that. Like I was involved in a bunch of the early design decisions around Gord, some of which I continue to regret to this day. And so a lot of the kind of story of the last few years of working on Gord has been about extending and growing and hardening the
Starting point is 00:08:30 system and figuring out how to engineer out of problems that were built in and baked into the original design. So maybe as a kind of way of looking at that, it's worth talking about what Gord looked like when you first came to the project. When I started, sort of the core of what Gord is was very similar. So the pipeline we have is there are sort of sources in the world, which are like a data feed from an exchange. And there's a process called a parser that gets a data feed by SSHing to the box and tailing a file on disk that is in a pipe-separated number-equals-value format. Maybe someday we'll change that. But that's not something we've actually had a huge amount of success changing thus far,
Starting point is 00:09:13 though we do have some plans on that front. But so we SSH to the boxes, we tail files. And then we need to take that data. And one of the things that was, I think, a decision that was made early on was that Gord would deliver quote-unquote correct data all the time. And what this means is that Gord actually does a lot of making up of data when we get bad data from exchanges. So for instance, there was a promise that was made that Gord would deliver an order before it saw a fill. A fill is another word that
Starting point is 00:09:40 we might use for trade. You might also hear me use exec interchangeably. I will try and be consistent, but we use fill and exec very interchangeably within the team. And so sometimes you may hear that. So when we get a fill, if we have not seen an order, Gord makes one up. We think in retrospect, this was a mistake, but this is baked into a ton of things. And a lot of things assume that when they see a fill from Gord, they will have previously seen an order. So we've been sort of dealing with the process of unwinding all sorts of legacy decisions like this. In addition to that, Gord is going to always deliver normalized symbology. And so maybe you have the stock Apple US and you're trading that on the New York Stock Exchange. The way they represent that might be something called a Reuters code, which is something that
Starting point is 00:10:24 might look like AAPL dot P or dot some other thing, depending on the extension they need. You might be trading this on an exchange that uses a Bloomberg, which will look something like AAPL US. Bloomberg, when I say Bloomberg, I mean like Bloomberg symbol, one of the standard symbology formats in the industry. Yeah. And it's maybe worth saying Bloomberg, not just the mayor of New York, also this company that makes an extremely pervasive terminal that all sorts of financial professionals, including us, use to access all sorts of data. And they're kind of an excellent data provider of last resort. You might want to get direct data from a particular exchange to get the most granular or most timely version of something, but Bloomberg has everything. And so they're a kind of foundational data supplier and they have their own symbology. And like, actually this is like,
Starting point is 00:11:09 I think summarizes a kind of terrible thing about this whole financial world. A big and weird source of complexity is that just understanding what the names of things are is actually incredibly complicated because you have different sources of different names from different places. The exchange names things one way, data providers name things other ways. And actually, the U.S. is relatively tame. U.S. equities markets in particular are relatively tame in this regard. But if you look at foreign exchanges, you look at all sorts of different asset classes, it gets exponentially more complicated. The task of just figuring out what things are called is a surprisingly complicated normalization job that Gord is involved in. Yeah, it's definitely one of those standards
Starting point is 00:11:48 problems of everyone has some way of describing their thing, and they look at all of the ways people describe symbology in the past and say, oh, I'm going to do something better. And then you have, you know, 18 million standards, and every one of them is bespoke, and every one of them has edge cases and corner cases that are weird and hard to deal with. But yes, so one of the things we have to do is normalize symbology. We also normalize a ton of other information, like usernames are an easy thing where you probably want to know which trader at Jane Street was the person who did the trade. And if they're trading on a different platform, it might represent that with their email address, or they might have a specialized login. So we need to take all of
Starting point is 00:12:21 those different bits of information, have a mapping that turns them into the actual Jane Street username that we use internally, and do this for a really large number of fields. One of the things that we've been sort of working towards over the years is moving more of this normalization into the sources where that information is more naturally contained. But this is still a problem that we absolutely have to deal with. And as with sort of all of the things, because it's a legacy system, which has a lot of reliability guarantees, a common thread will be, we're moving slowly in this direction,
Starting point is 00:12:51 but it's difficult to make progress because you can't break the system out from under anybody. And this, by the way, this particular bit of complexity is very old because it goes back to the original design and purpose of Gord, where you said, oh, we have all these data sources that we're changing so that they do the normalization first and Gord doesn't have to. But when Gord was first written, that was not an option because the original version of Gord was just reading out, like we
Starting point is 00:13:15 would get data that, morally speaking, we just downloaded from exchanges and brokers and dumped into a file. And then the task of Gord was to slurp all this data up together and normalize it. And there was no other intermediary that could do that. Now, today, most of the trading activity we get is not something that comes from some third-party system. We have some system that's directly interfacing with and understands the details of the particular place that's being traded with. And the software developers and people who are working with that system are well positioned to understand what's going on and think about the normalization. So that's a kind of shift that you're talking about of going from one system in the middle of the world that has to
Starting point is 00:13:50 get everything right to a kind of more distributed thing where people in different parts of the organization understand what they're doing and can solve the normalization problem locally, rather than shoving all this crazy normalization on one small harried team. Yes. There is still a surprising amount of trading that we do that comes from these sort of weird direct third-party sources, but the majority of our trading is now done through systems we write. And we've introduced another step that sits in the process, between that direct download of data from the weird third-party place and Gord, that can also add in a normalization layer and hopefully get that specialization into a better spot.
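The normalization passes being described amount to table-driven translation: each source-specific identifier (a Reuters-style code, a Bloomberg-style symbol, a platform login) is mapped onto one internal vocabulary. Here is a toy sketch of the idea; the mappings and field names are made up for illustration and are not Gord's actual schema:

```python
# Toy normalization pass: translate source-specific identifiers into a
# single internal vocabulary. Real systems carry far more fields and
# far more edge cases than this.
SYMBOL_MAP = {
    "AAPL.P": "AAPL US",          # a Reuters-style code (illustrative)
    "AAPL US Equity": "AAPL US",  # a Bloomberg-style symbol (illustrative)
}
USER_MAP = {
    "jdoe@example.com": "jdoe",   # platform login mapped to internal username
}

def normalize(record: dict) -> dict:
    out = dict(record)
    # Fall back to the raw value when no mapping exists, so unknown
    # identifiers pass through rather than being dropped.
    out["symbol"] = SYMBOL_MAP.get(record["symbol"], record["symbol"])
    out["user"] = USER_MAP.get(record["user"], record["user"])
    return out

normalize({"symbol": "AAPL.P", "user": "jdoe@example.com", "qty": 100})
```

The hard part in practice is not the lookup but maintaining the tables themselves, which is exactly the "18 million standards" problem described above.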
Starting point is 00:14:27 Got it. So it sounds like one of the core responsibilities of Gord is normalization. And some of that normalization is what you might call kind of field normalization. Like there's some piece of data and we have to say it in the right language. And some of it is kind of more almost like protocol-level enforcing of invariants, making sure that messages come in the right order and have the right meanings and translating the transactions you see from the other side into a single consistent language of transactions. Yeah. And I think a number of choices that were made on that second one, especially, are ones that we're trying to unwind or avoid or fix going forward. An easy example of this is the sort of core types that Gord knew about when I joined the
Starting point is 00:15:05 team. There were orders, there were outs, which is basically a message saying an order is closed, and there were fills. And if we wanted to encode anything else, we had to shove it into one of these types, often in ways that didn't fit well. So for instance, maybe you send a message to an exchange, and rather than getting filled, you get a reject. Rather than adding a first-class reject type to represent what just happened, this was represented in Gord by making up an order when we saw a reject and then making up a closed message. And so there was a field inside the order that was like, it's synthesized from a reject. And so to figure out, you know, are we getting a lot of rejects, you had to go look at the internals of the field as opposed to having a first-class type that represented this. And this sort of thing happened over and over where it was, oh, we need to represent a new thing. Let's just
Starting point is 00:15:52 figure out how we can kind of shove it into the existing data model. And one of the reasons that we had to do it this way is because the original way that clients connected to Gord was directly over TCP. And so the thing that we did is we had basically a type that represented our messages and we used a protocol called binio, which just turned it into a binary format, and clients had to be able to read it out the other side. And the versioning story for this was there wasn't one. The way that you added fields to Gord was we had a string-string map of fields, because if you wanted to add a new field at the sort of top-level type here, you would actually need to make all clients roll
Starting point is 00:16:30 their system forward to be able to understand that. Just to jump in there, when you say a string-string map, what you're saying is the basic data representation, instead of being something that had a fixed clear type, it was just like a bag of fields. Key value pairs and you threw them together so you could freely add new fields and take away fields and change the composition of the fields without having to change the format. On the face of it, that description sounds like, oh, this is great. You want to evolve the schema. You want to evolve the way in which you represent the data. There's nothing in the data format that gets in your way. So what got in your way? Yeah. So first of all, things that are difficult here are it makes it harder for people who are
Starting point is 00:17:03 using things to know what fields they should be using and how they should be using them. Again, we were able to represent new things by being like, well, here are more things we can put into our bag of fields. But we couldn't actually change the top-level representation. So when we wanted to do something new, we simply had to figure out how to represent it as just more fields attached to a thing that was increasingly less representative of what the original design of, like, an order was. "When you see something that's an order, surely that's an order." And the answer is no, sometimes it's a reject, which is a confusing one to explain to people. Another example of this is when you see a fill. A thing that might happen to your fill is it might be corrected or you might get something called an allocation, which is
Starting point is 00:17:43 basically when we do the trade initially, we don't necessarily know what account it's going into for some complicated reasons. And later you get a message saying, put that in this account and that's called an allocation. And again, all of these things were just represented as adjustments to the fill by sending the full fill message again. So you say, cancel this fill and rebook it. You had no ability to do like smaller adjustments on top of your data in any way. Got it. So in some sense, there was this gooey, extremely flexible, dynamic representation of the data, which you might think would free you up, but actually it locked you down because what happened was people on the other side who were consuming the data just implicitly in their code depended on the structure that was embedded in there. And so despite the fact that the file
Starting point is 00:18:23 format or the message format lets you change things however you want, in practice, you couldn't change things without breaking your clients. That's right. And because of where Gord sits in the firm and the role that we play, we have hundreds of clients and many of them are incredibly critical to trading. And so breaking our clients is not really an option for us. A thing that we used to do in the previous version, and I'll explain what we do now, but when we wanted to do a fundamental change to the version protocol, we had to basically add upgrading or downgrading inside of Gord and know the version our clients were connected to and then do the downgrading ourselves for them. And this is actually just
Starting point is 00:19:00 kind of expensive. It turns out one of the most expensive parts of Gord is serializing information. And when you're sort of downgrading things for clients, again, the way this was initially done was clients would open up a direct TCP connection to Gord and they would get the messages sent to them and the downgrading would happen server side. And there was some stuff baked in to try and make it, you know, reuse stuff appropriately. But if you had four versions of your clients, you're now serializing the message four times. And then when you're doing this and trying to send data to hundreds of clients, this actually gets very expensive. So we avoided this by trying to make the type as sort of gooey and flexible as possible and not really having the ability to change it at all.
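The trade-off being described can be made concrete. With a string-to-string "bag of fields," attaching a new optional field is free, but a genuinely new kind of message has to masquerade as an existing type, and consumers end up sniffing internal fields. The following is a hypothetical sketch; these field names are invented for illustration and are not Gord's real schema:

```python
# A message as a "bag of fields": just a string-to-string map.
order_msg = {"type": "order", "symbol": "AAPL US", "qty": "100"}

# The easy, non-breaking change: attach a new optional key.
# Clients that don't know about "venue" simply never look at it.
order_msg["venue"] = "NYSE"

# The trap: a reject has no first-class type, so it is encoded as a fake
# order, and consumers must know to inspect an internal field to tell.
reject_msg = {"type": "order", "synthesized_from": "reject"}

def is_reject(msg: dict) -> bool:
    return msg.get("synthesized_from") == "reject"

is_reject(order_msg)   # a real order, so False
is_reject(reject_msg)  # an order-shaped reject, so True
```

This is why the flexibility was partly an illusion: the schema did not live in the format, so it lived implicitly in every client's code instead.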
Starting point is 00:19:37 And this, to be fair, worked for a very long time. But as it expanded to cover more things, and as we, as a firm, have gotten into new and different kinds of trading, we wanted to be able to represent more interesting message types. One thing you said in there was that this gooey representation didn't allow change at all. But I would think that's not quite right in the sense that there are some kinds of changes that you could make freely. Anything where you're just like, here's an extra optional piece of information and I just want to add it on. In the gooey world, I just have a collection of key-value pairs. Yeah, you can just add a new key-value pair, and anyone who didn't know about it will continue to not know about it. And anyone who wants to see the data can, and can interpret it. So some kinds of changes were easy, and I guess some kinds of changes were hard. And maybe
Starting point is 00:20:16 an example of the kind of change is if you want to add a new kind of transaction, well, that's a thing that everyone who consumes the feed needs to understand. And that's the kind of thing that has to be one of these breaking changes. Yeah, that's true. And the sort of core types we were representing things with were relatively large. And so one of the things that's expensive about Gord is serializing all this information. And so as we were starting to do new kinds of trading that required many more messages, where most of the information on them was very, very small, we did not want to encode that in our existing types because it would be very, very wasteful in terms of the amount of overhead we would have to include. So we made a new format that we use to represent things. We've talked a lot about the data and kind of at a,
Starting point is 00:21:00 in some sense, what was Gord's responsibility in terms of getting and normalizing and distributing that data? But what was the system architecture itself like? You said Gord is, like, a global order database, which isn't exactly an order database, though it is global. But what is it? How is data distributed in Gord? Yeah. So I mentioned we had the parser, and then there was a system called the DB that basically collected the information from the parser, maybe made up some extra fake information if we got a fill without an order, adding that information, then it would distribute it to a normalizer in each office over TCP. And the normalizer's job was to apply all of those normalization passes in terms of, you know, fixing the symbology, fixing the users, all the things
Starting point is 00:21:39 we've kind of mentioned there. And then it would distribute it on to clients again, directly over TCP. And so one of the biggest problems that Gord had at that time was fan-out, because we had hundreds of clients. And so we were sending the same messages for our full data stream out to all of them. And that was a pretty expensive process. So you talked before about the contract that Gord had with its clients, i.e. the things that were consuming data from it. What did the contract between Gord and the data sources it had, the things that were feeding data into it, the exchange connections and things like that,
Starting point is 00:22:09 what did that contract look like? That contract was mostly, you will give us things and we will keep up and we will never fail. It is a hard contract to maintain. So yeah, we didn't actually have a spelled-out contract with our sources of, like, how fast they could give us information. We had things that were, you know, hand-waved, like maybe a Gord dev had agreed with a friend when they first wrote it, "friend" being one of the automated systems that writes activity to Gord.
Starting point is 00:22:33 But maybe two developers had at some point agreed to some value, but that was certainly not written down anywhere. And there was nothing that enforced it in any way. And if Gord ever fell behind, again, we would halt things. So our policy was we didn't fall behind. This was obviously kind of an impossible thing to maintain as we grew, especially with a lot of the parts of the architecture. So we had to figure out, like, when I started on the team, there was no performance testing that we did on a regular basis. I had to build performance tests to know what our maximum rates were and how close to them we were. We were lucky at the time that the firm's activity was small enough that we could keep up with it on a regular basis. But we had to learn, at what point do we require
Starting point is 00:23:10 our upstream sources to shard their activity into multiple sources? Because one of the things that is really nice about Gord's architecture is it parallelizes incredibly well. A single pipeline of parser, DB, and normalizer can do about 50,000 messages per second, but there are some things on the end now that cause that to be slower for good reasons. So we're going to say 25,000 messages per second is, like, allowable, but we can parallelize that across many boxes and many copies of this pipeline. So the actual throughput rate of Gord scales arbitrarily, and we have not run into real issues with just kind of continuing to expand horizontally. But understanding what our sort of one single source pipe throughput was was a thing we
Starting point is 00:23:52 did not know, and we had no promises about at the time. I guess one of the things that backs this is that the basic architecture is what you might call in other contexts, eventually consistent. Meaning you have all these data sources that are providing data. And the ordering guarantee when you consume data is that the data from a particular source will come in the order that it was entered into the system. But you can get kind of shearing between different data sources. Things might come in different orders for different consumers. And that allows you to build a system where there aren't a lot of tight dependencies. In some sense, in exchange for having lighter guarantees
Starting point is 00:24:25 that you provide to the users, you make the scaling story simpler and easier. Yes, that is a fundamental thing that we kind of can't build the system without because trying to keep up with the actual full throughput of Gord in a single thread or a single process would be well beyond what especially our architecture of OCaml at the time could handle.
Starting point is 00:24:44 Even now that we have put a lot more work into understanding how to make very performant OCaml, it would still be a very difficult task to handle the sort of entire stream in one single thread or in one single process or even on one single box. But it is the case that because basically none of our users actually care about the hard ordering guarantees between things that happen on different exchanges.
Starting point is 00:25:05 Because, frankly, you kind of can't care about that because, again, they happen on different exchanges. So maybe there's a delay from the exchange sending it to us rather than from our own internals. You can't actually guarantee that you're going to see every event in the order that it happened in the real world because that's kind of not a meaningful statement. So we just make promises within a single upstream flow. Even if the ordering isn't externally meaningful, you could have a thing where the system agrees upon a single ordering and everyone always sees things in the same order. And I think that has some upside. It lets you do replication and things like that in a simpler way, but it also has a massive downside because it limits the ways in which you can recover. Like one thing that I remember us thinking about a lot is, well, it turns out we are a distributed operation. We have
Starting point is 00:25:48 offices in New York and London and Hong Kong. And if we lose a connection or have a degraded connection between two different offices, we might want to be able to kind of continue operation concurrently in both of those offices, even though the information from the other office is somewhat degraded, right? And so that's an example where you, in some sense, are relying on the fact that you have this kind of softer guarantee to give you better availability than you could get otherwise. Yeah. Like I said, very, very, very useful for us. If we gave up on that guarantee, we would be very sad, or that lack of guarantee, perhaps. Okay. That's kind of a tour of what Gord was like when you first came to it. What are the
Starting point is 00:26:22 things that you've worked on since then to kind of grow the system and kind of repair some of the problems that were there in the original design as you saw it? One of the big things that was being worked on when I joined the team was trying to change this process of all of our clients connecting to us directly over TCP. This was a very difficult scaling problem for us
Starting point is 00:26:41 that was causing real problems because every time you added a new client, you had to serialize the data to them. And so the amount of work you had to do as the sort of gourd service scaled with the number of clients. And as the firm was doing more things and getting bigger, this was just unsustainable. So what we did is we actually looked externally. We saw Kafka, which is a thing developed by LinkedIn, which is a distributed message queue that has really good properties for scalability. And in addition to that, one of the properties that was really, really nice
Starting point is 00:27:10 is rather than being a push-based system, it is a pull-based system for clients. So one of the things is when Gord was delivering activity to its users over TCP, it had to basically just like clients would open up a pipe and then Gord would just send them messages whenever it had them. And if the client was slow, Gord would have to buffer data because it has to get buffered somewhere. And so because we did not want to hold arbitrary amounts of messages in memory per client, if a client was too slow, we would just kick them off. And it turns out, while many systems do care about being highly performant, there are a lot of other systems where their performance just doesn't have to be that high. For instance, if you're an end-of-the-day regulatory reporting thing, you don't actually need to keep up with Gord at its busiest. You just need to make sure that by the end of the day, you've been able to process everything.
Starting point is 00:28:04 And so this notion that people had to be able to perform to Gord's peaks to write a Gord client was very painful. And so Kafka has the property that clients ask for messages when they're ready for them. And it just stores them on disk and can give you an arbitrary message out of the stream that you want if you have the right sort of pointer to it. And so this allowed us to sort of change that dynamic, which meant that if a client was slow, they would fall behind the tip of the stream. And we might be able to monitor and alert them about that, but we wouldn't actually make them die. And again, if you're a process that has to actually process every message
Starting point is 00:28:31 and you can't keep up, well, when you die, all you're going to be able to do is restart and just be sad. And make the upstream service sad again. And make the upstream service sad again. You're just creating more work. Yeah. There's something confusing you said before where you said, oh, we had this problem where we opened all these TCP/IP connections.
Starting point is 00:28:47 We had to do a bunch of work per client as the result of that, of sending the data to them over TCP/IP. And then we switched to Kafka and that made things better. But Kafka also uses TCP. So like, where's the magic? Yeah. Really what was going on before is not just that we're using TCP, but that we were doing a very naive thing with our use of TCP. So when a client connected to us, we would simply buffer the data for them. And if they fell behind, we would just continue buffering the data in memory. And just to interrupt for a second,
Starting point is 00:29:13 when you said we would, right, because this is like a part of Gord that is a very old mistake. We just, you know, like morons, picked up the operating system APIs and used them without thinking very hard. And so we said, oh, there's a client. They have a TCP connection. We send them a message. We send them a message. We send them a message. And the buffering is then done by like some combination of the operating system
Starting point is 00:29:33 and the low-level systems code that manages the individual TCP/IP connection. So in some sense, separated from what you think of as the application layer of Gord itself, there's all this behind-the-scenes work and buffering that occurs. Yeah. And we weren't really thinking about that. We did think a little bit about the fact that serialization is expensive. So we tried to serialize only once, but there was still a lot of stuff with this buffering where we were just accepting what the operating system did and not really thinking about it. And this turned out to get just more and more expensive as we had more clients. Kafka solves this problem a lot more intelligently because it's in the sort of architecture of the system.
Starting point is 00:30:06 So when you write a message into Kafka, it like stores it on disk. And when a client is connected, they ask for messages when they're ready to handle them rather than being push messages. And so if a client is not keeping up, then Kafka is not trying to buffer the messages in memory that they're going to need eventually.
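A toy model of that pull-based design, a sketch of the shape of the idea rather than Kafka's actual API: an append-only log plus consumers that each track their own offset.

```python
# The broker keeps an append-only log; each consumer asks for messages at
# its own pace by tracking an offset. A slow consumer just falls behind
# the tip; nothing is buffered per client and nobody gets kicked off.
class Log:
    def __init__(self):
        self.entries = []  # stand-in for messages persisted on disk
    def append(self, msg):
        self.entries.append(msg)
    def read(self, offset, max_count):
        return self.entries[offset:offset + max_count]

class Consumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0  # the consumer, not the broker, owns this pointer
    def poll(self, max_count=10):
        batch = self.log.read(self.offset, max_count)
        self.offset += len(batch)
        return batch
    def lag(self):
        return len(self.log.entries) - self.offset

log = Log()
for i in range(100):
    log.append(f"msg-{i}")

fast = Consumer(log)
slow = Consumer(log)
while fast.lag() > 0:   # a keeping-up consumer drains to the tip
    fast.poll(50)
slow.poll(10)           # a slow consumer is merely behind, not dead
```

The slow end-of-day reporting job from the example above would just carry a large lag during the busy part of the day and drain it later, with no effect on the producer or other consumers.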
Starting point is 00:30:29 They're there on disk if the client wants to ask for them, but it's not holding them and trying to be ready and also give them to the client actively. It's worth saying this question of what's push and what's pull is kind of subtle. At a low level, TCP is kind of doing something like this, where there are receive buffers that the operating system manages, and when the receive buffer is full, it is in fact going to stop pulling data in. But at least from the APIs that we surfaced to programmers, we gave people an API where, like, oh, here's a TCP/IP connection. Just send data that you want to get there. And eventually we'll make sure it gets there. From the point of view of the sending program, it was push oriented. And we kind of weren't keeping track and weren't thinking intelligently about how the resources and buffering of data were happening in between in that case. And at the level of the TCP/IP connection, it's kind of too
Starting point is 00:31:09 low level to do anything smart. Like it can't understand that they're sharing at that level without having something that's explicitly binding all those things together and understanding them as a kind of single logical message queue. Yeah. One of the other things that was really expensive when a client connected to Gord that I haven't really touched on here, but that Kafka also made much better for us, was the idea that when someone connects to Gord, they often want to know everything that had happened up until that point. So we would generate a snapshot for them of all of the activity. And that generation of a snapshot had to happen per client because people would connect to us at random times. And so again, Kafka allowed us to basically just have a side-along process that was consuming the stream and occasionally snapshotting,
Starting point is 00:31:47 because it is nice for people to be able to catch up more quickly. But again, there, the snapshots could be stored in a ready-to-go format, and then you would just read from the latest snapshot and then pick up some point in the pipe after that, as opposed to snapshotting per client that connected. In some sense, the answer to my question of where is the magic is, there is no magic. It's just a bunch of engineering that you need to do in order to build a message queue. And rather than have this kind of message queue functionality
Starting point is 00:32:13 be sitting in a deadly embrace for the rest of the functionality of Gord, what you did was separate them out. Use some standard, well-engineered, already existing piece of infrastructure for doing the message queue part of the work, and then focusing the work that we did on Gord on the actual part where there was a unique value add, which is the core part of the pipeline of consuming and normalizing all this
Starting point is 00:32:35 data. Yes. So that's one big move you guys made to improve the architecture of the system. What else have you done? One of the other ones that was a pretty big deal is I mentioned briefly, so the way that our upstream sources wrote us messages was using this protocol called fix, which is we represent internally for the files that Gord was consuming as a tag equals value pipe separated thing. And there were no constraints on this. So a common problem we would run into is someone wanted to write a new source to Gord, and they would have to contact us and be like, so there's like 100 different fields I could provide. What ones do I need to provide, and why, and how, and what's relevant? And we as Gord devs kind of
Starting point is 00:33:16 had to know, oh, transaction time is required for all execs. And for bonds execs, you must provide a settlement date. But for other kinds of execs, that's actually an optional field. And there were a number of these invariants in how our upstream sources had to write us information to deliver that information in a way Gord would interpret properly that were not reified anywhere. And we sort of provided no help in people creating these. So we've made a new library that we are currently in the process of working on and continuing to iterate on. This is a slow process to get it into a polished state where if someone wants to write us a message to Gord, they have a library where they can say,
Starting point is 00:33:53 like, here's the type I want to fill out, which is, you know, a record with all the fields that are required for a bond exec or a wholesaling fill or, you know, a different kind of message that they might care about. And we tell them what fields they need to provide to us. And then they don't have to think about the translation layer into how Gord is actually going to consume it, which also gives us the ability to hopefully someday change that format from being this, you know, files on disk thing, which, you know, we have some stuff in the process we're hoping to do there. But that's a big one where it's like making it easier for people to write correct data to Gord with less iteration on our part.
Starting point is 00:34:28 So one way of thinking about that in some sense is there's a lack of types or schemas or something like that in the story, right? Both when people are handing data over to you and when people are consuming. Gord is acting as this very complicated rendezvous point. There's all these people who are handing data over to it. There's all these different sets of people and hundreds of applications, hundreds of code bases that are consuming and reacting to that data. And the Gord team is sitting in the middle, needing to understand important things about all the consumers and all the producers and try and guide all of those pieces together. And you're kind of in this middle step. What you're doing is providing a library, which is at least a place where Gord devs and the people who are providing data can collaborate and think at the level of logical and highly structured transactions, right?
Starting point is 00:35:15 The other team can say, here is the type that represents the thing that's happening. And then the Gord dev team can think about how to translate that into the GUI internal representation and think about its processing through the rest of the pipeline. And the longer term game plan is to actually get rid of the GUI internal representation and go to something that has more structure the whole way through. Because the amazing thing about type systems in some sense, and like there are type systems that sit inside of programming languages and there are type systems that you have that are on message types that span across different applications. But they have this lovely property of like locking systems together and translating force across them.
Starting point is 00:35:49 If someone wants to make a change and they need to modify the types and you see how those type changes flow through, you can understand by that process of flowing the type changes through the effect on a broad set of the infrastructure. And it helps that we have everything in this big monorepo world
Starting point is 00:36:04 where all of the different pieces of code, at least the most up-to-date versions that you would get if you rolled a new one, are all sitting together and compiled together. And you can have these invariants that cross widely separated systems. So in some sense, the world you're moving towards is one where you get to leverage types kind of more consistently in the way that you build systems. Yeah, yeah, for sure. Another big one that we've been, again, working towards over the course of many years is when I first started on the team, you could consume the entire Gord stream
Starting point is 00:36:31 or you could consume just the fills. And those were kind of your two options. Around when I was joining the team, we added a new stream called monitoring, which was a place where people could put sources that didn't affect our bookings and our positions. But it turns out this process Gord was providing of normalizing data was more useful than just for the things that we were making sure we had to actually track in our positions or upload to clearing firms. There were more use cases people might want to have for it.
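The stream split being described might be sketched like this (Python for illustration; the stream names and message fields are invented): sources that don't affect bookings and positions go to a monitoring stream, while real activity goes to the full stream, with fills also published to the narrower fills stream.

```python
# Static routing of one firehose into named substreams.
STREAMS = {"full": [], "fills": [], "monitoring": []}

def publish(msg):
    if not msg["affects_positions"]:
        # sources that don't affect bookings/positions go to monitoring
        STREAMS["monitoring"].append(msg)
        return
    STREAMS["full"].append(msg)
    if msg["kind"] == "fill":
        # fills also land on the narrower stream for fills-only consumers
        STREAMS["fills"].append(msg)

publish({"kind": "fill", "affects_positions": True})
publish({"kind": "order", "affects_positions": True})
publish({"kind": "sim_fill", "affects_positions": False})
```

Consumers then subscribe to just the stream they need instead of taking the full feed and filtering it themselves.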
Starting point is 00:36:57 So we introduced the monitoring stream. And it turned out those two things, sort of all of the real thing or all of the fake thing, were actually insufficient as we continued to grow and do more things. It was useful to break out subsets of activity people might want to subscribe to and give people pointers to options data or other things like that, rather than, if you want the options data, you have to subscribe to the full stream and just filter out the stuff you care about. So is this a kind of static segmentation of the flow or a more dynamic system where you get to express, here's like a predicate that represents the data that I want, or here's some kind of logical specification of the data I want, and you get that? Or is it just like a kind of physical breakdown of it into different substreams?
Starting point is 00:37:52 It is currently relatively static in that we have to know ahead of time where these things are going to end up. However, we have some dreams about making that a little bit more dynamic over time, but that is something that is still quite a ways out. But even so, the pre-ordained breakdown still provides a lot of value to people and still makes it much easier for people to consume parts of the information in smaller subsets. But obviously, we'd like to get to a world
Starting point is 00:38:16 where you could be like, give me information about things with this symbol, which would be very nice, but that is still a ways out. So you've talked a bunch about ways in which you've worked on extending Gord. In some sense, kind of at the core functional level, changing the APIs and changing the way data flies around. Are there things that you have had to do at the engineering process level
Starting point is 00:38:35 to deal with just the extreme combination of growth and criticality of the system? As it grows in complexity and the number of things you have to support and the amount of data, and it's still the thing that the entire firm depends on for all of its trading. What have you done at the level of trying to build a process that lets you reliably make changes and efficiently make changes? When I started, there were no in-code tests. Not quite none. There was like maybe one or two files of very small amounts of things. But there were virtually no in-code tests for anything in the system.
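For a flavor of the in-code tests discussed in this section (Gord's real tests are OCaml, and this normalizer is a toy invented for illustration): feed an input message through the code under test and assert directly on the output, instead of diffing days of dev output against production.

```python
# A toy normalizer and a direct test of an intended behavior.
def normalize(raw: dict) -> dict:
    # canonicalize the symbol and convert price to integer ticks
    return {"symbol": raw["symbol"].upper(),
            "price_ticks": int(round(raw["price"] * 100)),
            "qty": raw["qty"]}

def test_normalize_basic_fill():
    out = normalize({"symbol": "msft", "price": 241.07, "qty": 10})
    assert out == {"symbol": "MSFT", "price_ticks": 24107, "qty": 10}

test_normalize_basic_fill()
```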
Starting point is 00:39:06 In addition to that, the way that we verified that we had not broken anything was we would roll our system to dev and we would just look at the diffs in the output from production and the thing we were testing over the course of many days to just try and make sure we had seen enough variety of thing that we'd be confident that we weren't going to fundamentally break anything. In addition to that, there was a really wide range of testing procedures that we did,
Starting point is 00:39:33 and many of them we still do, when we're rolling to production to ensure that we don't break the system in ways that breaking the system would, again, halt all the trading. We don't want to do that. So we had a lot of arduous procedures. When I joined the team, we rolled the system a few times a year. EGAD. Yes. One of the things that we are now at is we roll about once a month, which is still quite slow. And indeed, for a firm that does as many new things as Jane Street and on such a frequent
Starting point is 00:40:03 basis, this can be a pain point. We work really hard to try and make sure we know what's coming in advance and that we're well prepared so we get things out in time, but it is still a painful thing. And we would love to get to a spot where we can roll the system more frequently. And we're doing a lot of work in that direction and have done a lot of work. One of the early things I did, I mentioned we had no performance tests. I added performance tests so that we could measure the performance of the system every time we rolled to make sure we hadn't introduced a concerning performance regression. In addition to that, we built in-code tests that tested the entire architecture of the system so that we could, you know, have our input message types all the way through to our output types. The way that we tested things when I started was you would literally spin up an entire dev version of the system and then you would manually construct the activity that you wanted
Starting point is 00:40:47 to run through it and then you would run it through. And this process was like slow and pretty painful to set up. So you basically just had to trust that all the other developers on the team were following the procedure and were doing all the things appropriately. And you'd look at like a few scratch notes in their testing of the feature. You know, they checked all the right things. Once we had in-code tests, we could actually add a test for the behavior change you intended and demonstrate that it had the effect you intended. This allowed you to make changes a lot more confidently and faster. On that note, there are a number of things about how we roll that were relatively painful processes that we have done a lot of work as a team to improve. When I started,
Starting point is 00:41:25 the way that we rolled Gord was there was a symlink in basically our virtual file system that pointed to the binary of the current production Gord. And so the way that you flipped Gord to make a new version primary was you went in and you manually changed that symlink. Honestly, there are things that could be worse than that. But the thing that made this really painful was Gord is global. So today is not today everywhere. So what would happen is around 3 p.m. New York time, which is around 3 a.m. Hong Kong time, Gord would start up for tomorrow.
Starting point is 00:42:01 So you had to change the symlink for Hong Kong between when Hong Kong Gord shut down, which was 9 a.m., and 3 p.m. when it started up again in New York. And then London would shut down at around 5 p.m. New York time, which was around 10 p.m. in London. And you'd have to change the symlink in London at 5 p.m. before London Gord started up. And then when Gord shut down at 9 p.m., you had to go in and change the symlink in New York. And if you failed to do any of these things, then Gord would be considering different Gords primary in different offices. And so clients would be getting weird data and like this would cause many, many major problems. And indeed, this was a thing that had survived because the people on the team were excessively careful and thoughtful
Starting point is 00:42:45 and like it was, you know, a thing that survived right up until my second roll. I have ADHD, and remembering to do specific tasks in very controlled time windows is a thing that I am bad at, and so I think this was actually my second roll on the team that I was doing. I rolled Hong Kong at 10 a.m. And then I went home for the day and at 8 p.m. realized in a moment of horror that I had not changed the symlink for London. And so at 8 p.m., London has already started up. So we then had to do a relatively painful rollback procedure, and it turned into a relatively large production incident all over forgetting to change a symlink in a particular two-hour window. So we fixed that. Now it is the case you basically can stage a change, and none of this symlink stuff is used anymore.
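A staged, config-driven flip like the one that replaced the symlink dance might be sketched like this (Python for illustration; the schedule format and names are invented). Keying the flip by trading date means it can be staged days in advance, and no human has to touch anything in a particular two-hour window.

```python
# A date-keyed schedule of which version is primary; every office computes
# the same answer for the same trading day.
from datetime import date

PRIMARY_SCHEDULE = [          # staged in advance, e.g. "flip in three days"
    (date(2022, 9, 1), "gord-v1"),
    (date(2022, 9, 15), "gord-v2"),
]

def primary_for(trading_date: date) -> str:
    # the latest scheduled version whose effective date is on or before
    # the trading date in question
    chosen = None
    for effective, version in sorted(PRIMARY_SCHEDULE):
        if effective <= trading_date:
            chosen = version
    return chosen
```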
Starting point is 00:43:36 There's a process that tracks which Gord is primering on which day, and it's just a config file that you can change in advance. You can say, I would like Gord to flip three days from now, and it will just do it for you. Everything is fine. It's very automated. There's none of this sort of, it understands the notion of date. And then Gord's just figure out what date they're currently running on and use this file to say, am I primary or not? As opposed to the thing we had before, which was not awesome. Right. Anytime you have something that depends on people just being repeatedly careful over and over, you're eventually going to run into a problem. The story also highlights just a weird thing about our infrastructure. And I think in some sense about the trading world, which is the notion of day, right? Like you have a trading day and lots and lots of applications
Starting point is 00:44:17 start up at the beginning of the trading day and shut down at the end of the trading day. And that's historically early on how we did kind of everything. And the complexity of Gord is it was kind of straddling those two worlds. We have basically three regions in which we operate, centered in New York and London and Hong Kong for North America and Europe and Asia. And each one of those Gord instances had its own trading day, and it would start at one point and end at another point. But also they communicated with each other and shared information. And so you ended up with this kind of weird handoff and overlap and shutting down and
Starting point is 00:44:52 bringing back up. And this is increasingly not how we operate. More and more, Jane Street just operates 24 hours a day and more of our systems have been moved over to this model. But it's just kind of a good example of the kind of complexities that are like somewhat unique to the historical story about how the system evolved. Yeah, one of the things that we want to do is make it so that Gord is more 24-7 available. But there's also a lot of really nice things that come out of being able to shut your system down, like having reboot windows and being able to archive things at a clean time when nothing
Starting point is 00:45:23 new is being written into the system. We are working towards a world where we will be able to set up our boxes dynamically and fold boxes sort of in and out of being used on a dynamic basis and then eventually hopefully get to a point where we actually can have Gord be 24-7 without all of these sort of pain points. But because Gord wasn't initially architected with this as the plan, it is again a relatively arduous process with many steps along the way to get from point A to point B. Okay. So let's switch gears. We spent a lot of time talking about software engineering, which is the core role that you have. You also do a lot of other things at Jane Street. And one thing you've been involved a lot in over the years is with
Starting point is 00:45:57 recruiting and with our internship program and a lot of things surrounding that. And you've also done a lot of work specifically on recruiting of underrepresented groups of various kinds. I'd be curious if you could tell me a little bit more about the kind of efforts you've been involved in in that part of the world. Yeah. When I started at Jane Street, I was the second female developer at the firm. Jane Street had already been around for many years, and so this is kind of sad. And people recognized that this was sad, but they didn't really know what to do about it. And so I, relatively early on, got involved in helping us figure out how to solve that problem. One of the things that was a big issue is that there is a view in the world that finance is a pretty bro-y and unpleasant
Starting point is 00:46:34 place to work as a woman. And while I have not found this to be true at Jane Street, I think the culture is excellent. And I've had really great experiences with all my colleagues. I mentioned at the very beginning, I had no plans on working in finance when I graduated. And had I not done an internship at Jane Street, there's no way I would have even considered it. So a big part of what we've done is we've done a ton of outreach programs aimed at finding folks that might be actually a really good fit here and might really enjoy their time at Jane Street. And giving them little tastes of what it's like to work with us, what the culture is like, getting them to meet a bunch of people. A lot of the way we've done this is through a bunch of external recruiting programs we've built. One of the first ones of these was created
Starting point is 00:47:14 the year after I was an intern. I started helping out with it relatively early on when I came back. This was called Women in STEM, which was aimed at bringing a bunch of folks between their senior year of high school and freshman year of college just to come see Jane Street and meet some of the people and hear a little bit about what we do and put it in people's brains that this was, you know, maybe a thing to consider when they were applying for internships down the road. Indeed, my very first intern was someone that we found through Women in STEM, a woman named Hao Hung, who now helps me run a bunch of the programs that I'm going to further describe. But it really was the case. A lot of this
Starting point is 00:47:50 is about finding folks that just would never have applied to us in the first place and getting them kind of into our pipeline. One of the things that really struck me about Women in STEM when we started doing it is it was a good example of playing the long game, trying to put together a program and tell them about us, but also tell them about finance and teach them interesting things about technology. But this was planning out pretty far in advance, right? These are people who were seniors or in high school who were about to go to college. And it was going to be some time before there were people who might apply for jobs here, but we've seen it work out. Like there are plenty of people who went through that program, who ended up eventually coming here. And it's also, it sort of worked, I think,
Starting point is 00:48:23 directly on people who we're trying to recruit and also has like a larger brand building effect. Yeah. One of the things is a lot of the folks that we might be really excited to have come to Jane Street, if they have a friend who does an internship in finance and has a bad experience, if they haven't had sort of something good to put against it,
Starting point is 00:48:39 they just won't even necessarily consider a place like us. We found starting early and also having stuff, again, spanning sort of all of folks' college years has been really effective at getting people in the door as often as possible. So one of the programs that we built relatively early on that has been one of our most effective is called Insight. And what this program is, is we bring a bunch of sophomore women to Jane Street for a week. We teach them a bit about finance. We teach them some OCaml. We just have them spend a lot of time interacting with full-time Jane Streeters.
Starting point is 00:49:09 And through this, we are able to get a lot of those folks to apply to our internship. And then hopefully, you know, people come as interns and then come back full-time and the whole pipeline goes from there. Insight has been sort of so successful, we've actually expanded it. We have expanded this because the number of women at Jane Street has grown a ton since I've been here. I went from being developer number two to having so many female developers. I no longer know them all. I don't even know the number anymore, which is like, again, a thing I was tracking for so long because, you know, it was like, okay, we're at two. Okay, we got another female developer. Amazing. We're at three. We're at five. We're at eight. And like, you know, I cared about the individual numbers so much. And now it really does feel like
Starting point is 00:49:49 the case that we are actually pretty good at recruiting women into our internship. Our dev intern class this summer is about 25% women, which is frankly not where I'd want it to be. I'd really love that number to be higher, but at the same time, compared to where we started, it's a massive improvement. And I really want to celebrate the progress that we've made, but we're also not satisfied. We're going to continue doing more programs and more events and trying to recruit more women to Jane Street and make sure that Jane Street continues to be an excellent place for women to work. Yeah. Even if it's not perfect, we've made a lot of progress. That's right. So when you run a program like Insight,
Starting point is 00:50:17 how do you think about the kind of dual problem of how to attract people and get them to apply? And then also, among the people who apply, how do you pick the people who will maximize the likelihood of success? Yeah. A lot of this is amazing work done by our recruiting team to build connections on campus with women in CS groups, and building connections with some of the bigger women in tech organizations like the Anita Borg Foundation. So for many years I went to Grace Hopper and had a booth and just would grab anyone walking by and be like, hey, let me talk to you about Jane Street and come hear about this thing.
Starting point is 00:50:48 And so a lot of it was done through sort of the work of the recruiting team of finding people and getting our name onto campuses. And the thing that we've found very effective is once we kind of get our foot in the door to school, like we can get our reputation to spread there reasonably effectively. So a lot of it is figuring out what club we need
Starting point is 00:51:03 to reach out to or how we kind of find an avenue into a school. And this is one of the things that our recruiting team is really, really excellent at. They are excellent at many things, but this is one. That points at another problem, which is maybe a different kind of diversity, which is diversity in terms of the schools that you reach out to. You talk about how you get in and build a reputation at a school, and suddenly now you have access to a really interesting stream of candidates. But that requires a specific investment in a particular school. And there's a ton of schools and there are great people scattered across all of these schools. Is there anything that we've done to try and reach out to people in the long tail of schools, where we
don't have time to build a particular relationship with that school, but would still be interested in seeing at least some subset of the people there? I think the answer is the recruiting team does a lot of stuff here, but I don't actually know. I don't know what the answer is in the context of Insight. I know in general we try to do all sorts of things. I mean, just to give a dumb example, this podcast is a way of trying to reach out in a way that's orthogonal to the particular efforts at schools and also, for that matter, reaches out to people who already have jobs and are already employed in places. Yes. I mentioned the recruiting pipeline for our interns,
Starting point is 00:52:06 being things like Insight and some of the other programs in that space. But one of the things that we are still working on trying to make better is our recruiting pipeline for lateral women. Our intern class is 25% women, but only about half of dev hiring is done through the internship. And the other half is done through laterals, which are folks who have been in industry working at places. And the number of lateral women we hire is significantly lower in terms of percentages. And this is a thing that
Starting point is 00:52:29 we're very actively working on. We've been trying lots of different things in this space. I would say we haven't found a single silver bullet that works, but we've tried lots of different things and some of them have had some success and some we're sort of still iterating on. One of the ones that is my favorite that we put together, and I really hope that we can now bring back as the pandemic recedes is shortly before the pandemic. We started hosting brunches for women in tech in the city just to hang out with a bunch of Jane Street developers and have brunch. Because what happened was we were throwing out ideas for how we might do sort of some reach outs to women across the city who might be interested in a job at Jane Street. And we were like, people were talking about doing a tech talk or doing a day where we brought people to Jane Street and like told them about, and Casey, another female developer and I were like, we wouldn't go to that. What would we go to? We'd go to brunch. So we tried that and things like
Starting point is 00:53:19 that, where it's just kind of reputation building. We didn't think we would want to go to something that was heavily sponsored and, you know, very pitchy. And so we were like, let's actually just approach this from a relatively altruistic angle. Let's try and help women in tech in the city build a community, and approach that as the angle. And then, great, maybe they're not looking for a job, but they have a friend who is, and this now might be a thing they'd mention, like, oh, have you heard of this place? Again, playing the long game of, we're not doing this as a, oh, you have to be looking for a job kind of thing. It's like,
Starting point is 00:53:55 actually, let's just try and do a thing that we think would be fun for women in the city in tech. And hopefully there are good knock-on effects from that over the long haul. And, you know, frankly, I just really enjoyed meeting a bunch of other female software developers around the city. That was a really fun part of the brunches and I think was achieving the goal of helping build community, helping build connection. And I think that is a thing that I hope that we get to bring back now that people are maybe willing to be indoors at a brunch again or maybe we'll do them outdoors. I don't know. That's amazing. Yeah, I hadn't heard about these brunches at all.
Starting point is 00:54:27 It sounds like a great idea and something we should totally start up again. And I think your point about trying to generate something that's legitimately helpful to the people involved is a really important thing, and a thing that showed up in a lot of the recruiting programs that we do. Like, we have a tech blog, and obviously the subterranean purpose behind all of that is that we want to hire great people, but we also try really hard to make the things that we write and publish legitimately interesting and provide real value to the people who are reading. And I think that's kind of a thing we've done in general. WiSTEM is similar: we had this program where the long-term goal is hiring people, but we also tried to make sure it
was legitimately helpful and taught people things that would be useful to them whether or not they came. That kind of playing of the long game, making investments that are going to pay off down the line, and doing it in a way that feels legitimate and connected and useful to the people involved, is core to how we've approached this for a long time. Yeah, one of the things in this vein that I'm incredibly excited about is a program we're running this summer for the first time. I've talked a lot about our recruiting of women to Jane Street. Another area I am involved in is trying to recruit more underrepresented minorities in tech in general to Jane Street. This is also an area where we are woefully underrepresented and we want to get better. So a program that we're running this summer is for folks who are from underrepresented backgrounds in tech between their freshman and
Starting point is 00:55:29 sophomore year. And we built a program that was designed to be purely educational. Part of the thinking about this was how to help people who come from a place where Jane Street doesn't have a strong recruiting pipeline to begin with. Schools where Jane Street already does a lot of recruiting, we have people who have gone through our internship or our interview process. They know how to prep for that and know how to prepare for the interviews. And that just does give you a leg up in the interview process. So our goal was to build something that could help level the playing field a little bit for folks coming from places where they just don't have that preparation. We're trying to find a whole group of folks who are
Starting point is 00:55:58 relatively early in their careers and build a program that is designed to be 100% educational. We're going to teach them OCaml for the first several weeks of the program. And then we're going to work with them on an open source project so that they'll have something they can put on their resume. Because again, it is often hard to get an internship if you haven't had something on your resume like an internship already. So kind of building a thing aimed to both hopefully help these folks in their sort of furthering their CS education and also getting them prepared to be able to get internships during their sophomore and junior years, hopefully a bunch of them with us,
Starting point is 00:56:29 but also just in general, our aim of the program is to provide a service and provide a program that will hopefully help all of these folks. And, you know, again, we always have the sort of long game of like, hopefully some of those people become James Street developers, but our aim in building the program is hopefully they'll get something out of it, whether they come to Jane Street in the long term or not. One of the interesting things about that program is it's really connected to open source work that we do, right? Because one of the ways we're making that happen is by having them do projects where they don't have to come inside of Jane Street's walls.
Starting point is 00:56:57 They can work on stuff that we've open sourced already. And that simplifies and smooths out that story. And it makes it much easier, again, if the goal is kind of helping them have something that they can talk to the world about, it's much easier if it's something they can point to. One of the things that is interesting about Jane Street is we don't have a product that you can point to and show to your mom. You can't be like, I did this thing. And, you know, maybe looking at a bunch of open source code is not necessarily something everyone's mom would understand. But it is a thing that you can put on your resume and a recruiter can look at and be like, oh, there's a GitHub link. Okay, cool. I can see a thing that this person did. And I think those things are hopefully really helpful in helping them, again, as they're
Starting point is 00:57:32 progressing in their CS education and also in their careers. Do you have any kind of feedback or a sense of how effective these programs have been? So I think for Insight and then also the version of it we run for underrepresented minorities in tech called InFocus, they have been incredibly effective. The percentage of women who go through this program who then end up in one of our internship or then come back full time is something like 20%, which is for a one-week program we're running quite high. If we could have our interviews be 20% effective, that would be awesome. We're using 20% of the people who go through the program eventually come to the GNSU internship? Something like that. That is astonishing.
Starting point is 00:58:13 Yeah. One of the things that's amazing is a group of people who are not me who have taken over the program since I was involved. So I'm not sure what the all-in numbers are these days, but at least when we were involved in the program, I think from our very first one, I think it was 25 students and five of them ended up in the Jane Street internship over the next few years. That's amazing. That's really a lot of impact to have on the firm's hiring. It's nuts. From the very first in-focus we ran, we had, I think, something like 15 students and three of them ended up at Jane Street that next summer. Again, we were running this for the first time in the fall. So a lot of people who came to it already had internships. So our hope is that some of those people will be in like this year's internship and next year's. So getting from just one year of turnaround, it's been awesome. And again,
Starting point is 00:58:57 we would love all of these numbers to be larger and we're hoping to continue to make them so, but the impact of these things is really large. And the number of people who come through these programs who say, I never would have considered applying to Jane Street if not for them is also very large. You know, I think one of the women who helps run the programs now is Grace Ng, who was in actually that first class of insiders. And yeah, she said when she was applying, like, yeah, like this isn't a thing that I would have done before. And like, I'm so grateful we have her because she's amazing.
Starting point is 00:59:25 I think the best selling point Jane Street has is just introducing full-time Jane Streeters to people because we are in general a very inclusive and welcoming and friendly place to work. And I think that the people here are fantastic. When people are, you know, asked in interviews, what is your favorite thing about the place? People have to struggle to come up with an answer that isn't the people because everyone's answer is the people. And so that gets a little repetitive, but it's just so true. All right. Thanks so much for coming and joining me. This has been great. Yeah, it's been awesome. Thank you. You'll find a complete transcript of the
Starting point is 00:59:59 episode along with show notes and links at signalsandthreads.com. One thing I wanted to mention is that, as you may have noticed, the pace of releasing new episodes has slowed down a bit. Don't worry, we're not going anywhere. In fact, we've got a bunch of episodes planned that I'm really excited about. But things have been busy, and I do expect the pace to be a bit slower going forward. Anyway, thanks for joining us, and see you next time.
