Signals and Threads - Multicast and the markets with Brian Nigito
Episode Date: September 23, 2020
Electronic exchanges like Nasdaq need to handle a staggering number of transactions every second. To keep up, they rely on two deceptively simple-sounding concepts: single-threaded programs and multicast networking. In this episode, Ron speaks with Brian Nigito, a 20-year industry veteran who helped build some of the earliest electronic exchanges, about the tradeoffs that led to the architecture we have today, and how modern exchanges use these straightforward building blocks to achieve blindingly fast performance at scale. You can find the transcript for this episode along with links to things we discussed on our website.
Transcript
Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I'm Ron Minsky.
Today, I'm going to have a conversation with Brian Nigito, essentially about the technological underpinnings of the financial markets and some of the ways in which those underpinnings differ from what you might expect if you're used to things like the open internet and the way in which cloud infrastructures work. And we're going to talk
about a lot of things, but there's going to be a lot of focus on networking and some of the
technologies at that level, including things like IP multicast. And Brian Nigito is a great person
to have this conversation with because he has a deep and long history with the financial markets.
He's worked in the markets for 20 years. Some of the time he spent working at the exchange level where he did a lot of the foundational work
that led to the modern exchange architectures that we know today. And he's also worked on the
side of various different trading firms. And for the last eight years, Brian's been working here
at Jane Street and his work here has covered a lot of different areas. But today he spends a lot
of time thinking about high performance, low latency, and especially network level stuff.
So let's dive in.
I think one thing that I'm very sensitive to is a lot of the people who are listening
don't know a ton about the financial markets and how they work.
Just to get started, Brian, can you give a fairly basic explanation of what an exchange
is?
I think when you hear about an exchange, you can think of lots of different kinds of marketplaces. But when we talk about an exchange,
we're talking about a formal securities exchange. And these are the exchanges that the SEC
regulates, and they meet all of the rules necessary to allow people to trade in securities.
So when we use that loosely, yeah, it's pretty different than your average flea market,
supposed to be anyway.
That's obviously a function which, once upon a time, was done with physical people in the
same location, right?
Those got moved into more formal, more organized exchanges with more electronic support.
And then eventually, there's this kind of transformation that's happened essentially
over the last 20 years, where the human element has changed an enormous amount.
Now, humans are obviously deeply involved in what's going on, but the humans are almost
all outside of the exchange.
And the exchange itself has become essentially a kind of purely electronic medium.
Yeah, it's a really interesting story because you have examples of communications technologies and electronic trading going back to late 60s,
but probably more mid 70s. I'm being a little loose with dates. So it was kind of always present,
but the rule set was not designed to force people to operate at the kinds of timescales
that electronic systems would cause you to operate at.
It was rather forgiving.
So if somebody on the floor didn't want to deal with an electronic exchange, the electronic
exchange had to wait.
And over the past 10 to 15 years, that's kind of flipped.
And so generally we favor always accessible electronic quotations.
To step back a little bit, the exchanges are the places for people to meet and trade,
as you said, to advertise their prices and for people to transact with each other.
Other than people who are buying and selling, what are the other people who interact at the
exchange level? What are the other kind of entities that get hooked in there?
So you have obviously the entities who either in their own capacity or on behalf of other people
are transacting securities, but then you have financial institutions that are clearing and
guaranteeing those trades, providing some of the capital or leverage to the participants who are
trading. They obviously want to know what's going on there. You have other exchanges because the
rule set requires the exchanges to respect each other's quotations. In this odd way, there's a web
where the exchanges are customers of each other. And you may also have various kinds of
market data providers. So those quotes that reflect the activity on the exchange are eventually making
their way all the way down to what you might see scrolling on the bottom of the television
or your brokerage screen or financial news website, et cetera. I guess they even make it
all the way
down to the printed page when the Wall Street Journal prints transaction prices.
So what does this look like at a more systems level? What are the messages that the different
participants are sending back and forth? The most primitive sorts of things are you
have orders or instructions. There are other platforms where we have quotes and we may use
that loosely, but we'll just say orders. And an order would just say that I would like to buy or sell, let's say a specific stock.
And I'd like to do so at no worse than this price and for no more than this quantity.
That may mean I could get filled at a slightly better price than that. I could get filled for less than that.
I could get filled not at all.
And that order could basically check the book immediately and then come right back to me
if there's nothing to be done.
Or it can rest there for some non-zero amount of time where it could advertise and other
people may see it and choose to interact with it.
And then obviously I can withdraw that interest or cancel it.
So when we talked about orders or cancels, those go hand in hand. And finally, there's execution messages where if you and I
intersect on our interest, I want to buy, you want to sell or vice versa, then the exchange
is going to generate an execution to you and to me saying that that happened and the terms of that
trade. And I guess one of the key properties here is that you have a fairly simple core set of messages. There's this basic data structure at the heart
of it called the book, which is the set of orders that have not been satisfied. And then people can
send messages to add orders and remove orders. And then if two orders cross, if there are two
orders that are compatible where a trade can go up, then an execution occurs and the information
flows out. A fairly simple core machine at the heart of it.
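To make that mechanic concrete, here is a minimal Python sketch of the kind of price-time priority book and matching step being described. All names and message shapes are invented for illustration; a real matching engine handles far more (order types, priority rules, self-match prevention, and so on).

```python
# Minimal sketch of the "book": resting orders on two sides, matched in
# price-time priority. Field names here are invented for illustration.
from collections import namedtuple

Order = namedtuple("Order", "order_id side price qty")  # side is "buy" or "sell"

class Book:
    def __init__(self):
        self.buys = []   # resting buy orders, best (highest) price first
        self.sells = []  # resting sell orders, best (lowest) price first

    def add(self, order):
        """Cross the incoming order against the book, then rest any remainder."""
        executions = []
        resting = self.sells if order.side == "buy" else self.buys
        crosses = (lambda p: order.price >= p) if order.side == "buy" else (lambda p: order.price <= p)
        qty = order.qty
        while qty > 0 and resting and crosses(resting[0].price):
            top = resting[0]
            traded = min(qty, top.qty)
            executions.append((order.order_id, top.order_id, top.price, traded))
            qty -= traded
            if traded == top.qty:
                resting.pop(0)
            else:
                resting[0] = top._replace(qty=top.qty - traded)
        if qty > 0:  # unfilled remainder rests on the book, advertised to others
            side = self.buys if order.side == "buy" else self.sells
            side.append(order._replace(qty=qty))
            side.sort(key=lambda o: -o.price if order.side == "buy" else o.price)
        return executions

    def cancel(self, order_id):
        self.buys = [o for o in self.buys if o.order_id != order_id]
        self.sells = [o for o in self.sells if o.order_id != order_id]
```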
But then lots of different players who want different subsets of information for different
purposes.
There are people who are themselves trading, want to see, of course, their own activity
and all the detail about that.
And they also want to see what we call market data, the kind of public anonymized version
of the trading activity.
So you can see what are the prices that are out there
that are advertised for you to go and transact against. And so in the end, you need to build
a machine that's capable of running this core engine, doing it at sufficient speed,
doing it reliably enough. Maybe a thing that's not apparent if you haven't thought about it is
there's a disturbing, dizzying amount of money at stake. And oh my God, you do not want
to lose track of the transactions, right? If you say like, oh, you guys did this trade and then
you forget about it and don't report it or you report to one side or not the other, terrible
things happen. So reliability is a key thing. Yeah. And I think to go back, there's lots of
different consumers, lots of different participants. And I think the key word there is there's lots of
competing participants. So one thing you didn't mention in there is
disseminating all that information fairly. So trying to get it to everybody at the same time
is a real challenge and one that participants are studying very, very carefully and looking
for any advantage they can technologically within the rule set, et cetera. So that extra
layer of competition
sort of makes the problem a little more complicated and a little more challenging.
And this fairness issue is one that you've seen from the inside working
on early exchange infrastructure at Island and at Instinet, which eventually became
the technology that NASDAQ is built on. Early on, you guys built an infrastructure that I think
didn't have all of the fairness guarantees that modern exchanges have today.
Can you say more about how that actually plays out in practice?
When working on the island system, it was very close originally to sort of, I guess,
fair in that you had the same physical machines, you had an underlying delivery mechanism,
which we'll talk about, that was very fair at getting it to those individual machines.
And then you were sending copies of orders or instructions after going through one application
to everyone.
So you were all passing through about the exact same amount of work and about the exact
same number of devices, but it was actually very inefficient.
We were using thousands of machines that were mostly idle.
So once we started trying to handle multiple clients on a single machine, it exposed sort of
some obvious and silly problems. The naive implementation where people would connect,
we would collect all of those connections. And then when we had a message, we would send them
on the connections serially,
often in the order in which people connected. Well, that immediately led to thousands of
messages per second before the exchange opened where somebody tried to be the very first
connection in that line. So then you start sort of round robining. So you start from one and then
the next time around you start from two, et cetera, et cetera, to try to randomize this.
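As a toy illustration of the fairness problem just described: writing serially over the connections in a fixed order always favors whoever is first in line, so one workaround is to rotate the starting index on every message. This is only a sketch with invented connection objects, not how any particular exchange gateway worked.

```python
# Toy sketch: sending serially in connection order always favors whoever
# connected first, so rotate the starting index instead.
def send_serially(connections, message):
    for conn in connections:          # connection #1 always hears the news first
        conn.send(message)

class RotatingSender:
    def __init__(self, connections):
        self.connections = connections
        self.start = 0

    def send(self, message):
        n = len(self.connections)
        for i in range(n):
            self.connections[(self.start + i) % n].send(message)
        self.start = (self.start + 1) % n   # next message starts one slot later
```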
And then you had people who were connecting to as many different ports as they could and
taking the fastest of each one.
And so these incentives are very, very strong.
And we'd like to use machines to their fullest, but to literally provide each participant
their own unique machine for each connection starts to get ridiculous as well.
So where did that lead you? How did you end up resolving that problem?
A lot of these were TCP protocols. In those days, we actually had a decent number of people
connecting over the open internet. I don't think we provided trading services directly
over the open internet, but we did actually provide market data that way.
And TCP is probably your only reasonable option over something like the internet.
But once you started moving towards co-location and towards more private networks where
people's machines were in the same data center and really only two or three network devices away
from the publishing machine, it became a lot more feasible to start using different forms
of networking, unreliable networking, UDP, and that leads you
to something called multicast, where rather than you sending multiple copies of the message to
N people, you send one copy that you allow the network infrastructure to copy and deliver
electrically and much more deterministically and quickly.
For someone who's less familiar with the low-level networking story,
just give a quick tour of the different options you have
in terms of how you want to get a packet of data
from one computer to another.
The internet and ethernet protocols
are generally a series of layers.
And at the lowest layer,
we have an unreliable best effort service
to deliver a packet's worth of data.
And it's sort of a one-shot thing,
more or less point-to-point from this machine to some destination address.
And then we build services on top of that that make it reliable by sequencing the data,
attaching sequence numbers so we know the original order that was intended,
and having a system of retransmissions, measuring the average round trip
time, probabilistically guessing whether packets are lost, et cetera, et cetera. So that all gets
built up into a fairly complex protocol that most of the internet uses, TCP, maybe not all,
and there are some people pushing for future extensions to that. But by and large, I'd say
that the vast majority of reliable, in-order connected data over the internet is sent via TCP.
And TCP assumes that there's one sender and one receiver.
And it has unique sequence numbers for each of those connections.
So I really can't show the same data to multiple participants.
I actually have to write a unique copy to each participant. UDP is a much lighter layer on top of the underlying
raw transport, still unreliable, but with a little bit of routing information. And that protocol
has some features where you can say, I want to direct this to a specific participant
or a virtual participant, which the network could interpret as a group. Machines can join that group
and then that same message can be delivered by network hardware to all the interested parties.
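For readers less familiar with the API side of this, here is a rough sketch of joining and sending to a UDP multicast group with the standard sockets interface in Python. The group address and port are placeholders, and real market data feeds layer their own protocols on top of this.

```python
# Sketch of joining and receiving a UDP multicast group with the standard
# sockets API. The group address and port here are placeholders.
import socket
import struct

GROUP, PORT = "239.1.2.3", 5000   # hypothetical multicast group

# Receiver: join the group; the network delivers one copy per interested host.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
recv_sock.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
recv_sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Sender: one send; the switches replicate it to every member of the group.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
send_sock.sendto(b"one copy, delivered to everyone", (GROUP, PORT))

data, addr = recv_sock.recvfrom(65535)   # each joined receiver gets the datagram
```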
One of the key features there, which I think is maybe not obvious, why would I prefer
multicast over unicast? Why is it better for me to send one copy to the switch that then sends a
bunch of copies to a bunch of
different recipients versus me just sending a bunch of individual copies on my own? What's
the advantage of baking this into the actual switch infrastructure?
I mean, the switches are very fast and deterministic about how they do that.
And because of, I think, their usage in the industry, they've gotten faster and more
deterministic. So they can just electrically repeat those bits simultaneously
to all, you know, 48 ports or whatever that switch might have. And that's just going to be much
faster and more regular than you trying to do it on a general purpose server where you might be
writing multiple copies, writing to multiple places. It's
just, you really can't compare the two. One of the key advantages of using switches is that
the switches are doing the copying in specialized hardware, which is just fundamentally faster than
you can do on your own machine. And also there's a distributed component of this, right? When you
make available a multicast stream, there's this distributed algorithm that runs on the switches
where it learns essentially
what we call the multicast tree, which is at each layer, each switch knows to what other switches
it needs to forward packets. And then those switches know which part they need to forward
packets to. And so that gives you the ability to kind of distribute the job of doing the copying.
So if you have like 12 recipients in some distant network, you can send one to the local switch.
And then the final
copying happens at the last layer at the place where it's most efficient. So that's like the
fundamental magic trick that multicast is providing for you. I mean, as the networks get simpler,
the very first versions we were using weren't even using multicast. We were using something
called broadcast, which just basically said anything you get, I want you to repeat everywhere.
It's funny because you could imagine that you could certainly overwhelm a network that way.
And a large part of the uncertainty and the variation that comes from TCP
are these self-learning algorithms that are very concerned about network health.
And so when we would work with Linux kernel maintainers and stuff like that,
and have questions about variability that we saw, then they would say, well, you shouldn't be using TCP. If you care
about latency, you shouldn't be using TCP. TCP makes trade-offs all the time for network health
and so on and so forth. And for the internet, that is absolutely necessary. And if you really
have these super tight requirements and you really want them to get there fast, and you have a
controlled network with very little packet loss and very few layers between participants, you
should be using UDP. And they were probably right. We mostly do nowadays for this stuff,
but it took a while to get there. And they were right in a way that was kind of totally
unactionable, which is to say there are a bunch of standardized protocols about how you communicate
when you're sending orders. In fact, another thing to say about the trading world is,
if you step back and look at how the protocols we use are broken up, there are two kinds of primary
data flows that a trading firm encounters, at least when we're talking to an exchange.
There is the order flow connection, where when we send our specific orders and our specific
cancels and see the specific responses to those, and that is almost always done on a
TCP connection.
And then there is the receipt of market data, and that's where you're sending the data that
everyone needs to see exactly the same anonymized stream of data, and that's almost always done
through multicast.
So there is part of the data which is done via UDP in the way the Linux kernel developers
would recommend.
And there's part of the data flow that's still to this day done under TCP.
And I think the difference is we no longer use the open internet in the way that we once
did, right?
I think there's been this transformation where instead of sending things up to the trunk
and having things routed around the big bucket of the open internet,
trading firms will typically have lots in the way of co-location sites where they will put some of their servers very near to the switches that the exchange manages. And they will have what we call
cross-connects, right? We will connect our switch to their switch and then bridge between the two
networks and deliver multicast across these kind of local area networks that are very tightly controlled, that have very low rates of message loss. So in some sense,
we're running these things over a very different network environment than the one that most of the
world uses. Yeah, a couple of interesting observations to that. It means that co-location
makes competition between professional participants more fair. It enables us to use
these kinds of technologies. Whereas without co-location, you have less control over how
people are reaching you and you end up with probably more variation between participants.
I think it's also worth saying that a lot of things we're talking about are a little bit
skewed towards US equities and equities generally. There's lots of other trading
protocols that are a little bit more bilateral. There isn't like a single price that everybody
observes. In currencies, often people show different prices to different people. There's
RFQ workflows and fixed income and somewhat in equities and ETFs. But by and large, probably the
vast majority of the messages generated look a bit like this, where there's shared public market data that's anonymized, but viewable by everyone. And then the private stream,
as you say, of your specific transactions and your specific involvement.
I think from a perspective of what was going on 15 years ago, I feel like the obvious feeling was,
well, yeah, the equity markets are becoming more electronic and more uniform and operate in this way where there's
kind of central open exchange and not much in the way of bilateral trading relationships.
And surely this is the way the future and everything else is going to become like this.
No, actually the world is way more complicated than that. And currencies and fixed income and
various other parts of the world just have not become that same thing.
Yeah. And I think that's partly because those products are just legitimately different and the participants have different needs.
Sometimes it's because the equity markets happen so, I think, relatively rapidly. A lot of the
transformation happened there. And so other markets that were a little bit behind saw the playbook,
they saw how it changed and they positioned and controlled some of that change to maintain
their current business models, et cetera. I wanted to go back to one thing was you said
we mostly use TCP. And it's interesting because there were attempts, I know of at least one off
the top of my head, probably there are more, to use UDP for order entry. Specifically,
somebody had a protocol called UFO, UDP for orders. There wasn't a ton of uptake because look, if you're a trading firm
connecting to 60 exchanges and 59 of them require you to be really good at managing a TCP connection
and one of them offers a unique UDP way, that's great, but that's one out of 60. And so I kind
of have to be good at the other thing anyway. So there just wasn't as much adoption because
there's just enough critical mass and momentum that the industry kind of hovers around a certain set
of conventions. And the place where you see other kinds of technologies really taking hold are where
there's a much bigger advantage to using them, right? I think when distributing market data,
it's just kind of obviously almost grotesquely wasteful to send things in unicast form where you send one message
per recipient. And so multicast is a huge simplifier. It makes the overall architecture
simpler, more efficient, fairer. There's a big win that really got adopted pretty broadly.
We've kind of touched on half of the reason people use multicast, right? Which is, I think,
one of the core things I'm kind of interested in this whole story is why is trading so weird in this way, right? Multicast was like, when I was a graduate
student many years ago, multicast was going to be a big thing. It was going to be the way in the
internet that we delivered video to everyone, totally dead. Multicast on the open internet
doesn't work. Multicast in the cloud basically doesn't work. But multicast in trading environments is
a dominant technology. And one of the reasons I think it's a dominant technology is because
it turns out there are a small number of videos that we all want to watch at the same time.
Unlike Netflix, where everybody watches a different thing, we actually want in the
trading world to all see what's going on on NASDAQ and ARCA and NYSE and Cboe and so on and so forth live in real time. We're all stuck to the same cathode ray tube.
But there's a whole different way that people use multicast that has less to do with that,
which is that multicast is used as a kind of internal coordination tool for building certain
kinds of highly performant, highly scalable infrastructure.
What is the role that multicast plays on the inside of exchanges and also on the inside of lots of firms trading infrastructure? The exchange, the primary thing it's doing is
determining the order of the events that are happening. And then the exchange wants to
disseminate that information to as many participants as possible. So certain parts of this don't parallelize very well. The sequence has to pretty much be done in one place
for the same security. So you ended up where you were trying to funnel a lot of traffic down into
one place and then report those results back. In that one place, you wanted to do as little work
as possible so that you could be fast and deterministic. And then you were spreading that work out into lots of other applications that were
sort of following along and provided value-added information, value-added services, and reporting
what was happening in whatever their specific protocol was.
So the same execution that tells you that you bought the security you were interested
in can also tell your clearing firm, maybe in a slightly different form, can tell the general public via market data
that's anonymized and takes your name off of it, et cetera, et cetera.
And let me try and just kind of sharpen the point you're making here, because I think it's an
interesting fact about how this architecture all comes together, which the kind of move you're
talking about making is taking this very specific and particularistic problem of like, we want to manage a book of open orders on an exchange and distribute this
and that kind of data and turning it into a fairly abstract CS problem of transaction
processing.
You're saying like, look, there's all these things that people want to do.
The actual logic and data structure at the core of this thing is not incredibly complicated.
So what we want to do is just to simplify all of the work around it,
we're just going to have a system whose primary job is taking the events, the request to add
orders and cancel and so forth, and choosing an ordering and then distributing that ordering
to all the different players on the system so that they can do the concrete computations that
need to be done to figure out what are the actual executions that happen, what are the things that need to be reported to regulators, what needs to
be reported on the public market data. And then multicast becomes essentially the core fabric
that you use for doing this, right? You have one machine that sits in the middle,
you can call it the matching engine, but you could also reasonably just call it a sequencer,
because its primary role is getting all the different requests and then publishing them back out in a well-defined order.
Worth noting that multicast gives you part of the story, but not all of the story because
it gives you getting messages out to everyone, but it misses two components.
It doesn't get it to them reliably, meaning messages can get lost.
And it doesn't necessarily get them to each participant in order.
Essentially,
the sequencer kind of puts a counter on each message. So you can see it's like, oh, I got
message one, two, four, three. Well, okay, I got to reorder them and interpret them as one, two,
three, four. And then also that ordering lets you detect when you lose messages. And then you have
another set of servers out there whose job is to retransmit. When things are lost, they can fill the gaps.
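A rough sketch of that receiver-side logic, with an invented retransmission-request hook standing in for whatever the gap-fill servers actually speak:

```python
# Sketch of a receiver using the sequencer's per-message counter to reorder
# out-of-order delivery and to detect gaps. The retransmit request here is
# an invented placeholder.
class SequencedReceiver:
    def __init__(self, request_retransmit):
        self.expected = 1                 # next sequence number we want to apply
        self.pending = {}                 # seq -> message, held until in order
        self.request_retransmit = request_retransmit

    def on_packet(self, seq, message):
        if seq < self.expected:
            return []                     # duplicate or already-applied message
        self.pending[seq] = message
        if seq > self.expected and self.expected not in self.pending:
            # We can see a gap: ask the retransmission servers to fill it.
            self.request_retransmit(self.expected, seq - 1)
        ready = []
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected += 1
        return ready                      # messages now deliverable in order
```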
And now this is a sort of specialized supercomputer architecture, which gives you this very
specialized bus for building what you might call state machine style applications.
Right. And I will say, I think I'm aware of a number of exchanges that actually do have a model
where they actually have just a sequencer piece that does no matching, that really just determines the order. And then some of these sidecar pieces are
the ones that are actually determining whether matches do indeed happen, and then sequencing
them back, reporting them back, et cetera, et cetera. So there's definitely examples of that.
A couple other points. So yeah, the gap filling and recovery has been a problem that I think is
covered by other protocols. There are reliable
multicast RFCs and protocols out there. And everywhere I've been, when we've looked at them,
we've run into the problem that they have the ability for receivers to slow or stop publication.
And in those cases, if you scale up to having thousands of participants,
there's sort of somebody somewhere who always has a problem.
So using any of these general purpose, reliable multicast protocols
never seemed to quite fit any of the problems that we had.
And I think because of the lack of use for the other reasons you mentioned,
they were generally not super robust compared to what we had to build ourselves.
And so we ended up doing exactly that where we added sequencing and the ability to retransmit missed messages
in various specialized ways. It's also worth noting that you get some domain-specific benefits
that I think also can generalize where if you've missed a sufficient amount of data,
I guess you can always replay everything from the beginning, but it sort of turns out that
if you know your domain really well and you can compress that data down to some fixed amount of
state, you can have an application that starts after 80% of the day is complete and be immediately
online because you can give him just a smaller
subset of the state. And a general purpose protocol like TCP, where you'd have to sort of
replay any missed data, has a number of problems in trading. That can be buffered there for sort
of arbitrarily long, and it assumes you still want it to get there. And it's buffering it byte by
byte. Whereas if you say, oh, I'd like to place an order, oh, I'd like to cancel an order,
oh, I'd like to place an order, oh, I'd like to cancel an order.
If all of those are sitting in your buffers,
the ideal thing to do would be,
well, if you know the domain,
they cancel each other before even going out
if they're waiting in the buffer and you said nothing.
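A minimal sketch of that coalescing idea, with invented message shapes: if an order and its cancel are both still sitting in the local outbound queue, neither needs to go on the wire.

```python
# If a new order and its cancel are both still queued locally, they can
# annihilate before ever being sent. Message shapes are invented.
def coalesce_outbound(queue):
    """queue is a list of ("new", order_id) / ("cancel", order_id) tuples."""
    cancelled = {oid for kind, oid in queue if kind == "cancel"}
    unsent_new = {oid for kind, oid in queue if kind == "new"}
    drop = cancelled & unsent_new          # the order never left; drop both messages
    return [(kind, oid) for kind, oid in queue if oid not in drop]

# e.g. [("new", 1), ("cancel", 1), ("new", 2)] collapses to [("new", 2)]
```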
So when we design those protocols ourselves,
optimized for this specific domain,
we can pick up a little bit more efficiency when we do it. This is, in fact, in some ways, a general story about optimizing
many different kinds of systems, essentially specialization, understanding the value system
of your domain and being able to optimize for those values. I think the thing you were just
saying about not waiting for receivers, that's in some sense part of the way in which people
approach the business of trading. The people who are participating in trading care about the latency and responsiveness
of their systems. People who are running exchanges, who are disseminating data, care about getting
data out quickly and fairly, but they care more about getting data to almost everyone in a clean
way than they do about making sure that everyone can keep up. So you'd much rather just kind of
pile forward obliviously and keep on pushing the data out.
And then if people are behind, well, you know, they need to think about how to engineer their
systems differently.
So they're going to be able to keep up.
And you worry about like the bulk of the herd, but not about everyone in the herd.
You know, the stragglers in the herd, well, you know, they can catch up and get retransmissions
later and they're going to be slower, but we're not going to slow down.
Understanding what's important to the applications means things can be massively simplified. A huge step you can take in any
technical design is figuring out what are the parts of the problem you don't have to solve.
I think it's also worth saying that the problem is somewhat exacerbated by fragmentation. We've
said it's important for people to determine the order of events, but you also need to report it
back to them quickly and reliably quickly, deterministically
quickly, because that translates directly into better prices.
If I told you that you could submit an order and it would be live for the next six or eight
hours, you're going to enter probably a much more conservative price.
And let's say I'm actually acting as an agent for you.
I'm routing your order to one of these other 14 exchanges. Well, I may want to check one and then go on to the next one. And the faster and more
reliable it is for me to check this one, the more frequently I'll do so. If I think there's a good
chance that the order will get held up there, well, that opportunity cost, I may miss other
places. So this is all kind of a rambling way of saying that
speed and determinism translate directly into better prices when you have markets competing
like this. People often don't appreciate some of the reasons that people care about performance
in exactly the way that you're kind of highlighting. Just to kind of give another
example in the same vein, like this fragmentation story, you might want to put out bids and offers at all of the different exchanges where someone might want to transact,
right? There's a bunch of different marketplaces. You want to show up on all of them. You might
think, oh, I'm willing to buy or sell a thousand shares of the security and I'm happy to do it
anywhere, but you might not be happy to do it everywhere. There's like a missing abstraction
in the market. So they want to be able to express something like, I would be willing to buy the security
at any one of these places, but they can't do it.
So they try and simulate that abstraction
by being efficient, by being fast.
So they'll put out their orders
on lots of different exchanges.
And then when they trade on one of them,
they'll say, okay, I'm no longer interested,
so they'll pull their orders from the others.
And they're now worried about other professionals
who are also very fast,
who try to route quickly and in parallel to all the different places and take all of the
liquidity that shows up all at once. There's this dynamic that the speed and determinism
of the markets now becomes something that essentially affects the trade-offs between
different professional participants in the market. Yeah, that's right.
Another thing I kind of want to talk about for a second is what are some of the trade-offs that you walk into when you start building systems
on multicast? I remember a bunch of years ago, you were in the guts of systems like Island and
Instinet and NASDAQ and Chi-X and all of that building this infrastructure before you came to
Jane Street. I was on the other side and at the time, Jane Street, I think, understood much less
about this part of the system.
And I remember the first time we heard a description from NASDAQ about how their system worked, and I basically didn't believe them, kind of for two reasons.
One reason is it seemed impossible. The way NASDAQ works is every single transaction on the entire exchange goes through a single
machine on a single core. And on that core is running a more or less ordinary Java program that
crosses every single transaction. And that single machine was the matching engine, the sequencer.
And I didn't really know how you could make it go fast enough for this to work. There was essentially a bunch of optimization techniques that I felt like at the time we just
didn't understand well enough. And also, it just seemed perverse. What was the point?
Why to go to all that trouble? Maybe you could do it, but why?
Well, a couple of things. First, I want to say on all the systems you mentioned,
I like to think I did some good engineering work, but I was certainly a part of many excellent teams
and worked with just a tremendous bunch of
people over the years. But yes, from a performance perspective, you said, well, the fewer processes
I have, the simpler the system is. And it gives you some superpowers there where you just don't
have to worry about splitting things up in various ways. There were certainly some benefits to adding complexity, but a lot of
that came about as hardware itself started to change. And that should provide probably the
baseline for optimization. I think you want to understand the hardware and the machines you're
using, the machines that are available, the hardware that's available to you deeply. And
you want to basically model out what the theoretical bounds
are. And then when you look at what you're doing in software and you look at the kind of performance
you're getting, if you can't really explain where that is relative to what's capable, you're leaving
some performance on the floor. And so we were trying very, very hard to understand what the
machine could theoretically do and
really utilize it to its fullest.
Part of what you're saying is that instead of thinking about having systems where you
fan out and distribute and break up the work into pieces, you stop and you think, if we
can just optimize to the point where we can handle the entire problem in a single core,
a bunch of things get simpler, right? We're just going to
keep everything going through this one stream. There's a lot of work that goes into making
things uniformly fast enough for this to make sense, but it simplifies the overall architecture
in a dramatic way. It definitely does. And it's been pretty powerful. I mean, not every exchange
operates exactly on these principles. There's certainly lots of unique variations that people have put out there, but I do think
that it is pretty ubiquitous.
And certainly the idea that exchanges want some kind of multicast functionality, I think
is universal at this stage.
I'm sure there maybe is an exception here or there, but amongst high performance exchanges
with professional participants like this, I think it's pretty universal.
And when you're talking about publication of market data, we can see that directly,
since we're actually subscribing to multicast in order to receive the data ourselves.
But their internal infrastructure often depends on multicast as well, right?
True. Although, you know, I'm not as familiar with like the crypto side of the world, but
since a lot of that is happening
over the open internet, UDP is probably not one of the options.
And so you have people using more web sockets and JSON APIs and things like that.
But it is kind of the exception that proves the rule, right?
Because of that focus on the open internet and everything, you've got a totally different
set of tools. It highlights the fact that the technical choices are in some sense conditional on
the background of the people building it. There's like two sides of the question we were just
talking about. There's the question of what's the advantage of doing all this performance
engineering? And the other question is, how do you do it? How do you go about it?
It has moved around over the years. Many years ago,
I remember that we had interrupt-driven I/O. Packets would come into the network card where
they would essentially wait for some period of time. And if it had waited there long enough or
enough data had accumulated, then the network card would request an interrupt for the CPU to
come back and service the network card. And so how
frequently should we allow interrupts? If we allow them essentially anytime a packet arrives,
that'll be way too much CPU overhead. And so there were trade-offs of throughput and latency.
But once you end up with the sheer number of cores that we do nowadays, we can essentially
do away with interrupts and just wait for the network card by polling and checking,
do you have data? Do you have data? Do you have data?
And the APIs have shifted a bit away from general purpose sockets.
The sockets APIs require lots of copies of the data.
There's like an ownership change there.
When you read, you give the API a buffer that you own.
The data is filled in from the network and then given back to you.
So this basically implies a copy on all of the data that comes in. And if you start to look at,
say, a 25 gigabit networking, that means you basically have to copy 25 gigabits a second to
do anything at a baseline. And the alternative is you try to reduce those copies as much as possible.
And you have the network card just delivering data into memory, the application polling, waiting for that data to change,
seeing the change, showing it to the application logic, and then telling the network card he's
free to overwrite that data. You're done with it. And when you get down to that level, you really
are getting very close to the raw performance of what the machine is capable of. So eliminating the copies and the unnecessary work in the system, that's certainly one.
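Here is a Python-flavored sketch of that polling, copy-avoiding loop. Every name in it is invented; real kernel-bypass frameworks expose the ring of receive slots quite differently, but the shape of the loop is the point.

```python
# Sketch of the polling / copy-avoidance idea: the NIC writes frames into
# pre-agreed memory slots, the application spins until a slot is marked
# ready, processes the frame in place, and hands the slot back. Every name
# here is invented.
class Slot:
    def __init__(self):
        self.ready = False    # set by the NIC when a frame has been written
        self.frame = b""      # the frame data, written in place by the NIC

def poll_loop(ring, handle_frame):
    index = 0
    while True:
        slot = ring[index % len(ring)]
        if not slot.ready:        # busy-poll: "do you have data? do you have data?"
            continue              # no interrupt, no sleep, just ask again
        handle_frame(slot.frame)  # read the frame right where the NIC wrote it
        slot.ready = False        # tell the card it's free to overwrite that slot
        index += 1
```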
Trying to make your service times for every packet and every event as reliable and deterministic
as possible so that you have very smooth sort of behavior when you queue, you don't end
up having to do that everywhere.
The critical path tends to be
pretty small when it's all said and done. I think one of the guys who had built the Island system
really kind of had the attitude that if any piece of the system is so complicated that you can't
rewrite it correctly and perfectly in a weekend, it's wrong. And so I think that, you know,
probably the average length of an application there was, you know, 2000 lines, something like that. And the whole exchange probably was maybe four or
five applications stitched together. Sad to say, I think we do not follow that rule in our
engineering. I think we could not rewrite all of our components in a weekend. I'm afraid.
The world has gotten more complicated, but it's, it's not a bad goal, you know,
to often ask people. And I think it's consistent with
reliability and performance to constantly ask yourself, yes, but can it be simpler?
We want it to be as simple as possible. No simpler, but as simple as possible. And it really
is a mark of you deeply understanding the problem when you can get it down to something that
seems trivial to the next person who looks at it. It's a little depressing
because you kill yourself to get to that point. And then the next person that sees it is like,
oh, well, that makes sense. That seems obvious. What did you spend all your time on? And you're
like, if only you knew what was on the cutting room floor. One thing that strikes me about this
conversation is that just talking with you about software is pretty different than lots of the
other software engineers that I talk to because you almost immediately in talking about these questions, go to the question
of what does the machine do and how do you map the way you're thinking about the program onto
the actual physical hardware? Can you just talk for a minute about like the kind of role of
mechanical sympathy, which is a term I know you like for this in the process of designing the
software where you really care about performance?
It's a term that comes from racing, about how drivers with mechanical sympathy who really had a deep understanding of the car itself
were better drivers in some way. And I think that that translates to performance in that
if you have some appreciation for the physical and mechanical aspects, just the next layer of
abstraction in how our computers are built, you can design solutions that are really much closer to the edge of what
they're capable of. And it helps you a lot, I think, in terms of thinking about performance
when you know where those bounds are. So I think what's important there is it gives you a yardstick.
Without that, without knowing what the machine is capable of, you can't quickly back of the
envelope say, does the system even hang together? Can this work at all? If you
don't know what the machine is capable of, you can't even answer that question. And then when
you look at where you're at, you say, well, how far am I from optimal, right? Without knowing
what the tools you have are capable of, I just don't know how you answer that question and,
you know, when you stop digging, so to speak. Or if you're observing that the market,
be it from a competitive perspective or just the demands of the customer are much higher than what
you think is possible, well, you've probably got the wrong architecture. You've probably got the
wrong hardware. It's kind of hard for me to not consider that. As a practical matter, as a software
engineer, how do you get a good feel
for that? I feel like lots of software engineers, in some sense, operate most of the time at an
incredible remove from the hardware they work on. There's the programming language they're in,
and that compiles down to whatever representation, and maybe it's a dynamic programming language,
and maybe it's a static one, and there's like several different layers of infrastructure and frameworks they're using.
And there's the operating system.
And they don't even know what hardware they're running on.
They're running on virtualized hardware
and lots of different environments.
For lots of software engineers,
a kind of concrete and detailed understanding
of what the hardware can do feels kind of unachievable.
How do you go about building intuition,
trying to understand better
what the actual machine is capable of?
Well, I think, so you're separating programmers into people who get a lot of things done
and people like myself. I think, is that fair?
Seems fair, yes.
No, it's a good question. I think part of it is interest. And I think you really need
to construct a lot of experiments. And you have to have a decent amount of curiosity. And you
have to be blessed with either a problem that demands it or the freedom to be curious and to
dig. Because you are going to waste some time constructing experiments.
And your judgment initially is probably not going to be great. The machines nowadays are
getting more and more complicated. They're trying to guess and anticipate a lot of what your
programs do. So very simple sorts of benchmarks, simple sorts of experiments don't actually give you the insight you think
you're getting from them. And so I do think it is a hard thing to develop, but certainly a good
understanding of computer architecture or grounding in computer architecture helps. And then there are
now a decent number of tools that give you this visibility. But you do have to develop an intuition for
what are the key experiments? What are the kinds of things that are measurable? Do they correlate
with what I'm trying to discover, et cetera, et cetera. And I think it requires a lot of work
of staying current with the technology and following the industry solutions as well as what's happening in the
industry generally of computing technology. You got to kind of love it, right? Got to spend enough
time to develop the right kind of intuition and judgment to pick your spots when you do your
experiments. I think in lots of cases, people approach problems with a kind of, in some sense, fuzzy notion
of scalability.
There are some problems where if you're like, no, actually, I can write this one piece,
it admits simpler solutions some of the time than it does if you try and make it scalable
in a general way.
You can make a thing that is scalable, but the question of being scalable isn't the same
as being efficient.
So when you think about scalability
and think about performance, it's useful to think about it in concrete numerical terms,
and in terms that are at least dimly aware of what the machine is capable of.
I think it's actually easy to get programmers to focus on this sort of thing. If you just stop
hardware people from innovating, they will have no choice, right? So many programming paradigms
and layers of complexity have been empowered by the good work of hardware folks who have
continued to provide us with increasing amounts of power. And if that stops, and it does seem
like in a couple of key areas that is slowing, I don't know about stopping, but certainly slowing,
then yeah, people will pay a lot more attention to efficiency.
So this is maybe a good transition to talking about some of the work that you do now, right?
You, these days, spend a bunch of your time thinking about a lot of the kind of lowest
level work that we do.
And some of that has to do with building abstractions over network operations that give us the ability
to more simply and more efficiently do the kind of things that we want to do. And part of it has to do with hardware. So I'm wondering
if you could just talk for a minute about the role that you think custom hardware plays in
trading infrastructures and some of the work that we've done around that.
Jane Street has always had a large and diversified business. And for lots of our business, it's just not super relevant.
But in the areas where message rates and competitiveness are a little extreme,
it becomes a lot more efficient for us to take some of these programmable pieces of hardware and really specialize for our domain. And that can mean,
you know, like a network card is actually very good at filtering for multicast data. It can
compare these addresses bit by bit. But there's really nothing that stops us from going deeper
into the data and filtering based on content, looking for specific securities, things like that. And there aren't a
lot of general purpose solutions out there to do that at hardware speeds, but we can get programmable
network cards, custom pieces of hardware, where we can stitch together solutions ourselves.
And I think that's going to become increasingly relevant and maybe even necessary as we start to
move up in terms of data rates. I think earlier I mentioned
that we have, I didn't get the exact number, maybe there's 12 now going up to something like 15, 16,
17 different US equity exchanges. If each one of those can provide us data at something close to
10 gigabits per second, and the rule set requires that we consolidate and aggregate all that
information in one place, well, we have
something of a fundamental mismatch if we only have 10 gigabit network cards, right? So for us
to do that quickly and reliably in a relatively flat architecture, we're going to need some magic.
And the closest thing I think we have to magic is some of the custom hardware.
This feels to me like the evolution of the multicast story,
which is if you step back for a moment, you can think of the use of multicast in these systems
as a way of using specialized hardware to solve problems that are associated with trading. But
in this case, it's specialized networking hardware. So it's general purpose at the level
of networking, but it's not like a general purpose programming framework for doing
all sorts of things. It's, you know, specialized to copying packets efficiently. Is there anything
else at the level of switching and networking worth talking about? Yeah, I think that it's
funny. I don't know if I've ever come across these like layer one cross point devices outside of
our industry. I think certainly some use them maybe in the cybersecurity field, but within our
industry, there's been a couple of pioneering folks that have built devices that allow us to,
with no switching or intermediate analysis of the packet, just merely replicate things
electrically everywhere according to a fixed set of instructions. And it turns out that that
actually covers a tremendous number of our use cases when we're distributing things like market
data. So the more traditional, very general switch will take in the packet, look at it,
think about it, look up in some memory where it should go, and then route it to the next spot.
That got sped up with slightly more specialized switches based on concepts from
InfiniBand that would do what was known as cut through. They would look at the early part of
the packet, begin to make a routing decision while the rest of the bytes were coming in,
start setting up that flow, send that data out, and then forward the rest of the bytes as they
arrive. Those were maybe an order of magnitude or even two faster than the first generation.
Well, these devices that actually do no work whatsoever, but just mechanically,
electrically replicate this data, they're another order of magnitude or two faster than that.
So maybe a store and forward switch, the first kind I was describing, I don't know,
maybe that was seven to 10 microseconds. A cut-through switch, looking at part of the packet and moving it forward, maybe that's
300 to 500 nanoseconds.
And now these switches, these layer one cross points, maybe they're more like 3 to 5 nanoseconds
themselves.
And so now we can take the same packet and make it available in maybe hundreds of machines
with two layers of switching like that.
And that's, we're talking about, you know, a low single to double digit number of nanoseconds in
terms of overhead from the network itself. I think it's an interesting point in general that
having incredibly fast networking changes your feelings about what kind of things need to be
put on a single box and what kind of things can be distributed across boxes, right? Computer scientists like to solve problems by adding layers of indirection.
The increasing availability of very cheap layers of indirection suddenly means that you can do
certain kinds of distribution of your computation that otherwise wouldn't be easy and natural to
do. What do the latencies look like inside of a single computer versus between computers these days?
It's starting to vary quite a bit, especially with folks like AMD having slightly different
structure than Intel. But it's true that moving between cores is starting to get fairly close to
what we can do with individual network cards. I mean, to throw out some numbers that somebody
will then probably correct me on, I think maybe that's something on the order of 100 nanoseconds. It's not that different when we're going across the PCI bus and going through a highly optimized network card. That might be something like, you know, 300 to 600 nanoseconds. And this is, you know, one way to get the data in. But it is not unreasonable for the sorts of servers that we work with to get frames all the
way up to the user space into the application to do very little work on, but then turn around
and get that out in something less than a microsecond. Moving, you know, context switching,
things like that in the OS can start to be on the order of a microsecond or two.
Yeah. And I think the thing that's shocking and counterintuitive about that is the quoted number for going through an L1 crosspoint switch versus going over the PCI
Express bus. We're talking 300 nanos to go across the PCI Express bus and two orders of magnitude
faster to go in and out of a crosspoint switch. Well, you got to add the wires in though. The
wires start to... Yeah, yeah. The physical wiring starts to matter. The wiring absolutely starts to matter. And by the way, in some ways to go back to like the kind
of mechanical sympathy point, when you think about the machines, we're not just talking about the
computers. We're also talking about the networking fabric and things like that. I think an aspect
of the performance of things that people often don't think about is serialization delay. Can
you explain like what serialization delay is and how it plays into that story? We've been talking about networking at
specific speeds. I can send one gigabit per second. I can send 10 gigabits per second,
25 gigabits per second, 40 gigabits per second, et cetera. I can't take data in at 10 gigabit and
send it out at 25 gigabit. I have to have the data continuously available. I have to buffer enough and wait for
enough to come in before I start sending because I can't underflow. I can't run out of data to
deliver. Similarly, if I'm taking data in at 10 gig and trying to send it out at one gig,
I can't really do this bit for bit. I've kind of got to queue some up and I've got to wait.
The lowest latency is happening at the same speeds where you can do that.
And certainly the L1 cross points
are operating at such a low level,
as far as I understand,
that certainly no speed conversions
are happening at the latencies that I described.
And just to kind of clarify the terminology,
by serialization delay,
just like you were making this point that,
oh yeah, when you're in at 10 gig and out at 25,
it's like, well, you can't pause or anything, right? You have to have all of the data available at the
high rate, which means you have to queue it up. When you send out a packet, it kind of has to be
emitted in real time from beginning to end at a particular fixed rate. And that means there's a
translation between how big the packet is and temporally how long it takes to get emitted onto the wire.
There's a kind of electrically determined space-to-time conversion that's there.
And so it means if you have a store-and-forward switch, and you have, say, a full, what's called an MTU, which is like the maximal transmission unit of an Ethernet switch, which is typically, you know, 1500 bytes-ish, that just takes a fixed amount of time.
Like on a 10 gig network, what does that translation look like?
I think it roughly works out to something like a nanosecond per byte.
And I think this comes back to the thing we were talking about in the beginning and a
little bit of appreciation for multicast.
So imagine I have 600 customers and I have one network card and I would like to write
a message to all 600.
Well, let's say the message is a thousand bytes. Okay. So that's about a microsecond per.
So the last person in line is going to be, you know, 600 microseconds at a minimum behind
the first person in line. Whereas with multicast, if I can send one copy of that and have the switch replicate that in parallel, one of these layer one cross points, I'm getting that to everybody in something close to a microsecond.
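The arithmetic behind those numbers, assuming a 10 gigabit link: a byte takes roughly 0.8 nanoseconds to put on the wire (hence the "about a nanosecond per byte" rule of thumb), a 1,000-byte message takes close to a microsecond, and 600 serial unicast copies keep the last recipient waiting several hundred microseconds.

```python
# Worked numbers for the serialization delay and fan-out points above,
# assuming a 10 gigabit per second link.
LINK_BITS_PER_SEC = 10e9

def wire_time_ns(num_bytes, bits_per_sec=LINK_BITS_PER_SEC):
    """Time to serialize num_bytes onto the wire, in nanoseconds."""
    return num_bytes * 8 / bits_per_sec * 1e9

print(wire_time_ns(1))     # 0.8 ns -- "about a nanosecond per byte"
print(wire_time_ns(1500))  # ~1200 ns for a full Ethernet MTU
print(wire_time_ns(1000))  # ~800 ns, close to a microsecond per message

# Unicast fan-out: 600 serial copies from one card vs. one multicast copy.
last_in_line_us = 600 * wire_time_ns(1000) / 1000
print(last_in_line_us)     # ~480 microseconds before the last copy even leaves
```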
And that affects latency, but it also affects throughput. If it takes you a half a millisecond of wire time to just get the packets out the door, well, you could do at most 2,000 messages per
second over that network card. And that's that, right? Again, this goes back to there are real
physical limits imposed by the hardware that it can be as clever as you want, but there's just
a limit to how much stuff you can emit over that one wire. And that's a hard constraint that's worth understanding.
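To make that arithmetic concrete, here is a small sketch of the unicast-versus-multicast fan-out numbers from the example above (600 customers, 1,000-byte messages, one 10 gig network card, using the same rough nanosecond-per-byte figure; the in-conversation numbers are rounded a bit differently, and real numbers would include headers and switch latency):

```c
#include <stdio.h>

int main(void) {
    const double msg_bytes = 1000.0;
    const double link_gbps = 10.0;
    const int    customers = 600;

    /* Wire time per message copy: ~0.8 us at 10 Gb/s (rounded to "about a microsecond" above). */
    double per_msg_us = msg_bytes * 8.0 / link_gbps / 1000.0;

    /* Unicast: copies go out one after another, so the last customer
     * waits behind everyone else's copy. */
    double last_in_line_us = per_msg_us * customers;

    /* The same wire time caps throughput: one NIC can only emit so many
     * full fan-outs per second. */
    double fanouts_per_sec = 1e6 / last_in_line_us;

    printf("wire time per copy        : %6.2f us\n", per_msg_us);
    printf("last unicast customer lag : %6.0f us\n", last_in_line_us);
    printf("max full fan-outs per sec : %6.0f\n", fanouts_per_sec);

    /* Multicast: send one copy and let the switch replicate it in parallel,
     * so everyone sees it in something close to one message's wire time. */
    return 0;
}
```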
Multicast is a story of the technology that
could: incredibly successful in this niche. There are other bits of networking technology
that have a more complicated story. And I'm in particular thinking about
things like InfiniBand and RoCE. What is RDMA? What is InfiniBand?
InfiniBand is a networking technology that was very ahead of its time.
I think it's still used in supercomputing areas. And a lot of high-performance Ethernet has
begged, borrowed, and stolen ideas from InfiniBand. InfiniBand provided things like
reliable delivery at the hardware layer. They had APIs that allowed for zero copy IO,
and they had the concept of remote direct memory access.
So direct memory access is something that like peripherals,
devices on your computer can use to sort of move memory around
without involving the CPU.
The CPU doesn't have to stop what it's doing
and copy a little bit over here, from here to
there, from here to there. The device itself can say, okay, that memory over there, I just want
you to put this data right there. And remote DMA extends that concept and says, I'd like to take
this data and I'd like to put it on your machine over there in memory without your CPU being
involved. And this is obviously powerful,
but requires different APIs to interact with.
A number of the places I've been at used InfiniBand,
some very much in production,
some a little more experimentally.
And there are some bumps in the road there.
You know, InfiniBand had some of this problem
where by default,
it essentially had some flow control in hardware,
meaning that it was concerned about network bandwidth and could slow down the sender.
So we'd have servers that didn't seem to be doing anything, but their network cards were
sort of oversubscribed. They had more multicast groups than they could realistically sort of
filter. And so they were pushing back on the sender. And so when we scaled it up to big infrastructures, we'd have market data slow down
and it was very difficult to figure out why and to track down who was slowing that down.
So the Ethernet model of like best effort and sort of fail fast and throw things away quickly
is in some cases a little bit easier to get your
head around and to debug. You mentioned that when we talk about multicast, one of the key issues
with multicast is it's not reliable. We don't worry about dealing with people who can't keep
up, right? People who can't keep up, fall behind and have like a separate path to recover and
that's that. And you just mentioned that InfiniBand had a notion of reliability and reliability is a
two-edged sword, right? The way you make things reliable is in part by constraining what can be done.
And so the pushback on senders of data is kind of part and parcel of these reliability guarantees,
I'm assuming. Is that the right way of thinking about it?
Yeah, I think that's a good way to think about it. But certainly the visibility and the
debuggability could have been improved as well. And you mentioned RoCE. I never worked with it personally, but it was a way to
sort of extend Ethernet to support the RDMA concept from InfiniBand. But I believe it still involved some proprietary technology. So it was a little bit of like the embrace and extend
approach applied to Ethernet. So when
you look at the kinds of custom hardware that was being developed, I think there were sort of more
interesting things happening in the commodity world than RoCE. We've spent a lot of time
talking about the value of customizing and doing just exactly the right thing and understanding
the hardware. I guess the Ethernet versus InfiniBand story is in some sense about the
value of not customizing, of using the commodity thing.
There is a strong lesson there.
I mean, I had a couple of instances over my career where I was very surprised at the power
of commodity technologies.
I was at a place that did telecommunications equipment, and they were doing special purpose
devices for processing phone calls, phone number recognition, what number did you press,
sorts of menus.
And these had very special cards with digital signal processors and algorithms to do all
of this detection, some basic voice recognition.
And this is in the 90s.
And these were complex devices.
And it turned out that somebody in the research office in California built a pure software
version of the API that could use like a $14 card that was like sufficient to be able to generate
ring voltages and could emulate like 80% of the product line in software. And when I saw that,
I was like,
I'm not really sure I want to work on custom hardware. I don't know that I want to sort of swim upstream against the relentless advance of x86 hardware and commodity vendors, like
just the price performance. It's, you know, you've got a million people helping you,
whereas in the other direction, you've got basically yourself. And it took a lot to get me convinced to consider some alternative things.
But I do think that trends around the way processors and memory latency are improving
certainly make it clear that, I mean, just, you know, looking at things like deep learning and
GPUs, like it's pretty clear that we're starting to see some gains from specializing again, even though
I'd say the first 10 or 15 years of my career, it was pretty clear that commodity hardware
was relentless.
And it's worth saying, I think in some sense, the question of what is commodity hardware
shifts over time.
Like I think a standard joke in the networking world is
always bet on Ethernet. You have no idea what it is, but the thing that's called Ethernet
is the thing that's going to win. And I think that has played out over multiple generations
of networking hardware, whereas you see it stealing lots of ideas from other places,
InfiniBand and whatever, but there is the chosen commodity thing and learning how to use that and how to identify what that thing is going to be is valuable. The work that we're doing now in
custom hardware is also still sensitive to the fact that FPGAs themselves are a new kind of
commodity hardware. It's not the case that we actually have to go out and get a big collection of chips fabricated on one of those awesome reflective disks. We get to use a certain kind of commodity hardware where lots of big manufacturers are actually getting better and better at producing bigger, more powerful, and easier to use versions of these systems.
Is there anything else that you see
coming down the line in the world of networking
that you think is gonna be increasingly relevant
and important, say, over the next two to five years? I think what we're going to see is
a little bit more of the things we've been doing already, standardized and more common.
So this sort of like user space polling and that form of IO, I think you're seeing some of that
start to hit Linux with io_uring. So these
are very, very, very similar models to what we've been already doing with a lot of our own cards,
but now they're going to become a bit more standardized. You're going to see more IO
devices meet that design, and then you're going to see more efficient zero copy polling sorts of
things come down the line.
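For readers who haven't seen io_uring, here is a minimal sketch of its submission-queue/completion-queue model, using liburing and a plain file read for simplicity (network sockets, registered buffers, and polled modes follow the same submit-then-reap pattern; this assumes liburing is installed and is an illustration, not anything discussed on the episode):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <liburing.h>          /* link with -luring */

int main(void) {
    struct io_uring ring;
    if (io_uring_queue_init(8, &ring, 0) < 0) {   /* 8-entry submission queue */
        perror("io_uring_queue_init");
        return 1;
    }

    int fd = open("/etc/hostname", O_RDONLY);     /* any readable file will do */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buf[256];
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);  /* grab a submission entry */
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);    /* describe the read */
    io_uring_submit(&ring);                              /* hand it to the kernel */

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);                      /* reap the completion */
    printf("read returned %d bytes\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}
```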
You know, some of the newer networking technologies like 25 gig, I do think is going to have a decent amount of applicability. It is waiting for things like an
L1 crosspoint. And it is not always a clear net win. Some of the latency has gone up as we've
gone to these higher signaling rates. You can be overcome with large quantities of data. The gain
in serialization delay will overcome some of the
baseline latency if the data gets big enough. But it's complicated.
Can you say why it is? Why do switches that run at faster rates sometimes have higher latency
than switches that are running at lower rates?
I believe that that's... I mean, part of it is decisions by the vendor where they're sort of finding the right market for
the mix of features and the sensitivity to latency.
I do think that we are at the mercy, so to speak, of some of the major buyers of hardware,
which is probably cloud providers.
That's just an enormous market.
And so I do think that the requirements hew a little closer to that than
they do for our specific industry. So we've got to contend with that. As the signaling rates go up,
and again, I'm no expert here, but I think that you start to have to rely more on error correction
and forward error correction is built into 25 gig and eats up a decent amount of latency if you have
runs of any length. So that's also a thing that
we have to contend with and an added complexity. So I think it's going to be important. I think
it's going to be something that does come to our industry and maybe quickly. I think at this point,
there's a decent amount of 25 gig outside of the finance industry, but not quite as much
in the trading space. And once you start to see a little bit of it, you'll see a lot very quickly.
All right. Well, thanks a lot. This has been super fun. I've really enjoyed kind of walking
through some of the history and some of the kind of low-level details of how this all works.
I think we do this basically all the time, you and I. It's just kind of like now we're doing it
for somebody else.
You can find links to more information about some of the topics we discussed,
as well as a full transcript of the episode on signalsandthreads.com.
And while you're at it, please rate us and review us on Apple Podcasts.
Thanks for joining us and see you next week.