CppCast - Trading Systems

Starting point is 00:00:00 Episode 287 of CppCast with guest Carl Cook, recorded February 17th, 2021. Sponsor of this episode of CppCast is the PVS Studio team. The team promotes regular usage of static code analysis and the PVS Studio static analysis tool. In this episode, we discuss updates to VS Code. Then we talk to Carl Cook from Optiver. Carl talks to us about high frequency trading. Welcome to episode 287 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? I'm all right, Rob. How are you doing?

Starting point is 00:01:20 Doing okay. I guess we should talk a little bit about the awful weather that's going on right now. Like half of Texas lost power, I think. It's very scary. Hopefully they get that resolved soon. How are you doing in Colorado? Oh, I mean, we've had winter weather, but it's Colorado. You're used to it. We had winter weather.

Starting point is 00:01:39 Yeah. Like, oh, no, there's some snow on the ground and it's cold. I mean, we were cold. We were negative 8 Fahrenheit, negative 23 C. Wow. And then when I looked at one point, I actually saw a reading of negative 13 Fahrenheit, which I think was something like negative 30 Celsius. But still, it didn't set any records. Right.

Starting point is 00:02:09 I mean, it was cold for a few days. I didn't get anything. We've just been getting rain over and over. Well, I mean, on the upside, from our perspective, a little bit of snow got our river valley here. Our winter snow totals are almost up to normal finally, which have been below normal, and it's really important for the farmers out here. It's like a big deal. Well, at the top of every episode, I like to read a piece of feedback. This week, I actually got a message from Abigail Martinson on LinkedIn. She connected to me and wrote, I started listening to CppCast last year to help as I talk to C++ engineers for work, and I now look forward to listening each week.

Starting point is 00:02:49 As a recruiter, I know how most engineers view me, so I think it's important to become as knowledgeable as possible, and your podcast has played a huge part in that. I just thought that was really cool that non-programmers are listening to the podcast and making use out of it. I know I haven't been recruited in a while, but if a recruiter reached out to me and knew their stuff with C++, I would probably be a bit more responsive when talking to them. Doesn't your boss listen to the podcast? Maybe. I'm not sure he still does. I'm not looking to get recruited right now. If I was looking for a new job

Starting point is 00:03:26 and the recruiter I was working with, you know, knew their stuff with C++, I think that'd be pretty cool. Yeah. Yeah. I mean, if you don't get like must have five years experience with C++17 kind of, you know, requirements or something like that, then yeah, that's, that's handy. I know I, I do hate it when I get like those out of the blue emails, like, oh, you know, we're looking for someone with all this experience in like JavaScript or something like that. I don't even use. But right. Yeah. So I thought that was cool.

Starting point is 00:03:54 We'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Carl Cook. Carl has a PhD in computer science from the University of Canterbury, New Zealand. Since graduating in 2005, he has worked mainly within finance,

Starting point is 00:04:15 ranging from hedge funds and market makers through to cryptocurrency trading. Carl is interested in high performance and low latency systems and enjoys the competitive and challenging nature of trading. As a hobby, he enjoys recreational flying and sees parallels between the safety built into aviation and the safety in place within regulated financial markets. Carl, welcome to the show. Thank you very much. Good to be here.

Starting point is 00:04:36 So you mean flying actual airplanes, right? Yeah. Yeah. Although at the moment with COVID, it's just on flight simulator, but yeah, when I get a chance, actual airplanes. Well, that's interesting. I mean, in all seriousness, why would COVID limit your ability to rent a plane and go flying? There are curfews in the Netherlands, for example. There are curfews, sort of, but you can exercise and recreate during the day if you want. But also, it's not an entirely nice experience with, you know, face masks and things like that right now. It just takes the fun out of it a little bit.

Starting point is 00:05:13 Okay. Well, Carl, we got a couple news articles to discuss. Feel free to comment on any of these and we'll start talking more about your work and high-frequency trading. Okay. All right. So this first one is a blog post about Khronos releasing SYCL 2020 for C++ heterogeneous parallel programming. And I think we talked about this release when it was still upcoming with Michael Wong a couple of months ago.

Starting point is 00:05:41 Right, Jason? That sounds believable. Yeah. So I'm not sure how much to go into detail on this, but if you want to hear about what's in this release, you can probably go back to that episode or read into this blog post. Is there anything you wanted to highlight here, Jason?

Starting point is 00:05:59 Well, I do find it interesting that they jumped 2019 version numbers because they went from 1.2.x to 2020. But in all seriousness, I've actually been looking at SYCL lately and it's been on my to-do list for like years to actually play with some of this GPU programming stuff. And I think this is for me, it looks like this would be the gateway drug here because it is,

Starting point is 00:06:23 it is cross-platform. Right. And that's what I would want for the kinds of things that I do. But I haven't actually tried it yet. Is that possibly in an upcoming C++ Weekly episode? Yes, but at this rate, it'll be in like 18 months. What about you, Carl? Do you use any of these higher-level abstractions for GPU programming?

Starting point is 00:06:47 Not directly. We do a small amount of GPU programming at work, but I'm fairly well detached from that side of things. I'm very much, I think, the same as you. I'm just watching this sort of from the sidelines at the moment. And yeah, when I get a rainy afternoon or a rainy week one day, I might dive in a little bit deeper. I looked at the programming model and I'm like, I'm just going to give this a try this afternoon. And then I looked at the programming model and I'm like, I'm going to do it next week. Well, what is it called, the one that NVIDIA is working on that is more like you can just call C++17 parallel algorithms directly? That is a "just this afternoon, I'll play with it." But I don't have an NVIDIA GPU on a computer that

Starting point is 00:07:34 I have a C++ compiler on. I do have an NVIDIA GPU, but it's in the system I use for gaming and such. It's got like different purpose for me. And besides that's running Windows. stdpar, that's what it was called, right? Yeah. So that's limited to Linux with an NVIDIA GPU, which is not a combination I have direct access to at the moment. Right. Yeah.

Starting point is 00:07:57 Okay. Next thing we have is a post on the Visual C++ blog. And this one is Visual Studio Code C++ Extension Cross-Compilation IntelliSense Configurations. And this is pretty neat. You know, they keep adding more and more to the C++ support in Visual Studio Code. What this post is about is if you're doing cross-compilation,

Starting point is 00:08:20 so you're on a Mac and you're targeting Linux, for one example, then instead of all the IntelliSense being based on what it perceives as a default Mac environment, it'll actually look, oh, you're using a Linux compiler, then it's going to give you Linux-based IntelliSense. So that's pretty neat. Yeah. Yeah.

Starting point is 00:08:40 Do you make use of Visual Studio Code, or what is your IDE of choice, Carl? Yeah, that's a religious war right there, right? Let's go. I personally used to use Visual Studio many years ago, and I know a lot of people who use Visual Studio for C++. Most of my coworkers are actually using Qt Creator, which is fine, just awesome.

Starting point is 00:09:08 In saying that, VS Code is definitely turning up in large numbers now. Most of our new starters seem to gravitate towards it for some reason, and I haven't seen anything particularly wrong with it. It seems to be doing the job quite well. Typically, compilation is actually done off on Linux servers somewhere, with just using your Windows desktop to run code. And it seems to work pretty well.

Starting point is 00:09:39 You mentioned new developers seem to be gravitating towards it. I wonder if it's becoming popular in university amongst, you know, engineers in training. That's a good question. I mean, I suspect when new developers turn up and they look at it and they go, oh, what am I going to do? I finally get a chance to start from scratch here. You know, do I really want to be building out a massive, you know, .vimrc or something like that? Maybe this is a chance where they try something new and shiny. And people seem to get started with it pretty quickly. And they don't stop using it once they've got it set up. So, yeah, I'm relatively positive about it. That's an interesting point because I do recall one of my earlier jobs

Starting point is 00:10:18 before I had really set myself in my ways to become an old man set in my ways. I was like, I'm starting a shiny new job. And I tried KDevelop at the time and used it for a while until I ended up moving to Vim actually, which I think we talked about KDevelop briefly a couple of times on the show. Yeah. Which could be seen as a predecessor in a way to Qt Creator, I guess. Maybe.

Starting point is 00:10:44 I mean, don't get me wrong. I have VI key bindings in Qt. So I just try to take the best of everything I can, and it works all right. Yeah. Well, yeah, and now I use CLion for most of my development with the VI bindings enabled. Yeah.

Starting point is 00:11:00 Is that an option in VS Code? I'd imagine it probably is. It's an option everywhere. It's an option in Compiler Explorer if you really want to. I do know at work a lot of people do use CLion because I see the Java processes being spawned all over the dev servers. It can be a bit of a problem if you have a very large C++ project as it consumes all available memory.

Starting point is 00:11:24 Yeah, it seems to consume all available CPUs as well. It's getting better, yes. It's definitely getting better. Okay, and then the last thing we have is, this is a post on Quantlab Financial's GitHub, and I don't know how long ago they started this, but they're doing these C++ tip of the weeks and modern C++ tip of the weeks. And I think it's one of our previous guests who is working on these, right, Jason? That's two of our previous guests, actually, if you look at, uh, Lenny Maiorani

Starting point is 00:11:58 and Kris Jusiak. We've had both on the show in the past. Yeah. Yeah. So it looks like they're up to 213 tips. I'm sure there are plenty of really good ones in here. Did you look through any of the recent ones, Jason? I started to click around. Sorry. And then I got distracted by something else at the moment. You know how it goes. But yeah, I was looking at.

Starting point is 00:12:20 No. Yeah. I didn't dig into any of them. Although I do notice now that you say something that it says episode or whatever, 213, but only goes back to 182. I don't know if they're pruning older ones or what. Or if they only just started making this thing public in the last six months. That's not immediately clear to me. There's 52 commits to the repository.

Starting point is 00:12:43 That is a good point. I doubt they would just prune them. Well, but GitHub gets sad once you have several hundred files in a directory. I work on projects that break GitHub. You can't open certain folders on GitHub. Not ideal. I'm curious about that now. Maybe we should ping Kris or Lenny on Twitter and find out what the answer is to that. All right. Well, Carl, do you want to start off by just telling us

Starting point is 00:13:10 a little bit about the work you do at Optiver and how C++ factors into that? Yeah, yeah, sure. So I work within the sort of the core auto trading team within Optiver Amsterdam. So my team develops all of the auto traders that we trade on, goodness, probably about 30 or 40 exchanges around the world. Optiver has probably about five or six offices,

Starting point is 00:13:38 including an office in Chicago, an office in London, office in Sydney, office in Beijing. But we tend to trade actually across from a single office. We'll trade, for example, the Dutch office will happily trade in the US markets as well, in the South American markets. So about half the job is working with the exchanges and writing code to actually function within the exchanges to be buying and selling financial instruments and probably the other half of the job is just working

Starting point is 00:14:12 on the algorithms, the actual trading strategies behind, effectively, what we use to decide when we buy and when we sell. Yeah, that would be a pretty simple description. Everything, basically everything is in C++ from within my team and the vast majority of my office would be C++. A little bit of Python here and there, a little bit of C# for some of the user interfaces. But then again, a lot of the interfaces,

Starting point is 00:14:46 user interfaces are in C++ as well. Now, this is not at all C++ related, but when you said that you're happy to trade in whatever markets you can from wherever the office happens to be, and maybe you can't speak to this at all, but it kind of sounds like Optiver would end up competing against themselves

Starting point is 00:15:01 if different offices are trading in the same exchanges. Yeah, to some degree we do. These are separate trading entities and that's fine. So I should qualify this by the vast majority of our work that we do is what we call market making, which is effectively printing what we're willing to buy and what we're willing to sell for. So it's passive. And we're there on the exchange, basically providing the liquidity to the market. You'll find the market makers are actually the companies behind the vast majority of people willing to list or print their prices to the exchange.

Starting point is 00:15:40 And so, yeah, indeed, sometimes we will end up improving on another office's prices. I mean, there isn't a lot of overlap, but it is something that does happen sometimes. Then it's actually quite interesting to see who ultimately has the better trading strategy. Interesting. Yeah. So what are some of the unique challenges about programming for low latency trading that cause your whole team to work in almost entirely C++? Yeah. So the Dutch office has been around for about 35 years. We've been writing our own auto trading systems for at least 15 years.

Starting point is 00:16:26 So it's quite a big office. There's a lot of existing systems, a lot of integration points, a lot of components, a lot of moving parts, quite a complex sort of risk and assurance set of systems in there as well. We started with C++ and basically never looked back in that respect. And what's happened recently within the last few years is we've actually made even more of a conscious decision to write more and more of what we can in C++

Starting point is 00:16:59 because for us, it actually really works quite nicely. And so we have our own, you know, networking libraries just for communication within the office, within the different components, that, you know, work really well. And we have, you know, a nice sort of client-server architecture. We have nice protocol serialization and deserialization libraries, sort of our own inbuilt reflection and things like that. The more that we, I should just say it another way, we just kind of realized that, you know, we've got everything we need here.

Starting point is 00:17:35 And so more and more we throw away components written in other languages and just have a very quite simple C++ stack for most things, I should say. It does seem sometimes that a problem is easier to express in a different language for whatever reason, like maybe a functional programming paradigm or something. I mean, we can do that in C++. So do you have any strategies at all for when it's, you know, do you embed scripting languages into C++ or anything like that?

Starting point is 00:18:06 Do you know what I'm trying to say? Yeah, yeah, it's a good question. We've definitely gone down that route a few times. The traders at present do have a scripting language that they can use, which then gets converted into C++ for us. I'm not even sure what the current incarnation is, but we did use Lua for quite some time.

Starting point is 00:18:25 But again, that all basically gets converted back into, I think it's parsed and processed by our backing servers, which are all C++. Look, I mean, for sure, functional languages can be really nice for certain problems. So I guess the exception here is that the vast majority of our research code would probably be Python. Okay. Because of course, you've got all of the goodies, all the numerical goodies and all of the graphing, all of these cool things that you can do, such as this, you know,

Starting point is 00:18:58 real-time analysis with Jupyter notebooks and just, you know, the really nice number-crunching libraries. In saying that, all of our models for predicting, well, for effectively generating prices or generating other sort of mathematical attributes of shares, they're all in C++, and we just put Python bindings over everything. So whenever there's a sort of a need to do something, either to get access to something that's written in C++ or to get the performance of C++, we just

Starting point is 00:19:34 write it in C++, put a Python binding on top of it, and then that makes it accessible to everyone else as well. Kind of the same approach that a lot of AI research has gone, I believe. That's interesting. A lot of the neural network libraries, I believe, are written similarly. We talked to someone about that at some point.
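To illustrate the "write it in C++, put a Python binding on top" approach described here, the following is a minimal sketch and not Optiver's code. It assumes the simplest possible binding route, a C-linkage wrapper that a Python foreign-function layer such as ctypes could load. The names `pricing::mid_price` and `pricing_mid_price` are hypothetical, and the "model" is just a midpoint calculation.

```cpp
// Hypothetical sketch: one C++ implementation, exposed through a C ABI so a
// Python layer could call it instead of keeping a second Python copy.
// All names here are illustrative assumptions, not taken from the episode.
#include <stdexcept>

namespace pricing {

// The single source of truth for the calculation lives in ordinary C++.
double mid_price(double bid, double ask) {
    if (bid > ask) {
        throw std::invalid_argument("crossed market: bid > ask");
    }
    return (bid + ask) / 2.0;
}

}  // namespace pricing

// C-linkage wrapper: no name mangling, and no exception crosses the boundary.
// Returns 0 on success and writes the result to *out; returns -1 on error.
extern "C" int pricing_mid_price(double bid, double ask, double* out) {
    if (out == nullptr) {
        return -1;
    }
    try {
        *out = pricing::mid_price(bid, ask);
        return 0;
    } catch (...) {
        return -1;
    }
}
```

Built as a shared library, a wrapper like this could be loaded from Python, and the research code would then exercise exactly the same arithmetic as production. Real projects often use a dedicated binding library instead; the C wrapper is used here only because it needs nothing beyond the standard toolchain.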

Starting point is 00:19:51 Yeah, that sounds right to me. Yeah. Yeah. I mean, we used to try to have a dual implementation, so we might have... We actually... Yeah, we did have Python implementations of, you know, different pricing models and things,

Starting point is 00:20:04 but it just gets too hard to maintain two different copies. You get mathematical inaccuracies and it's two code bases. You're better off with one. Right. Makes sense. Yeah. So one thing we haven't asked yet is, are you able to keep up with the latest standards? Are you, you know, using C++17 currently? Are you already looking at C++20? Yeah, it's an interesting one. So we don't get much benefit at all out of using the latest, greatest compilers or the latest, greatest language features. In fact, to a degree, we tend to just stand back a little bit and make sure that things are stable. So that's pretty important for us. The other thing, I mean, not that stability is

Starting point is 00:20:51 an issue. It's not like, you know, there's problems where early versions of the compilers just have terrible implementations. That's not the case at all. What we do find is a challenge is to keep our code really clean and keep our architecture clean, keep the code consistent, keep it understandable, keep it readable. It's actually a real challenge within trading because things change very, very quickly. And something you could be working on today, you might have to wrap that up tomorrow.

Starting point is 00:21:18 A new opportunity might come along. It's possible that, hey, we're not going to use the strategy anymore as of next week, that one goes. And so there's a huge churn and turnover, a huge number of commits, quite a lot of people working on basically the same things. So we have to be very, very careful not to, for example, go, hey, coroutines are awesome. Let's just put coroutines into a quarter of our code base and see how that works out.

Starting point is 00:21:45 And we have to be very, very, very careful in that respect. I'm curious about what exactly these strategies look like that you're constantly rewriting and then possibly abandoning, you know, one week after another. What are these? You know, can you describe these a little bit more, maybe? Um, yeah, I'm just trying to think... too much detail. That's fine. I'm just trying to actually think of an example to back up what I'm saying now. Well, an example might be potentially a market that we thought was really interesting, or we thought that a possible new sector on a market was really interesting. We start trading there. It turns out that we were wrong. We just didn't have the numbers right. And so that would be a market where we might go, right, we're not going to trade

Starting point is 00:22:29 there anymore, or not for the meantime. Another example, yeah, it could just simply be a strategy that we thought would be profitable. It turns out that we were trading a lot, but the trades weren't profitable in the long run. And, you know, that could be anything from an implementation error, which you can fix, to just fundamentally we got our numbers wrong. In that case, we'd walk away from that strategy. And markets change. Markets change all the time, so you've got to keep up with that. So does the C++ code in any way itself adapt to these market changes? Like, do you use profile-guided optimization or something like that to help tune your code? Yeah, so, yeah, that's a good question. We do use profile-guided optimization a little bit, or we have in the past a little bit. Generally, we're

Starting point is 00:23:25 not too sensitive to those sorts of market changes. So there's kind of two ways to parse your question, really. No, we generally just write a one-size-fits-all sort of fundamental system that should be able to trade on most markets. And we sometimes have to bend the architecture for special cases, or we have to take a step back and refactor and have a look and go, you know what, we didn't quite model this right. Let's do this again, and properly, in the next iteration now that we know a little bit more. So that's kind of answering the how well does our code work as markets change. Answering the question about profile-guided optimization,

Starting point is 00:24:06 we haven't had a huge amount of success with that. Other trading companies might have. But the problem that we find, it's a pretty classic problem. It's just one of basically overfitting the model. And so we can get an auto trader or an auto quoter absolutely flying for a day's replayed data. So if we ever get that day again with exactly the same trading events, you know, we're going to be pretty damn quick. Unfortunately, the 99.9 percent of the time where that day doesn't happen to be replayed, it's slower.
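For readers unfamiliar with the workflow being discussed, here is a minimal sketch of profile-guided optimization with GCC, under stated assumptions. The `-fprofile-generate` and `-fprofile-use` flags are real GCC options; the file names, the `on_tick` and `count_requotes` functions, and the idea of feeding in a replayed day are hypothetical stand-ins, not a description of Optiver's systems.

```cpp
// Hypothetical build steps (GCC), shown as comments. File names are made up:
//   g++ -O2 -fprofile-generate quoter.cpp -o quoter   // instrumented build
//   ./quoter < replayed_day.txt                        // run on one recorded day
//   g++ -O2 -fprofile-use quoter.cpp -o quoter        // rebuild using that profile
//
// The profile records which branches were hot on that particular day. If a
// live day has a different mix of quiet and volatile periods, the tuned
// branch layout may no longer match, which is the overfitting risk.
#include <vector>

enum class Action { Hold, Requote };

// Toy decision: requote only when the price moved more than `threshold`.
Action on_tick(double last_quote, double new_price, double threshold) {
    double move = new_price - last_quote;
    if (move < 0) {
        move = -move;
    }
    return move > threshold ? Action::Requote : Action::Hold;
}

// Replays a price series and counts how often the toy quoter would requote.
int count_requotes(const std::vector<double>& prices, double threshold) {
    if (prices.empty()) {
        return 0;
    }
    int requotes = 0;
    double quote = prices.front();
    for (double p : prices) {
        if (on_tick(quote, p, threshold) == Action::Requote) {
            quote = p;  // the new quote follows the market
            ++requotes;
        }
    }
    return requotes;
}
```

On a calm replayed day the `Hold` branch dominates, so a profile taken from it would tell the compiler to favor that path; a volatile day would exercise `Requote` far more often.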

Starting point is 00:24:42 Yeah, if you knew the day was going to be replayed, you would do all kinds of things differently, I would think, not just need to do them faster. Yeah. So that's the trade-off that we have. We want to be fast. That's important. You know, if you're not within reasonable bounds of performance, you're not really in the game at all. But also, on the days when you do get strange behavior, so a very, very busy trading day, like a day that no one saw coming, or something else where the markets are really, really very hot, you also need to be able to handle that. And so there's always that tension there. Because if you push it too much, you'll probably find that on a day with a huge amount of volume, you're out of the

Starting point is 00:25:30 market, which is not what you want. In fact, with Optiver and any other registered market maker, you can't be out of the market. There's regulatory obligations to be in the market, to be trading for the entire time that that market is open to trading. Oh, interesting. Otherwise you risk getting fined or something? Yeah. So, yeah, well, indeed, or kicked off the market. Because that would be worse. Yeah. Well, I mean, the markets need... it's a symbiotic relationship. The markets or the exchanges need the market makers to provide the prices. And you need to provide the prices all the time. Like say with a couple of weeks ago with GameStop,

Starting point is 00:26:11 I mean, that caused a lot of volatility in the market. It'd be quite easy to just turn off the systems, you know, and walk away and go, hey, like this week is just too hard for us. There's too much craziness going around. But that's exactly the time that the exchanges need the market makers to be there, to just be providing a bit of stability to as much of the market as they can. One thing I was curious about is you mentioned there being a lot of churn of new commits coming in all the time.

Starting point is 00:26:39 Do you have a lot of good practices in place to handle that? Are you, you know, code reviewing every single commit going in? I kind of imagine you must be. Yeah, someone's not going to bring down the whole trading network. Yeah, yeah. So from a regulatory point of view, in the last five to seven years, the regulation has just increased, you know, threefold. There's been a heck of a lot more regulation in the U.S. markets, but actually pretty much all around the world. So you need to be able to prove that your systems are in control at all times. You need to be able to prove that you've done a prudent amount of

Starting point is 00:27:18 testing for any release of any software that goes near the exchange. Which makes sense, because it has happened a couple of times where participants and exchanges have had code that hasn't worked and it has caused material damage to the markets. It's caused spikes in stocks. It's caused, you know, flash crashes a couple of times. So you absolutely need to prove that, you know, you're not going to have a system that's going to do that to the exchange. What does this proof look like? Like, do you have a large set of unit test suites or integration tests or, like, higher level? Yeah, so there's guidance on this from the regulators. But for us, we absolutely have a huge amount of unit tests, but we record every run of unit tests.

Starting point is 00:28:09 We record the output of those unit tests. That gets committed. We record any log files that were produced. We record what the output was versus the expected output, and that's actually code-reviewed. So those commits are code-reviewed, and those commits are committed as well. Does somebody review those commits?

Starting point is 00:28:30 Oh, sorry. And then we have a suite of automated tests as well, which takes an hour or two to run just to make sure that there's been no regressions as well. Interesting. An hour or two would be considered a very long test suite by some people's standards. Yeah. To be honest, I could make a change in the morning and have that running in the exchanges in the afternoon. Okay. As long as I can get other people to review what I've done, both the actual code changes and the review of the testing

Starting point is 00:29:12 that's been done as well. There'd be a few eyebrows raised if I did want to get something into production that quickly. But it's actually not too bad. I mean, one, we have to do it from a regulatory point of view. But two, it's kind of nice being able to sleep easy at night as well, you know, not thinking, gee, tomorrow, I wonder if that code's going to work all right or not. So there's a natural sort of motivation to making sure that things are pretty well covered.
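The practice described over the last few minutes, recording each test's actual output next to its expected output so that the record itself can be committed and reviewed, can be sketched roughly as follows. This is a minimal illustration under assumptions: `TestCase`, `TestReport`, and `run_and_record` are hypothetical names, and the log format is invented for the example rather than taken from any real framework.

```cpp
// Hypothetical sketch of a test runner that keeps a reviewable record of
// expected versus actual output for every test it runs.
#include <functional>
#include <sstream>
#include <string>
#include <vector>

struct TestCase {
    std::string name;
    std::function<std::string()> run;  // produces the actual output
    std::string expected;              // the output a reviewer signed off on
};

struct TestReport {
    int passed = 0;
    int failed = 0;
    std::string log;  // text that could be written to a file and committed
};

TestReport run_and_record(const std::vector<TestCase>& cases) {
    TestReport report;
    std::ostringstream log;
    for (const auto& tc : cases) {
        const std::string actual = tc.run();
        const bool ok = (actual == tc.expected);
        if (ok) {
            ++report.passed;
        } else {
            ++report.failed;
        }
        // One line per test, so a diff of the log shows exactly what changed.
        log << (ok ? "PASS " : "FAIL ") << tc.name
            << " expected=" << tc.expected
            << " actual=" << actual << '\n';
    }
    report.log = log.str();
    return report;
}
```

Because the log is plain text with one line per test, it fits naturally into the same code-review flow as the source change it accompanies.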

Starting point is 00:29:42 Have you gotten the 3 a.m. "the Japanese market is having a problem" phone call that you have to go deal with? Not with Optiver, but I have with other companies I've worked for. To be honest, though, the financial system is pretty amazing in that respect. All trades that we do, we have internal processes to make sure that our orders and trades look correct. We've got a huge amount of risk checking around that and, you know, just sort of monetary checking around that as well. But then there's external parties looking at this all the time as well. So the exchanges are looking at it, but for every trade and every order that we send, actually, a copy of that gets sent to an external company that's running

Starting point is 00:30:26 a whole bunch of analytics and machine learning on it as well, just to spot, to make sure that everything's going okay. Sponsor of this episode of CppCast is the PVS Studio team. The team develops the PVS Studio Static Code Analyzer, which detects errors in C, C++, C#, and Java code. When you use the analyzer regularly, you can spot and fix many errors right after you write new code. This means your team is more productive during code reviews and has more time to discuss algorithms and high-level errors. Let the analyzer that never

Starting point is 00:30:55 gets tired do the tedious work of sifting through the boring parts of code looking for typos. For example, let it check comparison functions. Why comparison functions? Click the link in the podcast description to find out. Remember that you can extend the PVS Studio trial period from one week to one month. Just use the CppCast hashtag when requesting your license. I don't want to derail too much from the technical discussion, but one thing you said a moment ago interested me when you talked about all this regulation going into changes you're making in the software

Starting point is 00:31:31 and how that regulation has become, I guess more strict over the past few years. And you mentioned the GameStop thing. And I thought one of the things that was being said in the news was that there was really a lack of regulation and that, you know, there were hedge funds buying like 50% more of the shorts than should have been available. And I was just kind of curious about that. Yep. So we've definitely derailed the technical side.

Starting point is 00:31:57 Okay. That's fine. I can handle that. Yeah. So, I mean, the markets are very heavily regulated. Okay. I can only short sell as much as the market allows, and there's actually a very well thought out sort of layered system of protection here. So when you buy shares, it's not you buying shares directly from the

Starting point is 00:32:18 exchange. It's you buying them from a broker that ultimately will go to a clearer, which will ultimately go to the exchange. And so if, for example, something went really wrong, the worst case scenario probably is that the broker fails. I think once in history maybe a clearing house has failed. It would never pull down an exchange with something like this. So I'm not entirely sure where the accusations of breaking regulations come from, because if there are regulations that have been broken, that'll be investigated. I don't know. I can't imagine a scenario where people either selling a huge amount of GameStop stock or buying a huge amount of GameStop stock

Starting point is 00:32:57 actually is a regulatory breach i'm not sure i just remember hearing that stat that like you know 150 150 percent of the shorts that should have been available were bought you know it does seem well okay so we can move on though it's okay well i'll tell you what i'll say one thing being able to sell one unit of stock that you don't own is somewhat hard to understand so i just say it's slightly but it's even more harder to understand how you can sell more than a hundred percent of the stock that you don't own so yeah short short selling is a very interesting um thorny debate that's been going on ever since markets were invented and now i'm just curious if rob got in on that oh no game stop bubble no i mean i heard about it after it was all happening.

Starting point is 00:33:47 Kind of wish I had, but no, I was not into any of that. No, I looked at that and I said, yeah, okay, whatever. Maybe back to the technical a little bit. I want to maybe just clarify something because you're talking about how all of the Optiver offices are independent and you're talking about you're at Optiver, an exclusive, almost exclusive C++ shop. I'm just curious, is the C++ code that you work on shared across the offices? Yeah, it's a great question. No, we don't. We tried that. It didn't work so well. Interesting. Really? So we, are you guys serious about the interesting?

Starting point is 00:34:26 Or was that a sarcastic? No, I'm actually. I'm a little surprised that an international company doing the same work in all their different offices wouldn't have a shared code base. I mean, compared to like Google with their mono repo, 100 billion lines of code that everyone works on. Yeah, it's like the evil opposite. Yeah. Yeah. So, boy, we've tried it a few times,

Starting point is 00:34:46 and I still wake up in sweats at nighttime sometimes thinking about it. So Git wasn't around when we first started doing this, so that's the first point I'd like to make. Okay. So using CVS or... Yeah, we're using Subversion. Okay, that's better.

Starting point is 00:35:01 Yeah, but it's very difficult. So some core libraries, and we have a lot of really cool libraries, they were sort of shared for a while, because we gave up on sharing strategies and the sort of high-level things. It just got a little bit too difficult because the market traded out of Sydney is often very, very different from the European market, for example. Just completely different regulations and just different logic, even if it's relatively the same style of trading. It just got too hard. We just spent all of our time resolving merge conflicts and fighting bugs that came out of resolved merge conflicts because, you know, we didn't merge them correctly.

Starting point is 00:35:46 That's quite hard to spot sometimes. So our very, very sort of lowest level library, so, you know, the event model that we have, the logging, the inter-process communication, shared memory, those sorts of things, yeah, that was shared for a while and that just got too hard as well. And so we eventually went,

Starting point is 00:36:06 we're just running our own fork of the repo per office. Amsterdam and Sydney have roughly the same code base, more or less, or forks of the same code base that look relatively similar, I think. The US office is basically purely C. That's sort of the direction they've gone in now as well. Interesting. That is interesting. Yeah, I'm not just saying it. Yeah, so I think the argument there is that, you know, there are complexities with C++. You have to be really disciplined when the code is moving around so much.

Starting point is 00:36:50 And in the US, I've just found that actually C seems to be enough for them, and they're pretty happy with that choice. Within Amsterdam, there's probably no way we'd do that. We're very much in the C++ camp. Do you ever get like a call or a Slack message from another office? Hey, Carl, you won't believe we just made this change. It had a huge impact. You guys need to reconsider this part of the code or something like that.

Starting point is 00:37:18 Yeah. So now that does happen, because the idea sharing is there for sure. Yeah. But it is interesting, because if you have, you know, four or five different offices all trying to solve the same problem pretty much independently, it's kind of like one of these genetic algorithms, right? Like, the best one wins and then the information is shared around. So, oh, for sure the information sharing is there, but it's not done via code. It's done more at the sort of

Starting point is 00:37:45 face-to-face level. It's interesting, effectively having the teams compete against each other to see who's got the... I mean, not really directly, but yeah. Yeah, I mean, I use the word compete. It's possibly a stronger word than reality. Right. But yeah, it's by no mistake that we have this model. Okay. But just with the amount of code changes we make, particularly across the offices, yeah, honestly, I would not like to go back to the days of a global repo. And that's with a relatively low number of developers. I mean, I don't know the exact

Starting point is 00:38:25 numbers, but I'd be guessing around about an average of 50 to 100 developers per office. Okay. Can you give us an idea approximately how large your code base is? Our one? We're not entirely sure. Okay. Because I ran some numbers the other day, and it's actually a little bit hard because where are all the repos within Git? We have a lot of different repos. We think somewhere between 1 million lines of code to 5 million. That's pretty wide. Yeah, that's a wide range. Big, but not gigantic.

Starting point is 00:38:59 Yeah, and that's for my office. I think the other offices would be broadly about the same. And then our applications, I just ran some basic word counts, line counts over them. They seem to be about 200,000, 250,000 lines of code for an auto trader or for a thing that does this on the market. About 10,000 to 20,000 lines of code is actually custom code for that auto trader or auto quoter or whatever it is. The remaining 190,000

Starting point is 00:39:32 lines is common libraries that we use as the base for our applications. And of those common libraries, I'm guessing we only use about 10 to 20% per application. So this was just sort of a shotgun approach of getting some lines-of-code metrics. And then another thing is we build from source every time. There's no concept of headers and libraries. We just literally with Git bring everything in,

Starting point is 00:40:04 CMake it, build it, grab a coffee, grab another coffee, come back. I'm guessing if you're in the Amsterdam office, you have one of those nice fully automated espresso machines as well, which I tend to see across the Netherlands. We actually have an in-house barista. Oh, that's even better. I've only been to three offices that had an in-house barista. Okay.

Starting point is 00:40:32 Yeah, well, the office is in lockdown right now, so we all got sent a coffee machine at home. That'll do. Did they really send you a coffee machine? For a Christmas gift, yeah. Which has been getting some serious use. That's cool. Kind of, yeah. At the same time, I also learned that making good coffee is difficult. Yeah. So I know we asked earlier if you're, you know, on the latest version of C++ and you

Starting point is 00:41:03 said you're not necessarily on the latest and greatest. Is there anything you're looking for from C++ that you would like to be standardized to make the type of work you do easier? Yeah, for sure. So one of the reasons we're not on the latest and greatest is we tend to be pretty conservative with our servers. So we typically run a pretty mature version

Starting point is 00:41:24 of Enterprise Red Hat or CentOS or whatever it happens to be. That's a little down. Yeah, yeah. So we use their dev tool set. So we're currently on GCC 8 with the option to run GCC 9 if we want. I still suspect, I haven't checked,

Starting point is 00:41:40 but I still suspect things like small string optimization are not in there with the patched version of GCC that we use. I could be wrong, but... Yeah, the copy-on-write change. Yeah, for binary compatibility. Ah, you might have to... Stupid ABIs. Yeah. So yeah, we don't take, you know, GCC 10, for example, just because that's not the cards that we've been dealt. And that's actually fine. There's a few features of 20 that we use at the moment. We use most new features that are available on the compiler.

Starting point is 00:42:28 23 is interesting with the bit_cast. We were talking about that today actually at work. We have a few potential issues with alignment, and then we're like, oh, how are we actually gonna fix this? And it's like, oh, what do you know? It's in 23, problem solved. So that's kind of cool. Coroutines is something I do want to look at more, because with lambdas... Lambdas were awesome for us. Like, lambdas just cleaned up our code base.

Starting point is 00:42:57 You know, we could just go, hey, here, I'm going to call something. I don't want to know when you're ready, but when you are ready, well, I don't care how long it takes, but when you are ready, I'm going to pass you, you know, this lambda to execute. You pass a std::function, but lambda, same thing. So our code really got cleaned up for a while. But unfortunately, then we've kind of pushed lambdas a little bit too far. We have a lambda which creates another lambda. That's fine. So yeah, but you can't really go far... It's fine. Yeah, use lambdas as much as you want to. So I mean, you know, you have to scroll right on your screen quite a long way to actually figure out when you're done. And so it's just the way that we program. We program in this

Starting point is 00:43:44 asynchronous kind of style. I mean, we have no blocking call. That would be a nightmare, right? So we always just do work until we can't anymore, give that to something else to deal with, and then we do work where we can. And this is the way our event loop works. We just spin. Whenever there's something to do, we do it.

Starting point is 00:44:02 If there's no event coming in, then we just spin again until an event comes in, like a new price arrives in the market and we need to figure out what to do with that. It's like you've written your own task scheduler. Yeah. Oh, look, everyone does. Or you can use ASIO.
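The callback-plus-spin style described above can be sketched roughly as follows. This is a minimal illustration, not Optiver's event model: `EventLoop`, `post`, `poll_once`, `spin_until` and `run_price_demo` are all hypothetical names, and a production loop would poll network sockets or shared memory rather than an in-process queue.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-threaded event loop; the names are illustrative only.
class EventLoop {
public:
    using Task = std::function<void()>;

    // Queue a callback to run on a later iteration. Never blocks.
    void post(Task task) {
        assert(task && "posting an empty callback is a bug");
        queue_.push_back(std::move(task));
    }

    // One iteration: run at most one ready task.
    // Returns false when there was nothing to do, so the caller spins again.
    bool poll_once() {
        if (queue_.empty()) return false;
        Task task = std::move(queue_.front());
        queue_.pop_front();
        task();  // the task may post() follow-up work
        return true;
    }

    // Busy-spin until the caller-supplied predicate reports completion.
    template <typename Pred>
    void spin_until(Pred done) {
        while (!done()) poll_once();
    }

private:
    std::deque<Task> queue_;
};

// Demo: each "price event" does some work, then hands a continuation
// lambda back to the loop instead of blocking on the result.
inline std::vector<int> run_price_demo() {
    EventLoop loop;
    std::vector<int> handled;
    for (int price : {101, 102, 103}) {
        loop.post([&loop, &handled, price] {
            // "When you are ready, run this": a lambda creating another lambda.
            loop.post([&handled, price] { handled.push_back(price); });
        });
    }
    loop.spin_until([&handled] { return handled.size() == 3; });
    return handled;
}
```

Calling `run_price_demo()` drains three events and their continuations in order. The nesting visible even in this tiny demo hints at why deeply chained lambdas become hard to read, which is the motivation given for looking at coroutines.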

Starting point is 00:44:20 I've used ASIO in the past, and that was fine, using its event engine or whatever they call it. I can't remember now. It's been too long since I've used it. I've actually been impressed with ASIO for event-based programming, except for a couple of weird bugs around timers, which bite you on lifetime, but we'll just park that for now. So yeah, I mean, we have a handwritten event model, and I'm sure all trading companies do, because you just want to know exactly what's going to happen all the time. You don't want the event model to all of a sudden be doing a memory

Starting point is 00:44:56 cleanup or some housecleaning tasks. But the more and more I look at our code, the more I think that in a perfect world, coroutines could actually make our code a bit easier to read, a bit simpler. If you end up deploying coroutines and have a huge success with it, then you have to come back on and explain to us what you did. And if it's a massive failure... Then come back on and explain to us why. Yeah, it'd be something that we'd have to do quite slowly as well. Like, we'd have to be quite careful about it. I'd have to be pretty careful

Starting point is 00:45:30 to make sure that we don't get caught out by object lifetimes or, you know, allocation where we weren't expecting it. Also the ability to debug would actually come into it as well, a little bit. So this would be more sort of trial and error with a small project and just see how it goes. But I wouldn't be overly surprised if in a year or two from now, we are all about coroutines. We'll see. But the other thing that I think is missing,

Starting point is 00:45:55 and I haven't been looking too much at the proposals, but shared memory, shared memory communication, inter-process communication. Yeah, it'd be awesome to have that as standardized support. I don't know if there's anything like that in the pipeline, although there's still discussion about transactional memory, which is tangentially related, I think. So there's the Boost inter-process library as well,

Starting point is 00:46:19 which we use a little bit, but we've just found that we've hand-coded the majority of what we need, which is just effectively fast shared memory messaging from process to process on the same box. I know other trading companies are doing this because I've got friends in other companies. Everyone's doing this.

Starting point is 00:46:39 Everyone's hand-coding it. Everyone's hitting the same mistakes. In fact, in your previous podcast, I forget his name, but he was working on the PowerPoint plugins. He was. Yeah. He was also doing his own shared memory kind of work. Yeah.
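As a rough idea of what "fast shared memory messaging from process to process" tends to look like, here is a minimal single-producer, single-consumer ring buffer. It is a sketch under stated assumptions, not anyone's production code: `SpscRing` and `PriceUpdate` are hypothetical names, the example runs within one process, and a real deployment would construct the ring inside a mapped shared-memory region and rely on the atomics being lock-free.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <type_traits>

// Hypothetical SPSC ring buffer (C++17). Exactly one thread or process may
// push and exactly one may pop. Capacity must be a power of two.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(std::is_trivially_copyable_v<T>,
                  "messages must be plain bytes to live in shared memory");
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "capacity must be a power of two");
    static_assert(std::atomic<std::uint64_t>::is_always_lock_free,
                  "cross-process use assumes lock-free atomics");

public:
    // Producer side: returns false instead of blocking when the ring is full.
    bool try_push(const T& value) {
        const std::uint64_t head = head_.load(std::memory_order_relaxed);
        const std::uint64_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: returns an empty optional when there is nothing to read.
    std::optional<T> try_pop() {
        const std::uint64_t tail = tail_.load(std::memory_order_relaxed);
        const std::uint64_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    // Separate cache lines so producer and consumer do not false-share.
    alignas(64) std::atomic<std::uint64_t> head_{0};
    alignas(64) std::atomic<std::uint64_t> tail_{0};
    alignas(64) std::array<T, Capacity> slots_{};
};

// Hypothetical message type for illustration.
struct PriceUpdate {
    std::uint32_t instrument_id;
    std::int64_t price_ticks;
};
```

Neither side ever blocks, which matches the spin-and-poll style described earlier. The hard parts mentioned in the conversation (naming, permissions, and cleaning up after an unclean crash) sit outside this sketch.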

Starting point is 00:46:56 Yeah. I mean, those, you know, memory-mapping files, and what happens if another process with different user permissions used that same file name and crashed uncleanly and the kernel didn't clean it up, and then you start up and you can't map the file name that you've picked because Linux isn't going to let you lock that file. Yeah, I mean, these problems are solved,

Starting point is 00:47:19 but it's interesting that everyone has to solve them themselves. Right. Just out of curiosity, what does it look like to debug this software? Do you have some type of mock trading system that you're able to work with in order to actually test your code? Are you talking about testing in particular, or are you talking about debugging some of this? Debugging, testing, yeah. So the exchanges will offer test exchanges with varying degrees of quality. Some exchanges are excellent.

Starting point is 00:47:55 Some exchanges are bordering on non-existent. So that's obviously a little bit of a concern. But there are certainly test exchanges that you can test on. The problem is generating the exact sequence of responses that you want. Some exchanges are great, and they have really good automated support for doing that, or semi-automated support. With other exchanges, you have to try to simulate your own sequence of events, such as putting in an order on a stock that hopefully no one else in the test exchange is also trying to test on at that point in time. Then send in an opposing order

Starting point is 00:48:32 for maybe only half the volume, see if that trades, see what happens to the rest of the order. Yeah, that's not a huge amount of fun. So to a degree, we have exchange simulators to just simulate what we expect the exchange to be doing. Obviously that has a little bit of risk as well. Exchanges, however, when you do a software update, such as using a new protocol of the exchange, so they might upgrade their API, they typically have a certification process as well. So we sit there with the exchange on the phone, go through all of the likely scenarios, and they tick it off to make sure that they're seeing what they expect to at their end. For actual debugging, say, for example, we had an issue,

Starting point is 00:49:15 typically we'll just crash out and restart again. We try to not recover from errors that shouldn't be happening. That typically means that there's a bug. And so when I say we restart, often we don't automatically restart, but often we'll roll back just to the last known stable version. Something happens, we've got a core, we've got the symbols in there, you know, just trying to work out what's happening. But a huge amount of real-time metrics is going on as well. So, I mean, typically we'll catch something before it really goes bad.

Starting point is 00:49:51 If, you know, there's excess memory consumption or excess allocations or excess messages between us and the exchange, that'll be flagged pretty quickly. That's actually, that's really interesting. I'm familiar with systems that, you know, obviously like a watchdog or something, or if there's a crash, then they just come right back up again. But the idea of not just coming right back up again but you know having available and easy to access the last known binary or whatever is that you could roll back to yeah and look it's all in git you know right any right all of that configuration is done through

Starting point is 00:50:22 git um so if we want to upgrade it's's a code-reviewed pull request, and then that will automatically, once that's been approved, that's triggered to upgrade in production. And so if we need to roll back, it's very simple to figure out how to do that. Interesting. Although we are looking at, this is an idea that Sydney have, which is quite cool,

Starting point is 00:50:43 is actually just return values on when we exit to indicate: definitely don't restart; do restart, but not until, you know, this component has been fixed. You know, there's 255 values we can use there to indicate what has happened. And so that's quite a nice way of semi-automating recovery after shutdown. Yeah. Yeah. I mean, you could take it a step further, more than just the,

Starting point is 00:51:09 the byte, if you, you know, could somehow manage to, like, what's the word I'm looking for? Like, touch a file real quick or something that is, you know, even more information. Yeah.
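The exit-status idea could look something like the sketch below. The specific codes and the names `ExitReason`, `SupervisorAction`, `exit_with` and `decide` are invented for illustration and are not the convention discussed. It relies only on the fact that a process's exit status is a single byte as seen by its parent on POSIX systems.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical exit-code convention shared by components and their supervisor.
enum class ExitReason : int {
    CleanShutdown = 0,     // normal stop, nothing to do
    RestartNow = 10,       // transient problem, safe to start again
    RestartAfterFix = 20,  // restart only once a dependency has been fixed
    NeverRestart = 30,     // suspected bug, a human must look first
};

enum class SupervisorAction { None, Restart, HoldForOperator };

// Component side: terminate with a reason the supervisor can read.
[[noreturn]] inline void exit_with(ExitReason reason) {
    std::exit(static_cast<int>(reason));
}

// Supervisor side: map the child's exit status to a recovery action.
constexpr SupervisorAction decide(int exit_status) {
    switch (static_cast<ExitReason>(exit_status)) {
        case ExitReason::CleanShutdown:
            return SupervisorAction::None;
        case ExitReason::RestartNow:
            return SupervisorAction::Restart;
        case ExitReason::RestartAfterFix:
        case ExitReason::NeverRestart:
            return SupervisorAction::HoldForOperator;
    }
    // Unknown codes (including crashes mapped to other values): be conservative.
    return SupervisorAction::HoldForOperator;
}
```

Treating every unrecognised status as "hold for an operator" fits the stated preference for not recovering automatically from errors that should not be happening.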

Starting point is 00:51:22 Dependency ordering is also a fun one too, because we have nearly 15,000 components running in production every day, or 15,000 running processes, and we have to start them up in the right order. And we've got a degree of automation there to do it automatically, but of course it gets a little bit difficult sometimes, particularly if you're communicating over a shared message bus and it's not entirely sure if this other component is up yet or not. So yeah, sometimes it can be a little bit of fun in the morning if something goes down, to then figure out exactly the correct sequence to bring things back up again. Right. Sorry, bringing back flashbacks

Starting point is 00:52:07 of systems that I worked on. Theoretically we could reboot the whole system at once, but in reality, we didn't rely on a DHCP server, so each device got its own link-local address, and, well, I won't go into the details, but they would all come up with the same link-local address. So in that particular environment it would sometimes take like three hours for the system to finish rebooting as they all negotiated what link-local address they were going to use. Anyhow. Okay, well, I think we're starting to run a little low on time. Carl, is there anything else you wanted to talk about before we let you go today? Not really, actually.

Starting point is 00:52:50 It's been fun. I'm just having a look through some notes there. Not really. I think the main thing I want to sum up, I guess, is that, I mean, for us, C++ has been awesome. The hard bit is just managing it, picking and choosing the right language features. Yeah, it's certainly challenging to keep code

Starting point is 00:53:13 that's constantly changing, you know, readable and bug-free. I think modern C++ has been great for that, you know, just to be able to auto out all of the, you know, iterator declarations. And we can write some really nice code these days, but I think that's actually where the challenge is, is, you know, when you've got big systems that are constantly changing and, yeah,

Starting point is 00:53:37 that's where the challenge is, just trying to keep the code sane. But yeah, so far, so good, actually. Yeah, that's my summary of my life at the moment, I guess. Well, and you know, since one of us has been asking pretty much every guest that we've had on lately, it sounds like ABI compatibility is not a big deal for you. You would be happy if C++ broke ABI? Yeah, as long as the world doesn't end. I'd have to think about it quite carefully from, what happens to our servers? Are we actually relying on a library that we didn't realize? But if we can't do it, given that

Starting point is 00:54:20 we compile everything from source, we have literally no libraries except for, you know, glibc and the usual suspects. Right. Yeah, if we can't do it, no one can do it, because really we have the simplest ABI constraints. We have none. We compile from source. Right. Well, so far, Rob, I don't think we've had a single guest who has said that breaking ABI would break their system. So, sure. And I think we would all agree with Carl that we're good with it as long as it doesn't cause the world to end. No one wants to cause the world to end with an ABI break. Our informal poll here of our guests that is going on.

Starting point is 00:54:58 Okay. Well, Carl, where can listeners find you online? And, you know, is there anything else you want to plug from Optiver? Are you actively recruiting or anything like that? For me myself, online I tend to keep a pretty low profile. Don't even have a Twitter handle. Okay. Optiver is always hiring,

Starting point is 00:55:16 of course. So I think I've forwarded my link, which I guess you guys will add to this podcast. So yeah, the Optiver URL is there. Check it out. Yeah, of course, we're always looking for people who are interested. And yeah, that's about it for me. Awesome. Okay. Thanks, Carl. Great. Thanks, guys. Pleasure. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help

Starting point is 00:56:05 support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode was provided by
