PurePerformance - Old Patterns powering modern tech leading to same old performance problems with Taras Tsugrii

Episode Date: May 10, 2021

Have you ever thought about reorganizing data allocation based on production telemetry data? Have you ever thought about shifting compiler budgets to parts of your code that are heavily executed, based on profiling information captured from your real end users? Whether the answer is yes or no, you will be fascinated by Taras Tsugrii, Software Engineer at Facebook, who is sharing his experience on optimizing everything from compilers to databases, distributed systems and delivery pipelines. If you want more after listening to this episode, check out his recent talk at Neotys PAC titled “Old pattern powering modern tech”, subscribe to his Substack newsletter, his Hashnode blog, or the conference recordings of Performance Summit and Scaling Continuous Delivery.
https://www.linkedin.com/in/taras-tsugrii-8117a313/
https://www.youtube.com/watch?v=itOCQvk_LAs
https://softwarebits.substack.com/
https://softwarebits.hashnode.dev/
https://www.youtube.com/channel/UCt50fEvgrEuN9fvya8ujVzA
https://www.youtube.com/channel/UCWf9HxiBudKLzCFtgAAz8XQ

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello and welcome to another episode of Pure Performance. My name is Brian Wilson and as always my not-so-fantastic co-host Andy Grabner. See how I changed it there, Andy? Yeah, I know, we should maybe stick with the initial start where you called me much nicer names. I called you fantastic. You're fantastic. Yeah, so for everyone listening, I messed up the start and I decided to take it out on Andy, because that's what people do, take it out on the people they care about most. Andy? Yeah, well, it's basically, it's easy, it's easy today, because you just saw me have a sip of schnapps, so I can easily handle it, because I don't
Starting point is 00:01:00 get it, understand it anyway. I just hope you don't start getting violent. I'd like to see you drunk violent, Andy. That'd be interesting. I've never seen that. I don't think that happens. I'm one of the people that is getting more quiet and tired when I get drunk. I probably fall asleep. All right. Good to know.
Starting point is 00:01:18 Good to know. Anyhow, Andy, would you like to introduce our guest, or would you like to let our guest introduce himself after your little pre-introduction introduction? I would love to do a little pre-introduction, but I will let him introduce himself because he knows himself better than I know him. But the reason why we have our guest today is because I think I was invited, I was speaking at one of his performance conferences earlier this year, and then he spoke at a Neotys PAC event just a couple of weeks ago on the topic, old patterns powering modern tech.
Starting point is 00:01:53 And it is truly fascinating. So I've never seen so much content in such a timeframe. And then, I mean, the whole presentation is amazing if you watch it on YouTube, but do me a favor: at the end, when he goes into Q&A with Scott and with Henrik, there are just a lot of fantastic things that they are discussing. I just took a couple of notes because I just rewatched it. Things like, he mentioned FBInfer, detecting bad data access patterns through static analysis. He talked about Amazon CodeGuru. He talked about his project on Android fitness
Starting point is 00:02:30 and how they were rebuilding a storage solution, replacing SQLite, where it was like multiple orders of magnitude faster than SQLite, and a lot of other cool things. So really, watch the presentation. We'll put it into the proceedings of the podcast so you can just click on that link. But now the question is, who is this mysterious guest? Who, who, who? I guess it's my turn. So hello everyone, my name is Taras, I'm a
Starting point is 00:02:58 software engineer at Facebook. I work on all sorts of stuff, I usually switch teams almost every year. But I guess I've worked on compilers and toolchains, I worked on the build infrastructure, continuous integration infrastructure, then performance infrastructure, and now I'm working on release infrastructure. And the more I work on all these things, the more I realize that they are pretty much all the same anyway. So you said you switch teams about every year. Is this just out of curiosity?
Starting point is 00:03:35 Is this a regular thing that you do at Facebook where you constantly rotate so you see new things and try out new things? Or is this just you because you like the change? Or how does that work? It usually depends on the people. We do have an option to switch teams fairly flexibly as long as we perform well. But for me personally, I just try to practice kind of like a gradient ascent approach where at regular intervals, I just look around.
Starting point is 00:04:06 I try to identify the directions with the steepest opportunity, like the steepest types of opportunities for having more impact, and I just follow them. That's the reason why, for example, when I started, I thought, well, what could be more impactful than improving compilers, because they power pretty much everything. You make the compiler better, and you improve everyone's code. So it scales really, really well. But the problem is that this space has been studied and developed so profoundly that it has, for the most part, reached the point of marginal returns, essentially. You have to fight for even small incremental wins like one or two percent, whereas in other spaces, for example the build space, you can have orders of magnitude improvements if you apply
Starting point is 00:05:13 some latency reduction patterns, or you just find better ways to distribute load or identify incrementality. Most of these things are not really applicable in compilers. And it's sometimes even sad, because oftentimes you are not even allowed to add certain optimizations to compilers, just because you have a limited budget, because obviously nobody wants to wait forever for their build to happen. And we already have to do that: our builds are unfortunately already taking hours and hours, even for relatively small mobile applications, so we don't want to make them even slower. Basically a trade-off, it's a trade-off between optimization gains in the end,
Starting point is 00:06:08 but also developer productivity. Yeah, I mean, it actually reminds me of one of the awesome talks by Chandler Carruth, who works on the LLVM compilers at Google. I think the talk was called "There Are No Zero-Cost Abstractions". And basically the idea is that
Starting point is 00:06:29 somebody always has to pay the price. So you can shift the price, for example, imagine that you would like to improve your runtime performance by applying certain optimizations. And those optimizations, let's say they're NP-complete. But it's small enough, so it's still manageable, but it takes a lot of time. So your runtime performance is going to get better,
Starting point is 00:07:00 but your build time is going to suffer. So you are shifting the cost from the runtime to the build time. So it's like developers now have to pay the cost of the extra build time. So no matter what you do in engineering, there is always somebody who has to pay the price for any improvement. So it's always a matter of budgeting and making sure that you're working on things that make the most sense for the business. And when you say runtime, you're talking about once it's out and users are using it, correct? Yes.
Starting point is 00:07:43 When you execute the application. Execution time. Right, right, right. Because to me, it's nuts to hear you talk about trading off between build time and run time because it's what we refer to as a very first world problem. So many organizations don't pay any attention to build time. Heck, they have horrible pipelines.
Starting point is 00:08:03 They might not even have pipelines. Obviously, Facebook and some of the other bigger players have these very, very advanced pipelines. We have a really pretty advanced pipeline. I think we're like a one-hour deployment. But that is, you know, it's not something that I think a lot of companies have to even consider, because they'd be like, our deployment time, that takes forever anyway. Who cares if we add another hour or something? Because they're not in that phenomenal place you've gotten to where you can push these out.
Starting point is 00:08:32 In fact, when you were talking about slowing down your build time, I thought you were going to say you slow down your build time to like five seconds or something. I didn't realize it was that long. Well, I mean, it depends. Right. But it's just really interesting to see where you can get to
Starting point is 00:08:46 when you put this much effort into that build time because that is how we get to that very agile state, the DevOps-y pipeline and all whatever terms you want to throw in there. But that is a huge consideration of it that I think a lot of people just aren't at yet. But it's amazing to hear that there's now analysis being done of trade-offs between the two because it does cost the business to delay getting things out. So yeah, interesting trade-off that I think a lot of people don't even have on their radar
Starting point is 00:09:15 yet. Well, yeah. And there is even a bigger point about this. So there is this famous Conway's law that I guess roughly can be restated as the organizational structure is reflected in our system architecture. But I think there must be another law that describes how you impact the culture through the infrastructure that you have. So, for example, when your builds are slow, obviously you're going to incentivize people to not build very often, which means that they will probably be making bigger changes
Starting point is 00:09:56 before they test them. They would probably be less likely to test them in general, just because you don't get the feedback that you are usually aiming for on a regular basis. So you would be much more likely to write code before tests, and that means that you would be even less likely to write tests at all. It also means that you would be much more reluctant to perform refactorings, just because whenever you do even a simple change
Starting point is 00:10:27 like renaming a variable, you're going to have to wait potentially maybe an hour to see the result of that refactoring. So you disincentivize people to do the right things. And it also has huge implications for performance work, obviously, because performance work oftentimes is not something that can be seen with the naked eye. You actually have to tinker with the code base. You want to explore different things. You want to try, you know, let's change, let's say, the data structure allocation to be more predictable, by maybe specifying an initial capacity for your vector or whatnot.
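(A minimal C++ sketch of the kind of experiment he means here, with a made-up element type: reserving the capacity up front turns many unpredictable grow-and-copy reallocations into one predictable allocation.)

```cpp
#include <cstddef>
#include <vector>

struct Sample { long timestamp; double value; };

// Hypothetical ingestion loop: if we roughly know how many samples are coming,
// a single reserve() up front replaces the repeated reallocation-and-copy that
// a growing vector would otherwise perform, making the allocation pattern
// predictable for the allocator and the cache.
std::vector<Sample> ingest(std::size_t expected_count) {
    std::vector<Sample> samples;
    samples.reserve(expected_count);   // one predictable allocation
    for (std::size_t i = 0; i < expected_count; ++i) {
        samples.push_back({static_cast<long>(i), 0.0});
    }
    return samples;
}
```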
Starting point is 00:11:09 Or maybe it would be better to try using trie-based hashing instead of the more standard hash table that is available in most of the libraries. So all this experimentation is discouraged, obviously, when you have inefficiencies in your pipeline. That's why I think it's extremely important to keep this in mind. And it's probably one of the reasons why the Go language has become so successful. And in fact, according to legend, I guess it was one of the reasons why Rob Pike even decided to write Go. I think the legend goes that he was waiting for a C++ build to compile.
Starting point is 00:12:08 And while he was waiting, he was like, okay, let me just write a faster language in the meantime. And that's how it happened, I think. There's probably some truth to it. Because, I mean, you can imagine that if you have to wait multiple hours for your builds, you start thinking about other alternative options. And creating a new language with a faster compiler is that option. And in fact, if you look at the maturity of that compiler, it's extremely mature in terms of correctness, but if you compare its generated code with the LLVM, let's say, or GCC, it's not
Starting point is 00:12:52 even remotely comparable. It generates much less efficient code, although things have drastically improved since they've moved to the static single assignment (SSA) form, which enabled a lot more optimizations, but it's still nowhere close to what LLVM or GCC is able to generate. So now, listening to the first couple of minutes, right, there are, I guess, so many directions we can take this conversation, because initially I thought, you know, we'd just talk about data access patterns, because at Neotys PAC, I think your claim, your famous claim that I also wrote down, is that you think that new storage is just a faster
Starting point is 00:13:38 magnetic tape and then you kind of go on and explain different access patterns of data. Shall we go down this track or shall we more focus on kind of what we just started with what I think is more on the developer productivity side and making sure we have the right tools and put performance into these tools? Taras, what do you think? I mean, it's certainly up to you, but I think that the main theme for me is that I really love focusing on the things that do not change instead of things that do change.
Starting point is 00:14:15 So instead of chasing the latest frameworks or libraries that are showing up every single day, or maybe even some small tweaks or optimizations that are appearing every day, I really like to focus on fundamentals. The things that have been true for a long time, and that's the reason why they are so widespread. And the data access patterns is one of those things. That's the reason why I mentioned that pretty much all the new storage formats and even memory, which is sometimes very surprising for people, is pretty much still the same
Starting point is 00:14:53 tape. And if you look at the hardware aspects of how the data is accessed, it's always, pretty much always accessed in blocks. So again, there is a lot of similarities between how you work with caches or how you work with SSDs. And that's the reason why sequential access is still a lot faster, no matter what kind of storage you use. Even though people often think that, oh,
Starting point is 00:15:29 but I'm using the latest NVMe device, which is like orders of magnitude faster than the hard drive, and while it's true, there is still an order of magnitude gap between what you get when you access data in a sequential way versus what you get when you access it in a random way. And this theme is actually as applicable to developer productivity as everything else. I actually believe that there are lots of,
Starting point is 00:16:04 I guess, organizational patterns that can be applied from the distributed computing world to how we build the organizations, like social organizations and the other way around as well. And then to your point, right? I mean, if the way we store data, if I'd say if the latest storage is just using the same mechanisms as we used in the past,
Starting point is 00:16:33 I guess the only way we can change these things fundamentally, and this is also something that I took from your talk, is that you need to figure out how to kind of smartly organize your data. And I think what I wrote down is you have to add your data in a predictable way. So if you want to really speed up, I guess, data access, then you need to understand first: what type of data do you have? How do you need this data? How can you store it in a predictable way so that later on you can also read it in a
Starting point is 00:17:00 predictable way and not in a random way, because the randomness is exactly what slows you down. Did I get this correctly from the talk? Yeah, I mean, that makes total sense. And again, you can think of it as an example from real life, where if you have order in your apartment and you know where all the things are located in advance, you don't have to waste time randomly searching under your rug or whatever when you're looking for your keys, let's say. You just know that they're going to be in a certain place. And this way, you can come up with the most efficient route to get to where you want to get to, and you reduce the inefficiencies.
Starting point is 00:17:46 And the same thing happens when you have data, because when you access data, there are lots of things that are helping you to access it in a predictable way. So when you're talking about the level of the CPU, for example, it has a branch predictor. So if your branches are more predictable, it means that you will have fewer pipeline stalls and you will get better performance. And if you have sequential data on a file system, let's say, your kernel usually is able to enable prefetching effectively, which is going to read more data than you even requested. And in fact, it always reads a little bit more than you request, just because the data is usually read in chunks that are called blocks, similar to what happens at the cache level.
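(A hedged, self-contained C++ sketch of that point, with an arbitrary array size: both loops do identical work over identical data, but the shuffled index order defeats the prefetcher, and on typical hardware it runs several times slower than the sequential walk.)

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;               // ~16M elements, larger than cache
    std::vector<std::uint64_t> data(n, 1);

    std::vector<std::size_t> index(n);
    std::iota(index.begin(), index.end(), 0);    // 0, 1, 2, ... : sequential order

    auto run = [&](const char* label) {
        auto start = std::chrono::steady_clock::now();
        std::uint64_t sum = 0;
        for (std::size_t i : index) sum += data[i];
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("%s: sum=%llu, %lld ms\n", label,
                    (unsigned long long)sum, (long long)ms);
    };

    run("sequential");                           // prefetcher-friendly walk
    std::shuffle(index.begin(), index.end(), std::mt19937{42});
    run("random");                               // same work, cache-hostile order
    return 0;
}
```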
Starting point is 00:18:54 So usually you have cache lines that are fetched by the memory management unit. And again, if it's easier to predict what's going to be the next location that you would like to use, the prefetcher is going to be able to help you with that. So it works everywhere, on every level of the stack. The more predictability you get, the more optimizations you can apply and the fewer inefficiencies you have. And in fact, for example, there's the binary layout, which is something that most people usually don't even think about when they
Starting point is 00:19:47 think about performance. It actually matters a lot how you even lay things out in terms of the machine code that you generate, because when you read things from the disk into memory while you're loading your executable, you have to read things in pages. And so the more scattered your data is, the more likely it's going to span multiple pages, which means that you will probably have more page faults, and they are not free.
Starting point is 00:20:28 So again, you lose in terms of performance, and it pays off to actually think about how you order your functions in a way that mirrors the order in which they are executed during startup, for example, to improve startup performance. My question to you, do we expect or shall we expect the average developer or the average architect to know all this and take care of it? Do we have tools or do we expect these things to be built into our compilers or whatever?
Starting point is 00:21:04 Do we have tools that at least cover most of these things, like analyzing your current access patterns and, with that, then optimizing your code exactly based on that? Does this exist? Yeah, there are a bunch of things that are trying to address these issues. So, for example, in compilers, there is usually profile-guided optimization, which can be used to record the access patterns from the actual use cases and apply this data to make sure that the branches that you have in your code are more predictable. Because, for example, let's say we can see that in production, almost all the time,
Starting point is 00:21:58 the second, like the else branch, is taken. So instead of ordering your layout in a way where you have the if and then you have the machine code for the if branch, you put the else branch there instead. And then you can even potentially move the other branch somewhere else, to a different page, so that you can fit more useful machine code on the initial memory page. So that's for compilers. But there are also a lot of separate tools.
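(What PGO derives automatically from a profile can also be hinted by hand. A purely illustrative C++ sketch, not something from the talk: C++20's [[likely]]/[[unlikely]] attributes, or GCC/Clang's __builtin_expect, tell the compiler which branch dominates so it can keep the hot path contiguous and push the cold code out of the primary instruction stream.)

```cpp
#include <stdexcept>

// Illustrative only: suppose production profiles show the error branch is
// almost never taken. Marking it [[unlikely]] lets the compiler lay out the
// happy path as straight-line code and move the error handling out of the
// hot instruction stream; PGO would infer the same thing from a profile.
int charge(int balance, int amount) {
    if (amount > balance) [[unlikely]] {
        throw std::runtime_error("insufficient balance");
    }
    return balance - amount;
}
```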
Starting point is 00:22:49 Facebook has, I believe, open-sourced BOLT, and it has a lot of interesting optimizations when it comes to memory layout. And it doesn't cover just the regular function layout, it also thinks about how to align things in loops, because interestingly, as I mentioned, cache lines are fetched in blocks, and by making sure that you lay out instructions in a way that they, for example, end up in the same cache line,
Starting point is 00:23:35 you can, again, improve performance quite a bit. So you have micro-optimizations for the layout as well. They are much more advanced, and I highly recommend checking out the BOLT paper. You can probably Google it, or I can also share it with you so you can include it in the description of the podcast. That would be cool. Hey, but if I hear this correctly,
Starting point is 00:24:02 I mean, these are all optimizations that have to be done based on data in production. Because if you would do it at initial compile time, you could maybe do it based on some test executions. But you always know the challenge of executing tests with realistic data access patterns and stuff like that. That means, are you telling me then that you would combine this with your profiling in production, and then you're rebuilding things, or you're taking this into consideration the next time you build a new version? How does this work? Yes, yes, exactly. That's how it works. You build the binary with the extra instrumentation, then you feed the inputs to that binary,
Starting point is 00:24:47 which generates the profile. And then you feed that profile into the next compilation, which is going to generate your final binary. And a similar approach can be used in many different ways. Like, for example, for our Android binaries, what we do is we generate order files, which specify in which order Java classes have to be loaded to improve startup performance.
Starting point is 00:25:20 But obviously, you don't know in advance what the best access pattern is, so what you can use is, for example, your last profile that worked. Then you can ship it to a small percentage of your users, get that data to generate a new profile, and then use that profile to generate the final binary that you're going to ship to all the users. So it's somewhat, I guess, similar to machine learning infrastructure, where you have a continuous feedback loop to improve on each iteration. So you have backpropagation, essentially. It's like a canary deployment,
Starting point is 00:26:05 but you're deploying that canary with a special compiled version that includes additional telemetry and profiling data to a specific set of users. And that's the exact reason why you need that fast build time, right? As we talked about in the very beginning, because if you're slowing down that pipeline,
Starting point is 00:26:26 it's gonna be much more painful to do all this. Yeah, and in fact, this is becoming more popular. For example, Google is now using a similar approach for their Android apps. As you know, Android apps historically have been using the DEX format, which is bytecode, similar to the Java bytecode, but the virtual machines, the Dalvik virtual machine and later on the ART virtual machine, still operate on the DEX format instead of Java bytecode. But the point is that this code is interpreted.
Starting point is 00:27:09 And there are pros and cons to having bytecode. So there was an effort to create a ahead-of-time compilation where you can precompile your application into the native code, which is then executed. But then you lose a lot of flexibility that you normally get from the bytecode because you can use JIT to improve things. So now we actually have this hybrid approach where they are using real production data to create a profile for your application that they use to generate the native code.
Starting point is 00:27:51 And then you ship that native code to the customers to improve the initial experience, but then you still collect the data that can be used to create a much more tailored profile for your use cases and your application, which you can use to feed into the JIT compiler to get even more optimized version of your application. So again, you can have a very interesting feedback loop on a continuous basis. And that's actually the reason why when I look at the CICD pipeline, I do not see it as just like an interval from point A to point B.
Starting point is 00:28:42 It's more like a circle, where you have this continuous loop where you feed production data into the build phase, which is going to produce the new binary that you're going to push to production, and then you enable monitoring. And that's actually the reason why I'm a huge fan of things like Keptn, because it's an awesome project. And I believe that it just makes it a whole lot easier to create this continuous feedback loop.
Starting point is 00:29:16 Brian, by the way, he said it first. I was not the first one that picked up the word Keptn. So when we first started with Keptn, Andy would always work it into the podcast, it would be towards the end, and like, oh, I turned it into a drinking game, although I wasn't really drinking, but it's always like, when is the point where Andy's going to mention Keptn? And you mentioned it first today. So now I wonder, obviously, you know, Taras, we are on the Dynatrace side, we are obviously monitoring production data. We are also heavily involved in OpenTelemetry.
Starting point is 00:29:59 I wonder, this data, do you think people in the observability space, when we talk about more like end-to-end tracing, do they also think about that use case, to feed that data back to the build process, or is this not the right data? Do we need, I guess, much more granular, much more specific data for making these optimizations than what we typically have in distributed tracing, where we pick individual methods and not necessarily individual branches, but more like, you know, how often was this method executed, what was the execution time? I'm just wondering if we should take this idea as a use case and say, hey, can we use OpenTelemetry data or any type of production tracing data
Starting point is 00:30:47 and also feed it back into the build pipeline for build optimization? Yeah, so that's an interesting question. Even though I'm a huge fan of OpenTelemetry and the infrastructure that it enables, it also has a lot of inefficiencies, which I usually have some rants about. For example, they usually use a lot of strings,
Starting point is 00:31:19 and I believe that we could benefit greatly from using techniques like index tables, where we just map strings into integers, and then we only send those integers over the wire instead of sending strings. So I'm not even going to talk about using something like JSON, because obviously that's off. But going back to your question, I absolutely believe that there is a huge amount of useful information that we can leverage to improve things in terms of performance. Obviously, the more granular data you have, the better.
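(Circling back to the index-table idea mentioned a moment ago, here is a hedged C++ sketch of a toy string interner. It is not OpenTelemetry's actual API; it just shows the shape of the technique: repeated attribute names get mapped to small integers once, and only the integers would travel over the wire, along with the id-to-string table shipped once per batch.)

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy interner: the first time a string is seen it gets the next integer id;
// after that, spans or metrics can carry the 4-byte id instead of the full
// string. A real protocol would also ship the id->string table to the receiver.
class StringTable {
public:
    std::uint32_t intern(const std::string& s) {
        auto [it, inserted] =
            ids_.try_emplace(s, static_cast<std::uint32_t>(strings_.size()));
        if (inserted) strings_.push_back(s);
        return it->second;
    }
    const std::string& lookup(std::uint32_t id) const { return strings_.at(id); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> strings_;
};
```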
Starting point is 00:31:57 But even coarse-grained data that you normally get can be very interesting. Like, for example, if you know that, let's say, one of your methods is pretty slow, then maybe you can allocate a bigger budget for the compiler to spend on that function. And I think that this hybrid approach, this data-oriented approach or data-driven approach is the future. Because right now, we have all these budgets allocated statically. And for example, we decided, okay, let's say if the function is longer than this number of instructions, then we do not inline it. And this decision is static and it's global.
Starting point is 00:32:48 So you usually apply it and it doesn't matter whether that function is invoked often or not, you're going to get the optimization or you're not going to get the optimization. And I think it's a very, very brittle approach because first of all, those heuristics, like when they are developed, they may be reasonable for a particular use case, but definitely not for all of them. And things change all the time. What used to be reasonable a year ago may not be necessarily reasonable now because a lot of assumptions have
Starting point is 00:33:25 changed. So I believe that the observability data that we get, even at the level of, let's say, how many times a particular function is invoked, is invaluable, because this way you know where you want to focus your attention. You know that this function is really critical, or maybe, even if the function is invoked once but it took a lot of time, it was also on a critical path. It's important to know. This is information that you can feed
Starting point is 00:33:58 not only to the compiler, but also surface it in your IDE, let's say, so that developers, when they invoke the functions, imagine that in your VS code or whatever editor you're using, you're going to see this information that the function, for example, is going to be highlighted with red, identifying that it's expensive. So be careful when you use this function because it's probably going to impact your performance or you can use all sorts of other interesting annotations where you can say that
Starting point is 00:34:31 this function is going to have some i o operations so when you're developing a time sensitive application or real-time, you really want to make sure that you understand that your dependencies do not allocate. They do not use IO because otherwise you lose control over predictability of your own function. So I'm a huge fan of the observability space. I believe that we definitely need to close this gap between the vast amount of data we get from it and how we use it. We just don't take advantage of
Starting point is 00:35:13 all the data that we generate, which is a shame. I think there is a lot of insight that we can generate. And kind of to expand on your VS Code example, if I'm a developer and I'm in my function, wouldn't it be cool if I would also see
Starting point is 00:35:30 like a live graph next to it that shows me live telemetry data from, let's say, production, where I see how often my method is called? What's the P95 response time of that method? What's my kind of performance budget that I have overall in the use case that I'm executing? If I now make a code change, then my Visual Studio Code or whatever IDE I'm using could say, hey, this is probably going to increase your response time by 10%, and 10% means
Starting point is 00:35:59 you're exceeding your budget, or it means 10% times the 1 million times your function gets called in an hour, right? You get the idea. So that would be fantastic. Yeah, and I think that's exactly what Amazon CodeGuru is trying to do. And I guess some of the things we also have at Facebook, even at the source level. So we use Diffusion in Phabricator, basically, for code reviews and viewing our code.
Starting point is 00:36:36 And we can see the percentage of time a particular Hack branch was taking. And it's useful because, again, if you understand that a certain area of code is pretty much never executed, then you can think about whether it makes sense to invest in improving it. Because if it's dead code, then maybe you don't care. But if it's a really, really hot path and you know that there are some,
Starting point is 00:37:09 even simple optimizations you can apply, you can just go ahead and do it because you know that it's going to have some meaningful impact. I think even in that example, getting rid of the dead code would be beneficial as well too. Absolutely.
Starting point is 00:37:25 And that's actually another, I guess, fairly simple use of the observability: if you identify certain areas of your code as dead because they are never executed in production, then you can potentially feed this information to developers to make sure that they know that, oh, this code is dead, so you can just remove it. Because compilers, they do have techniques like tree shaking or whatnot to find unreachable code. But the problem is that not all code that is reachable is executed.
Starting point is 00:38:09 So using production data and observability, it would help a lot in these efforts. Coming back to the whole idea of predicting the way you write your data, is there, especially for developers that are listening in now, that find this fascinating because maybe they hear about this the first time. Is there anything like a listener can do and say, hey, there are three best practices on how you can do this. Or there's like, check out this video.
Starting point is 00:38:41 Obviously, check out your talk from Neotys PAC, but is there something that you suggest every developer that wants to optimize the way they access and store the data, what they should know? What should every developer know that wants to be more efficient with the data? Well, I usually call this the mechanical sympathy principle. So you usually want to really understand the APIs that you're using, because all of them are developed with a certain set of assumptions. And if you know those assumptions, you are able to leverage them in the most efficient way.
Starting point is 00:39:30 I don't think there is a single piece of advice that can be applied to solve the majority of the issues. But in practice, you can think about techniques like parallel arrays, where instead of an array of structs, for example, you use a struct of arrays. And it's a common pattern that is used in game development. But basically, the idea is that you usually try to pack the data that you're going to access close together. So imagine that, let's say, you have a method that is, I don't know,
Starting point is 00:40:21 calculating a total score of a game. So you pass an object, like let's say a game, that has a list of maybe tournaments or whatnot. And then a tournament has a bunch of fields like a name or a list of players that participated and other things. And then it also has a score field. So what you would normally have is you would have a loop that iterates over all the tournaments and access that score to aggregate it.
Starting point is 00:41:00 But the problem with this approach is that your tournament struct or class or whatnot is going to have a lot of extra data in addition to the single field that you're interested in. And it's going to pollute your cache, and it's going to drastically decrease the efficiency of the I/O that you have to perform to get the data to the CPU. So what you can do instead is imagine that you have again the same game class, but instead of having a list of tournaments, you have an array of all the tournament scores, you have an array of all the users for all the tournaments, and all those things. So this way, when you iterate, you're going to be iterating over the array with just the scores, which means that you're going to be able to fetch a bunch of scores
Starting point is 00:42:07 into a single cache line. And on top of that, you would be able to leverage SIMD instructions, so vectorized instructions, a lot better, because you'd be able to just load them into a single register and perform a bunch of additions or multiplications or whatnot as a single instruction. So you have multiple wins there.
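(A hedged C++ sketch of the tournament example, with made-up field names: array-of-structs first, struct-of-arrays second. In the second version the score loop only pulls scores through the cache, and the compiler can vectorize it easily.)

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Array-of-structs: summing scores drags names and player lists through the
// cache even though we only need one field per tournament.
struct Tournament {
    std::string name;
    std::vector<std::uint64_t> players;
    double score;
};

double total_score_aos(const std::vector<Tournament>& ts) {
    double total = 0;
    for (const auto& t : ts) total += t.score;   // roughly 1 useful field per cache line
    return total;
}

// Struct-of-arrays: parallel arrays keep all scores densely packed, so each
// cache line delivers many scores and the loop is trivially SIMD-friendly.
struct Game {
    std::vector<std::string> names;
    std::vector<std::vector<std::uint64_t>> players;
    std::vector<double> scores;                  // indices line up across the arrays
};

double total_score_soa(const Game& g) {
    double total = 0;
    for (double s : g.scores) total += s;        // dense, sequential, vectorizable
    return total;
}
```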
Starting point is 00:42:38 And this is just one of the examples, but obviously the fewer branches you have, the better. And that usually means that it's usually very useful to know all the properties of your data in advance. Imagine, I guess, that you have an input, an array of numbers, and you are supposed to check whether there is a certain number in it. If you know nothing about that array, then you would probably have to use something like a linear scan. But if you know that it's, let's say, sorted, then you'd be able to do a binary search, which is presumably much faster than a linear scan.
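(The same point as a tiny C++ sketch: the only difference between these two membership checks is what you know about the data in advance.)

```cpp
#include <algorithm>
#include <vector>

// Know nothing about the data: an O(n) linear scan is all you can do.
bool contains_unsorted(const std::vector<int>& v, int x) {
    return std::find(v.begin(), v.end(), x) != v.end();
}

// Know the data is sorted: an O(log n) binary search becomes legal.
bool contains_sorted(const std::vector<int>& v, int x) {
    return std::binary_search(v.begin(), v.end(), x);
}
```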
Starting point is 00:43:23 So again, just knowing this is extremely important. We frequently like to think that abstractions are created so we can just forget about all of our dependencies and just, you know, live in our comfy comfort zone, not caring about anything else. And I actually like how Dijkstra used to think about abstractions. He was basically thinking of them as a way to
Starting point is 00:44:04 reason about things extremely precisely. So it's not about vagueness, it's not about ignorance, it's the opposite: a higher level of understanding of your properties. And so just think about what exactly you know about the data that you are working on. What do you know about the type of devices that you work with? Is it the network? Is it the hard drive? Because there used to be even a movement, an attempt to create all these RPC abstractions that create an illusion for you that you're just doing a local function invocation. And that illusion is extremely harmful, because you really don't want to
Starting point is 00:45:08 have a lot of RPC calls in, let's say, your tight loop. But if you don't even know that it's an RPC call under the hood, then there would be absolutely no reason for you not to use it. And I guess that's where the observability comes into picture. And it's probably something that you can point out these kind of issues and say that, oh, looks like you have an IPC call in your loop.
Starting point is 00:45:42 So maybe you want to create a batch call instead of having these single ones. So I guess just knowing your data, knowing your dependencies is extremely helpful to make sure that you get the
Starting point is 00:46:00 best of performance. And also thinking about the use case. Because again, oftentimes we try to overgeneralize things. And I guess if you look at the database space, you can probably notice how many different databases are available these days. And the reason is that they are trying to address a very particular use case. And for example, if you have, let's say, InfluxDB,
Starting point is 00:46:34 it's really optimized for time series processing. And it's really, really good. It has a lot of encodings that make the storage efficient, processing efficient. But it's not a competitor to, let, say, something like CockroachDB, which has a completely different set of use cases in mind. And that actually goes back to that example about the Android fitness optimization where replacing SQLite with manually built database storage was extremely
Starting point is 00:47:09 beneficial, just because we don't need all that generality that SQLite provides. We really wanted to have a really fast way to append data and a really fast way to scan all the data to create an aggregate. And I guess this was one of the main ideas of the presentation: whenever possible, if you have to do I/O, and you pretty much always have to do I/O, because in my opinion even if you use memory it's I/O, you really benefit from append-only data structures, where you always append just to the end
Starting point is 00:47:54 without having to look for a place to insert your items somewhere in the middle and then shift things around, or even span multiple memory pages. So that's extremely beneficial. You can do all sorts of optimizations, and you can build a lot of interesting data structures like ring buffers, which power kernels with things like io_uring, but they also power the fastest exchanges; the Disruptor, for example, is leveraging ring buffers effectively to reduce the number of allocations that you have to have and also reduce the contention that is usually inherent to most other data structures. And again, if you have an append-only data structure,
Starting point is 00:49:00 it's fairly easy to implement it without any locks at all. So it's very friendly for concurrent environments, just because you usually have to use just a single atomic increment to allocate the block of memory you're going to write to, and other writers at the same time are able to just use the other memory. And so you have fast appends, it's very efficient because the data is packed very well, and it's also lock-free, which is great for avoiding the pitfalls of Amdahl's law and its more general version, the universal scalability law.
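(A hedged C++ sketch of that single-atomic-increment idea, deliberately ignoring the full ring-buffer machinery of something like the Disruptor: each writer claims a slot with one fetch_add and then owns that slot, so appends never fight over a lock. Wrap-around handling and publishing slots to readers are left out.)

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Toy append-only log: writers reserve a slot with a single atomic increment
// and then write into their own slot; no locks, no contention on the data.
// Deliberately omitted: wrap-around and making writes visible to readers.
class AppendLog {
public:
    explicit AppendLog(std::size_t capacity) : slots_(capacity), next_(0) {}

    bool append(long value) {
        std::size_t slot = next_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= slots_.size()) return false;  // toy behavior: refuse when full
        slots_[slot] = value;                     // each writer owns its own slot
        return true;
    }

private:
    std::vector<long> slots_;
    std::atomic<std::size_t> next_;
};
```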
Starting point is 00:49:48 This is amazing. I think you were just, I mean, it's fascinating to learn all these things. Obviously, you're deep into this and you have a lot of experience. For me, a lot of this is new, but it's great. I also want to say, don't give everything away, because you want to make sure that more people now watch your talk from the Neotys PAC event.
Starting point is 00:50:19 Also, an interesting comment that I watched on the recording when you started the Q&A. Kind of Henrik was saying, I don't have any questions right now because first I need to read up everything. And then you said it seems you haven't done a good job if people now don't know what to ask because you have this analogy of you need to, when you explain, when you teach, you need to explain in very simple terms that people understand without them having to go back to the material. But you did a phenomenal job. I took a lot of notes.
Starting point is 00:50:50 Even I understood it. See, even Brian understood it. That's always a good benchmark. I would love eventually, not eventually, definitely, to have you back, because there are so many, I guess, more topics in that area. But also, as you know, with all the work we do with Keptn around continuous delivery and automation, it would be fascinating to have a conversation with you
Starting point is 00:51:15 as well about this topic. I know you're very busy, but I would love to have you back to talk about even more of this stuff, and especially about continuous delivery. Yeah. Thank you so much for the kind words. I really appreciate it. I think I pretty much covered only a small fraction of the things that I covered during the talk. And I'm very sorry for being so scatterbrained. It's just, there are so many awesome topics. Like, this is such a vast space that it's really easy to get off topic
Starting point is 00:51:54 and just, you know, cover all sorts of things. Just because I really love this area. I believe that it's super beneficial to build this mechanical sympathy and just learn about how things work. I think it's probably one of the reasons why people are interested in engineering in general, just because they love to tinker with things and nothing forces you
Starting point is 00:52:26 to understand the hardware and the software and their interactions as working on performance just because they are extremely related like you cannot think about performance and just in pure software or in pure hardware. You have to think about how they interact. And I guess you mentioned where else you can find some of the patterns. I occasionally post some of the patterns that I find useful on the LinkedIn. And also I just recently created the sub stack where I published some of my thoughts. So if anyone is interested,
Starting point is 00:53:14 they are very welcome to subscribe. Yeah, we'll do that. And we definitely make sure to link your LinkedIn profile. I also want to mention that you have hosted the performance summit, right? I that you have hosted the Performance Summit. I think you did it the second time or third time. I don't know how often, how many? I think I hosted probably like five.
Starting point is 00:53:34 And the last one was hosted by my colleague here in London. And we will probably have another one in a few months. But I am now also organizing the Scaling News Delivery Summit. So we got DevSecPerfOps. Kidding. So what's really interesting, Andy, too, is that people like you and I who come from the performance testing and engineering background of that side, this is like a whole new world of it. And I think just that I was thinking about this quite often or quite a lot during the podcast today.
Starting point is 00:54:25 And at the very end, I forget which one of you mentioned it, but it was the idea of when you get the engineers to think about performance, where I took that is that's when you get a whole different scale of it. We're always looking at response time. We're looking at some things like some code efficiency and everything. But when you talk about disk access and block storage and things like that that are way beyond the kind of areas that we would probably think about, we can't even observe that too well.
Starting point is 00:54:44 It's just fascinating how much further you can take all this. And I think the biggest hurdle, and if we do have you back on, I'd love to get some ideas from you, because I know we heard some from Garenka when she was on. I would love to get some ideas from you. How do you get the engineers in places that are not like Facebook to care about performance? What incentivizes them?
Starting point is 00:55:07 Besides, hey, your code will run well? Most of them are, a lot of developers are just struggling to get their code out the door. So how is it that we can make this more attractive to them, more mutually beneficial for them and make them care about it? I think that's one of the biggest hurdles. Because I think a lot of people have it in their mind. I think some of it is around the tool sets and what can be brought
Starting point is 00:55:28 to them to point out and help point out. But just seeing, again, for a future show, because we're out of time anyway, how we get developers to care about performance besides whipping them. Yeah, I mean, I can, I guess,
Starting point is 00:55:43 give a very, very short answer. Obviously it's a complex topic and I have a lot of interesting ideas about how we can do that, but if I had to give just one piece of advice, then I would just say that the insights that we generate should speak the same language as developers. Because instead of talking about some microseconds or allocations, you really want to explain what the actual impact is going to be in production. So let's say your page is going to load in one second instead of 100 milliseconds, or your service is going to load much slower, and because of this, you're going to lose a certain amount of revenue. So if you can, if
Starting point is 00:56:45 you can boil things down to revenue, it's probably the easiest one. Like, oh, your change is gonna make us lose one million dollars, that's a good incentive to actually think about it. But if you just talk about some abstract units, then it's like, I don't know if it's relevant or not, it's hard to tell. Awesome, good. Hey, again, thank you so much. We're looking forward. If you have any additional links you want to send us over, then we'll do this after the recording. Are there any other events coming up where you're speaking? Um, not yet. As I mentioned, I think that we will hopefully have another Performance Summit, probably somewhere in June or July, so that's probably gonna be the next event. But I have been absolutely honored to be featured on your podcast. So if you ever want to invite me again, I would love to.
Starting point is 00:57:51 It's been an absolute pleasure. And I think there are so many more things to talk about, especially how awesome tools like Keptn can help us to drive a lot of performance and reliability efficiencies into our culture and shape the culture in general. Awesome. Thank you so very much. We will look forward to having you back on.
Starting point is 00:58:21 And thanks for all of our listeners for listening all the time. And yeah, have a wonderful day yeah thank you so much thank you bye andy bye
