PurePerformance - Old Patterns powering modern tech leading to same old performance problems with Taras Tsugrii
Episode Date: May 10, 2021

Have you ever thought about reorganizing data allocation based on production telemetry data? Have you ever thought about shifting compiler budgets to parts of your code that are heavily executed, based on profiling information captured from your real end users? Whether the answer is yes or no, you will be fascinated by Taras Tsugrii, Software Engineer at Facebook, who is sharing his experience on optimizing everything from compilers to databases, distributed systems, and delivery pipelines.

If you want more after listening to this episode, check out his recent talk at Neotys PAC titled “Old patterns powering modern tech”, subscribe to his Substack newsletter or his Hashnode blog, or watch the conference recordings of Performance Summit and Scaling Continuous Delivery.

https://www.linkedin.com/in/taras-tsugrii-8117a313/
https://www.youtube.com/watch?v=itOCQvk_LAs
https://softwarebits.substack.com/
https://softwarebits.hashnode.dev/
https://www.youtube.com/channel/UCt50fEvgrEuN9fvya8ujVzA
https://www.youtube.com/channel/UCWf9HxiBudKLzCFtgAAz8XQ
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello and welcome to another episode of Pure Performance.
My name is Brian Wilson and as always my not-so-fantastic co-host Andy Grabner.
See how I changed it there, Andy?
Yeah, I know, we should maybe stick with the initial start where you called me much nicer names.
I called you fantastic. You fantastic.
Yeah, so for everyone listening, I messed up the start and I decided to take it out on Andy, because that's what people do: take it out on the people they care about most. Andy?
Yeah, well, it's easy today, because you just saw me have a sip of schnapps, so I can easily handle it. I don't get it, don't understand it anyway. Just hope you don't start getting violent.
I'd like to see you drunk violent, Andy.
That'd be interesting.
I've never seen that.
I don't think that happens.
I'm one of the people that is getting more quiet and tired when I get drunk.
I probably fall asleep.
All right.
Good to know.
Good to know.
Anyhow, Andy, would you like to let our guest introduce himself, or would you like to do your little pre-introduction introduction?
I would love to do a little pre-introduction, but I will let him introduce himself because
he knows himself better than I know him.
But the reason why we have our guest today is because I was invited, I was speaking
at one of his performance conferences earlier this year. And then he spoke at a Neotys PAC event just a couple of weeks ago on the topic
"Old patterns powering modern tech".
And it is truly fascinating.
So I've never seen so much content in such a timeframe.
And then what's really... I mean, the whole presentation is amazing
if you watch it on YouTube. But do me a favor.
At the end, when he goes into Q&A with Scott and with Henrik, just a lot of fantastic things that they are discussing.
I just took a couple of notes because I just rewatched it.
Things that he mentioned, like Facebook Infer detecting bad data access patterns through static analysis. He talked about Amazon CodeGuru.
He talked about his project on Android fitness
and how they were rebuilding a storage solution,
replacing SQLite, where it was multiple orders
of magnitude faster than the previous SQLite-based system,
and a lot of other cool things.
So really watch the presentation.
We'll put it into the
proceedings of the podcast so you can just click on that link. But now the question is, who is
this mysterious guest?
Who, who, who?
I guess it's my turn. So hello everyone, my name is Taras. I'm a
software engineer at Facebook. I work on all sorts of stuff, usually switch teams almost every year.
But I guess I've worked on compilers and toolchains.
I worked on the build infrastructure, continuous integration infrastructure, then performance
infrastructure, and now I'm working on release infrastructure.
And the more I work on all these things,
the more I realize that they are pretty much all the same anyways.
So you said you switch teams about every year.
Is this just out of curiosity?
Is this a regular thing that you do at Facebook where you constantly rotate
so you see new things and try out new things?
Or is this just you because you like the change?
Or how does that work?
It usually depends on the people.
We do have an option to switch teams fairly flexibly as long as we perform well.
But for me personally, I just try to practice kind of like a gradient ascent approach where at regular intervals,
I just look around.
I try to identify the directions with the steepest opportunity,
like the steepest types of opportunities for having more impact.
And I just follow them.
That's the reason why, for example, when I started,
I thought that, well, what could be more impactful than improving compilers because they power pretty
much everything. So you make the compiler better, and you improve everyone's code. So it
scales really, really well. But the problem is that this space has been studied and developed so profoundly that it has, for the most part, reached the point of marginal returns, essentially. You have to fight for even small incremental wins like one or two percent, whereas in other spaces,
for example the build space, you can have orders of magnitude improvements if you apply
some latency reduction patterns, or you just find better ways to distribute loads or identify incrementality.
Most of these things are not really applicable in compilers.
And it's sometimes even sad because oftentimes you are not even allowed to add certain optimizations
to compilers just because you have a limited budget.
Because obviously nobody wants to wait forever for their build to happen.
And we already have to do that: our builds are unfortunately already taking hours and hours,
even for relatively small mobile applications, so we don't want to make them even slower.
Basically it's a trade-off, a trade-off between optimization gains in the end,
but also developer productivity.
Yeah, I mean, it's actually,
it reminds me about one of the awesome talks
by Chandler Carruth,
who works on the LLVM compilers at Google.
And I think the talk was called
"There Are No Zero-Cost Abstractions". And basically
the idea is that
somebody always has to pay the price. So you can shift
the price, for example, imagine that you would like to improve your runtime
performance by applying
certain optimizations.
And those optimizations, let's say they're NP-complete.
But it's small enough, so it's still manageable,
but it takes a lot of time.
So your runtime performance is going to get better,
but your build time is going to suffer.
So you are shifting the cost from the runtime to the build time.
So it's like developers now have to pay the cost of the extra build time.
So no matter what you do in engineering,
there is always somebody who has to pay the price for any improvement.
So it's always a matter of budgeting and making sure that you're working on things that make the most sense for the business.
And when you say runtime, you're talking about once it's out and users are using it, correct?
Yes.
When you execute the application.
Execution time.
Right, right, right.
Because to me, it's nuts to hear you talk about
trading off between build time and run time
because it's what we refer to as a very first world problem.
So many organizations don't pay any attention to build time.
Heck, they have horrible pipelines.
They might not even have pipelines.
Obviously, Facebook and some of the other bigger players have these very, very advanced pipelines.
We have a really pretty advanced pipeline. I think we're like a one-hour deployment.
But that is a, you know, it's not something that I think a lot of companies have to even consider
because they'd be like, our deployment time, that takes forever anyway.
Who cares if we had another hour or something?
Because they're not in that phenomenal place
you've gotten to where you can push these out.
In fact, when you were talking about slowing down
your build time, I thought you were going to say
you slow down your build time to like five seconds
or something.
I didn't realize it was that long.
Well, I mean, it depends.
Right.
But it's just really interesting to see where you can get to
when you put this much effort into that build time
because that is how we get to that very agile state,
the DevOps-y pipeline and all whatever terms you want to throw in there.
But that is a huge consideration of it
that I think a lot of people just aren't at yet.
But it's amazing to hear that there's now analysis being done of trade-offs between
the two because it does cost the business to delay getting things out.
So yeah, interesting trade-off that I think a lot of people don't even have on their radar
yet.
Well, yeah.
And there is even a bigger point about this.
So there is this famous Conway's law that I guess roughly can be restated as the organizational structure is reflected in our system architecture.
But I think there must be another law that describes how you impact the culture through the infrastructure that you have.
So, for example, when your builds are slow,
obviously you're going to incentivize people to not build very often,
which means that they will probably be making bigger changes
before they test them.
They would probably be less likely to test them in general
just because you don't get the feedback
that you are usually aiming for on a regular basis.
So you would be much more likely to write code before tests.
And that means that you would be even less likely to write tests at all.
It also means that you would be much more reluctant to perform refactorings,
just because whenever you do even a simple change,
like renaming a variable, you're going to have to wait potentially maybe an hour to see the result
of that refactoring. So you disincentivize people to do the right things. And it also has huge
implications for performance work, obviously, because performance work oftentimes is not something that can be seen with the naked eye.
You actually have to tinker with the code base.
You want to explore different things.
You want to try, you know, let's change, let's say, the data structure allocation
to be more predictable by maybe specifying initial capacity
for your vector or whatnot.
Or maybe it would be better to try using trie-based hashing
instead of the more standard hash table that is available in most of the libraries.
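To make that kind of tweak concrete, here is a minimal sketch (the function and sizes are invented for illustration, not from the talk): reserving a vector's capacity up front makes its allocation pattern predictable instead of letting it reallocate repeatedly as it grows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical hot path where we roughly know how many samples arrive per batch.
std::vector<double> collect_samples(std::size_t expected_count) {
    std::vector<double> samples;
    samples.reserve(expected_count);  // one up-front allocation instead of repeated growth
    for (std::size_t i = 0; i < expected_count; ++i) {
        samples.push_back(static_cast<double>(i) * 0.5);  // stand-in for real data
    }
    return samples;
}
```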
So all this experimentation is discouraged, obviously,
when you have inefficiencies in your pipeline.
That's why I think it's extremely important to keep this in mind.
And it's probably one of the reasons why the Go language has become so successful.
And in fact, according to legend, I guess it was one of the reasons why Rob Pike even decided to
write Go. I think the legend goes that he was waiting for the C++ build to compile.
And while he was waiting, he was like, okay, let me just write a faster language in the meantime.
And that's how it happened, I think.
There's probably some truth to it.
Because, I mean, you can imagine that if you have to wait multiple hours for your builds,
you start thinking about other alternative options.
And creating a new language with a faster compiler is that option.
And in fact, if you look at the maturity of that compiler,
it's extremely mature in terms of correctness, but if you compare its generated code with LLVM, let's say, or GCC, it's not
even remotely comparable. It generates much less efficient code, although things have drastically
improved since they've moved to the static single assignment (SSA) form that enabled a lot more optimizations,
but it's still nowhere close to what LLVM or GCC is able to generate.
So now, listening to the first couple of minutes, right,
with this, I guess, there are so many directions we can take this conversation,
because initially I thought, you
know, we'd just talk about data access patterns, because at Neotys PAC, I think your
claim, your famous claim that I also wrote down, is that you think that new storage is just faster
magnetic tape, and then you kind of go on and explain different access patterns of data. Shall we go down this track, or shall we focus more on kind of what we just started,
with what I think is more on the developer productivity side
and making sure we have the right tools and put performance into these tools?
Taras, what do you think?
I mean, it's certainly up to you, but I think that the main theme for me
is that I really love focusing
on the things that do not change
instead of things that do change.
So instead of chasing the latest frameworks
or libraries that are showing up every single day,
or maybe even some small tweaks or optimizations
that are appearing every day,
I really like to focus on fundamentals. The things that have been true for a long time,
and that's the reason why they are so widespread. And the data access patterns is one of those
things. That's the reason why I mentioned that pretty much all the new storage formats and
even memory, which is sometimes very surprising for people, is pretty much still the same
tape. And if you look at the hardware aspects of how the data is accessed, it's always, pretty much always accessed in blocks.
So again, there is a lot of similarities
between how you work with caches
or how you work with SSDs.
And that's the reason why sequential access
is still a lot faster,
no matter what kind of storage you use.
Even though people often think that, oh,
but I'm using the latest NVMe device, which is like orders of magnitude faster
than the hard drive, and while it's true, there is still an order of magnitude gap between what you get when you access
data in a sequential way versus what you get when you access it in a random
way.
And this,
this theme is actually as applicable to developer productivity as everything
else.
I actually believe that there are lots of,
I guess,
organizational patterns that can be applied from the distributed computing
world to how we build the organizations,
like social organizations and the other way around as well.
And then to your point, right?
I mean, if the way we store data,
if, let's say, the latest storage is just using the same mechanisms
as we used in the past,
I guess the only way we can change things fundamentally,
and this is also something that I took from your talk,
is that you need to figure out how to kind of smartly organize your data.
And I think what I wrote down is you have to add your data in a predictable way.
So if you want to really speed up, I guess, data access, then you need to understand first
what type of data do you have?
How do you need this data?
How can you store it in a predictable way so that later on you can also read it in a
predictable way and not in a random way because the randomness is exactly what slows you down.
Did I get this correctly from the talk?
Yeah, I mean, that makes total sense. And again,
you can think of it as an example from real life, where if you have order in your apartment
and you know where all the things are located in advance, you don't have to waste time randomly searching
You just know that they're going to be in a certain place.
And this way, you can come up with the most efficient route to get to where you want to
get to, and you reduce the inefficiencies.
And the same thing happens when you have data, because when you access data,
there are actually lots of things that are helping you to access it in a predictable way.
For example, at the level of the CPU, the CPU has a branch predictor.
So if your branches are more predictable, it means that you will have fewer pipeline stalls and you will get better performance. If you have sequential data on a file system,
let's say, your kernel is usually able to enable prefetching effectively, which is going to
read more data than you even requested. And in fact, it always reads a little bit more than you request, just because the data is usually read in chunks
that are called blocks,
similar to what happens at the cache level.
So usually you have cache lines
that are fetched by the memory management unit.
And again, if it's easier to predict what's going to be your
next location that you would like to use, the prefetcher is going to be able to help you with that. So it works everywhere, on every level of the stack.
The more predictability you get, the more optimizations you can put
and fewer inefficiencies you have.
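To make the sequential-versus-random gap concrete, here is a minimal, hypothetical C++ sketch (the array size and timing approach are invented for illustration, not from the talk): the same summation visits identical elements, but one visiting order is prefetcher-friendly and the other defeats it.

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = 1 << 22;           // ~4M elements, chosen arbitrarily
    std::vector<int> data(n, 1);

    // Two index orders over the same data: sequential and shuffled (random).
    std::vector<std::size_t> indices(n);
    std::iota(indices.begin(), indices.end(), 0);
    std::vector<std::size_t> shuffled = indices;
    std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937{42});

    auto time_sum = [&](const std::vector<std::size_t>& order) {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::size_t i : order) sum += data[i];
        auto end = std::chrono::steady_clock::now();
        std::cout << "sum=" << sum << " took "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
                  << " ms\n";
    };

    time_sum(indices);   // sequential: cache lines and prefetching work in our favor
    time_sum(shuffled);  // random: mostly cache misses, far slower on typical hardware
}
```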
And in fact, for example, when you have the binary layout,
which is something that most people usually don't even think about when they
think about performance it actually matters a lot how you even lay things out in terms of
the machine code that you generate because when you read things from the disk into memory
while you're loading your executable,
you have to read things in pages.
And so the more scattered your data is,
the more likely it's going to span multiple pages,
which means that you will probably have more page faults,
and they are not free.
So again, you lose in terms of performance,
and it pays off to actually think about how you order your functions
in a way that mirrors the order in which they are executed
during startup, for example,
to improve startup performance.
My question to you, do we expect or shall we expect the average developer or the average architect to know all this and take care of it?
Do we have tools or do we expect these things to be built into our compilers
or whatever?
Do we have tools that at least cover most of these things,
like analyzing your current access patterns and with that,
then optimize your code exactly based on that?
Does this exist?
Yeah, there are a bunch of things that are trying to address these issues.
So, for example, in compilers, there is usually profile-guided
optimization, which can be used to record the access patterns from the actual use cases and apply this data to make sure that the branches that you have in your code are more predictable.
Because, for example, let's say we can see that in production, almost all the time,
the second, like the else branch is taken. So instead of ordering your layout in a way that has something like,
you know, where you have if and then you have the machine code
for the if branch, you put the else branch instead.
And then you can even potentially move the other branch somewhere else to a different page so that you can fit
more useful machine code on the initial memory page.
So that's for compilers.
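As a rough illustration of the same layout idea in code (this is a hypothetical function, not from the talk, and uses manual hints rather than a real PGO profile): C++20 lets you mark which branch is expected to be hot, so the compiler can lay the common path out as the fall-through.

```cpp
// C++20 branch-likelihood hint: a manual stand-in for what a PGO profile
// would tell the compiler automatically about which branch dominates in production.
int handle_request(bool is_error_case, int value) {
    if (is_error_case) [[unlikely]] {
        return -1;  // rare path: the compiler may push this code away from the hot layout
    }
    // Common path, ideally emitted as the straight-line fall-through after the branch.
    return value * 2;
}
```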
But there are also a lot of separate tools.
Facebook has, I believe, open-sourced BOLT,
and it has a lot of interesting optimizations when it comes to memory layout.
And it doesn't cover just the regular function layouts, it also
thinks about how to align things in the loops
because interestingly, as I
mentioned, the cache lines
are fetched in blocks and
by making sure that you lay instructions in a way that they,
for example, end up in the same cache line,
you can, again, improve performance quite a bit.
So you have micro-optimizations for the layout as well.
They are much more advanced,
and I highly recommend checking out the BOLT paper.
You can probably Google it, or I can also share it with you,
and so you can include it in the description of the podcast.
That would be cool.
Hey, but if I hear this correctly,
I mean, these are all optimizations
that have to be done based on data in production.
Because if you would do it at initial compile time, you could maybe do it based on some test executions.
But you always know the challenge of executing tests with realistic data access patterns and stuff like that.
That means, are you telling me that you would combine this with your profiling in production, and then you're rebuilding things, or you're taking this into consideration the next time you build a new version?
How does this work?
Yes, yes, exactly.
That's how it works. You first compile the binary with the extra instrumentation, then you feed the inputs to that binary,
which generates the profile.
And then you feed that profile into the next compilation,
which is going to generate your final binary.
And a similar approach can be used in many different ways.
Like, for example, for our Android binaries,
what we do is we generate order files
which specify in which order Java classes have to be loaded
to improve startup performance.
But obviously, you don't know in advance what's the best access pattern. So what
you can use is, for example, your last profile that worked, and then you can ship it to a
small percentage of your users, get the data to generate a new profile, and then use that profile to generate the final binary
that you're going to ship to all the users.
So it's somewhat, I guess, similar to machine learning infrastructure
where you have a continuous feedback loop to improve on each iteration.
So you have backpropagation, essentially.
It's like a canary deployment,
but you're deploying that canary
with a special compiled version
that includes additional telemetry and profiling data
to a specific set of users.
And that's the exact reason
why you need that fast build time, right?
As we talked about in the very beginning,
because if you're slowing down that pipeline,
it's going to be much more painful to do all this.
Yeah, and in fact, this is becoming more popular. For example, Google is now
using a similar approach for their Android apps. As you know, Android apps historically have been using the DEX format,
which is bytecode, similar to Java bytecode,
but the virtual machines, the Dalvik virtual machine
and later on the ART virtual machine,
still operate on the DEX format instead of Java bytecode.
But the point is that this code is interpreted.
And there are pros and cons to having bytecode.
So there was an effort to create an ahead-of-time compilation
where you can precompile your application into the native code,
which is then executed.
But then you lose a lot of flexibility that you normally get from the bytecode
because you can use JIT to improve things.
So now we actually have this hybrid approach where they are using real production data
to create a profile for your application that they use to generate the native code.
And then you ship that native code to the customers to improve the initial experience, but then you still collect the data that can be used to create a much more tailored profile
for your use cases and your application,
which you can use to feed into the JIT compiler
to get even more optimized version of your application.
So again, you can have a very interesting feedback loop
on a continuous basis.
And that's actually the reason why, when I look at the CI/CD pipeline,
I do not see it as just an interval from point A to point B.
It's more like a circle where you have this continuous loop
where you feed production data into the build phase,
which is going to produce the new binary
that you're going to push to production,
and then you enable monitoring.
And that's actually the reason why I'm a huge fan of things like Keptn
because it's an awesome project.
And I believe that it just makes it a whole lot easier to create this continuous feedback loop.
Brian, by the way, he said it first.
I was not the first one that picked up the word Keptn.
So when we first started with Keptn, Andy would always work it into
the podcast. That would be towards the end. And I turned it into a drinking game, although I
wasn't really drinking, but it's always like, when is the point where Andy's going to mention
Keptn? And you mentioned it first today. So now I wonder, and obviously, you know, Taras, we are
on the Dynatrace side, we are obviously monitoring production data.
We are also heavily involved in OpenTelemetry.
I wonder if this data, do you think people in the observability space, when we talk about
more like end-to-end tracing, if they also think about that use case to feed that data
back to the build process, or is this not the right data?
Do we need, I guess, we need much more granular, much more specific data for making these optimizations
than what we typically do in distributed tracing, where we pick individual
methods and not necessarily individual branches, but more like, you know, how often was this method
executed, what was the execution time. I'm just wondering if we should take this idea as a use
case and say, hey, can we use OpenTelemetry data, or any type of production tracing data,
and also feed it back into the build pipeline
for build optimization?
Yeah, so that's an interesting question.
I actually have some...
like, even though I'm a huge fan of OpenTelemetry
and the infrastructure that it enables,
it also has a lot of inefficiencies, which I usually have some rants about.
For example, they usually use a lot of strings,
and I believe that we could benefit greatly from using techniques like index tables,
where we just map strings into integers,
and then we only send those integers over the wire instead of sending strings.
So I'm not even going to talk about using something like JSON,
because obviously that's off.
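A minimal sketch of the index-table idea (the class and field names are invented for illustration, and this is not OpenTelemetry's actual API): each distinct string gets a small integer id once, and only the integers need to travel over the wire.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string-interning table for telemetry attribute names.
class StringIndex {
public:
    // Returns a stable integer id for the string, assigning a new one if unseen.
    std::uint32_t intern(const std::string& s) {
        auto [it, inserted] = ids_.try_emplace(s, static_cast<std::uint32_t>(strings_.size()));
        if (inserted) strings_.push_back(s);
        return it->second;
    }

    // Reverse lookup, e.g. on the receiving side once the table has been synced.
    const std::string& lookup(std::uint32_t id) const { return strings_.at(id); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> strings_;
};

// Usage sketch: spans reference attribute names by id instead of repeating strings.
// StringIndex idx;
// auto key = idx.intern("http.status_code");   // send the 4-byte `key` on the wire
```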
But going back to your question, I absolutely believe that there is a huge amount of useful information
that we can leverage to improve things in terms of performance.
Obviously, the more granular data you have, the better.
But even coarse-grained data that you normally get can be very interesting.
Like, for example, if you know that,
let's say, one of your methods is pretty slow, then maybe you can allocate a bigger budget for
the compiler to spend on that function. And I think that this hybrid approach, this data-oriented
approach or data-driven approach is the future. Because right
now, we have all these budgets allocated statically. And for example, we decided, okay,
let's say if the function is longer than this number of instructions, then we do not inline it.
And this decision is static and it's global.
So you usually apply it and it doesn't matter whether that function is invoked often or not,
you're going to get the optimization
or you're not going to get the optimization.
And I think it's a very, very brittle approach
because first of all, those heuristics, like when they are developed,
they may be reasonable for a particular use case, but definitely not for all of them.
And things change all the time. What used to be reasonable a year ago may not be necessarily
reasonable now because a lot of assumptions have
changed. So I believe that the observability data that we get, even at the level of, let's say,
how many times a particular function is invoked, is invaluable, because this way you know where
you want to focus your attention. You know that this function is really critical,
or maybe like even if the function is invoked once,
but it took a lot of time,
but it was also on a critical path.
It's important to know.
This is information that you can feed
not only to the compiler,
but also surface it in your IDE, let's say,
so that developers, when they invoke the functions,
imagine that in your VS code or whatever editor you're using,
you're going to see this information that the function, for example,
is going to be highlighted with red, identifying that it's expensive.
So be careful when you use this function
because it's probably going to impact your performance. Or you can use all sorts of other interesting annotations, where you can say that
this function is going to have some I/O operations. So when you're developing
a time-sensitive or real-time application, you really want to make sure that you understand
that your dependencies do not allocate.
over predictability of your own function. So I'm a huge fan of the observability space. I believe that we definitely need to close
this gap between the
vast amount of data we get from it
and how we use it. We just
don't take advantage of
all the data that we generate, which is
a shame. I think there
is a lot of insight that we can generate.
And
kind of to expand on your
VS Code example,
if I'm a developer and I'm in my function,
wouldn't it be cool if I would also see
like a live graph next to it
that shows me live telemetry data
from let's say from production
where I see how often is my method called?
What's the P95 response time of that method?
What's my kind of performance budget that I have overall in the
use case that I'm executing? If I now make a code change, then my Visual Studio Code or whatever IDE
I'm using could say, hey, this is probably going to increase your response time by 10%. 10% means
you're exceeding your budget, or it means 10% times
one million times your function gets called in an hour, right?
You get the idea.
So that would be fantastic.
Yeah, and I think that's exactly what Amazon CodeGuru is trying to do.
And I guess some of the things we also have at Facebook,
even at the source level.
So we use Diffusion, in Phabricator basically, for code reviews and viewing our code.
And we can see the percentage of time a particular Hack branch was taking.
And it's useful because, again,
if you understand that a certain area of code is pretty much never executed,
then you can think about whether it makes sense
to invest in improving it.
Because if it's a dead code, then maybe you don't care.
But if it's a really, really hot path
and you know that there are some
even simple optimizations you can apply,
you can just go ahead and do it
because you know that it's going to have
some meaningful impact.
I think even in that example,
getting rid of the dead code
would be beneficial as well too.
Absolutely.
And that's actually another, I guess,
fairly simple use of the observability,
that if you identify certain areas of your code as dead
because they are never executed in production,
then you can potentially feed this information to developers
to make sure that they know that,
oh, this code is dead, so you can just remove it.
Because compilers, they do have techniques like tree shaking or whatnot to find unreachable code. But the problem is that not all code that is reachable is executed.
So using production data and observability,
it would help a lot in these efforts.
Coming back to the whole idea of predicting the way you write your data,
is there, especially for developers that are listening in now, that find this fascinating because maybe they hear about this the first time.
Is there anything like a listener can do and say,
hey, there are three best practices on how
you can do this.
Or there's like, check out this video.
Obviously, check out your talk from Neotys PAC,
but is there something that you suggest every developer that wants to optimize the way they access and store the data,
what they should know?
What should every developer know that wants to be more efficient
with the data?
Well, I usually call this the mechanical sympathy principle.
So you usually want to really understand the APIs that you're using, because all of them are developed with a certain set of assumptions. And if you know those assumptions,
you are able to leverage them in the most efficient way.
I don't think there is a single advice
that can be applied to solve majority of the issues.
But in practice, you can think about techniques like parallel arrays,
where instead of arrays of structs, for example, you use a struct of arrays. And
it's a common pattern that is used in game development.
But basically, the idea is that you usually try to pack the data
that you're going to access close together.
So imagine that, let's say, you have a method that is, I don't know,
calculating a total score of a game.
So you pass an object, like let's say a game,
that has a list of maybe tournaments or whatnot.
And then a tournament has a bunch of fields like a name
or a list of players that participated and other things.
And then it also has a score field.
So what you would normally have is you would have a loop that iterates over all the tournaments
and access that score to aggregate it.
But the problem with this approach is that your tournament struct or class or whatnot
is going to have a lot of extra data in addition to the single field that you're interested in.
And it's going to pollute your cache and it's going to drastically decrease efficiency
of the IO that you have to perform to get the data to the CPU.
So what you can do instead is imagine that you have again the same game class,
but instead of having a list of tournaments, you have an array of all tournament scores, you have an array of all
users for all the tournaments, and all those things. So this way, when you iterate, you're
going to be iterating over the array with just the scores, which means that you're going to be able to fetch a bunch of scores
into a single cache line.
And on top of that, you would be able to leverage SIMD instructions,
so vectorized instructions, a lot better,
because you'd be able to just load them into a single register,
perform a bunch of additions
or multiplications or whatnot as a single instruction.
So you have multiple wins for that.
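Here is a small, hypothetical C++ sketch of that array-of-structs versus struct-of-arrays contrast (the Tournament fields are invented to mirror the example): the parallel-array version touches only the bytes the loop actually needs.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Array-of-structs: each Tournament drags its name and players into the cache
// even when we only want the score.
struct Tournament {
    std::string name;
    std::vector<std::string> players;
    std::int64_t score;
};

std::int64_t total_score_aos(const std::vector<Tournament>& tournaments) {
    std::int64_t total = 0;
    for (const auto& t : tournaments) total += t.score;  // strided, cache-unfriendly access
    return total;
}

// Struct-of-arrays (parallel arrays): scores are packed contiguously, so each
// fetched cache line holds nothing but scores, and the loop vectorizes well.
struct GameSoA {
    std::vector<std::string> names;
    std::vector<std::vector<std::string>> players;
    std::vector<std::int64_t> scores;  // index i describes the same tournament across arrays
};

std::int64_t total_score_soa(const GameSoA& game) {
    std::int64_t total = 0;
    for (std::int64_t s : game.scores) total += s;       // dense, SIMD-friendly access
    return total;
}
```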
And so it's just one of the examples,
but obviously the fewer branches you have the better.
And that usually means that it's usually very useful to know all the properties of your data in advance.
It's, I guess, imagine that you have an input, an array of numbers,
and you are supposed to check whether there is a certain
number in it.
If you know nothing about that array, then you would probably have to use something like
a linear scan.
But if you know that it's, let's say, sorted, then you'd be able to do a binary search,
which is presumably much faster than a linear scan.
So again, just knowing this is extremely important.
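As a trivial sketch of that point (purely illustrative, not from the talk): the same membership check done with a linear scan, versus a binary search that is only valid because we know the input is sorted.

```cpp
#include <algorithm>
#include <vector>

// O(n): the only safe option if we know nothing about the ordering of `data`.
bool contains_linear(const std::vector<int>& data, int target) {
    return std::find(data.begin(), data.end(), target) != data.end();
}

// O(log n): valid only because we know `sorted_data` is sorted ascending.
bool contains_sorted(const std::vector<int>& sorted_data, int target) {
    return std::binary_search(sorted_data.begin(), sorted_data.end(), target);
}
```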
So we frequently tend to think that abstractions are created
so that we can just forget about all of our dependencies and just, you know, live in our comfy
comfort zone, not caring about anything else.
And I believe that abstractions... I actually like
how Dijkstra used to think about abstractions.
He was basically thinking of them as a way to
reason about things
extremely precisely. So it's not about vagueness, it's not about ignorance, it's the opposite: a higher level of understanding your properties.
And so just thinking about what exactly you know about the data
that you are working on.
What do you know about the type of devices that you work with?
Is it the network? Is it the hard drive? Because there used to be even
a movement, an attempt to create all these RPC abstractions that create an illusion for you that
you're just doing a local function invocation. And that illusion is extremely harmful because you really don't want to
have a lot of
RPC calls in, let's say, your tight loop.
But if you don't even know that it's an RPC call
under the hood, then
there would be absolutely no reason for you not to use it.
And I guess that's where the observability comes into picture.
And it's probably something that you can point out these kind of issues and say that, oh,
looks like you have an IPC call in your loop.
So maybe you want to create a batch call instead
of having
these single ones.
So I guess
just knowing your
data, knowing your
dependencies is extremely helpful
to make sure that you get the
best of performance. And also
thinking about the use case.
Because again,
oftentimes we try to overgeneralize things.
And I guess if you look at the database space, you can probably notice how many different
databases are available these days. And the reason is that they are trying to address
a very particular use case.
And for example, if you have, let's say, InfluxDB,
it's really optimized for time series processing.
And it's really, really good.
It has a lot of encodings that make the storage efficient,
processing efficient.
But it's not a competitor to, let's say, something like CockroachDB,
which has a completely different set of use cases in mind.
And that actually goes back to that example about the Android fitness optimization
where replacing SQLite with manually built database storage was extremely
beneficial just because we don't need all that generality that SQLite provides.
We really wanted to have a really fast way to append data and really fast way to scan
all the data to create an aggregate.
And I guess one of the main ideas of the presentation was that
whenever possible, if you have to do an I/O,
and you pretty much always have to do an I/O, because in my opinion,
even if you use memory, it's an I/O,
you really benefit from append-only data structures, where you always append just to the end
without having to look for a place to insert your items somewhere in the middle
and then shift things around, or even span multiple pages,
memory pages.
So that's extremely beneficial.
You can do all sorts of optimizations, and you can build a lot of interesting data structures like ring buffers, which power kernels with things like io_uring, but they also power the fastest
exchanges: Disruptor, for example, is leveraging ring buffers effectively to reduce the number of allocations that you have to do
and also reduce the contention that is usually inherent to most other data structures.
And again, if you have an append-only data structure,
it's fairly easy to implement it without any locks at all.
So it's very friendly for concurrent environments, just because you usually have to just use
a single atomic increment to allocate the block of memory you're going to write to, and other
writers at the same time are able to just use the
other memory. And so you have fast appends, very efficient because the data is
packed very well, and also lock-free, which is
great for avoiding the pitfalls of Amdahl's law
and its better version called the universal scalability law.
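A minimal, hypothetical C++ sketch of that single-atomic-increment append (fixed capacity, no wrap-around, and no reader-visibility protocol, purely to show the idea): each writer claims its own slot with one fetch_add and then writes into it without taking a lock.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical fixed-capacity append-only log. Each writer reserves a slot with
// a single atomic increment, then fills it in without holding any lock.
template <typename T>
class AppendOnlyLog {
public:
    explicit AppendOnlyLog(std::size_t capacity) : slots_(capacity) {}

    // Returns the index written to, or std::nullopt if the log is full.
    std::optional<std::size_t> append(const T& value) {
        std::size_t idx = next_.fetch_add(1, std::memory_order_relaxed);  // claim a slot
        if (idx >= slots_.size()) return std::nullopt;                    // out of space
        slots_[idx] = value;  // private slot: no contention with other writers
        return idx;
    }

private:
    std::vector<T> slots_;
    std::atomic<std::size_t> next_{0};
};
```

A production ring buffer would additionally wrap the index around the capacity and publish each slot (for example with a per-slot flag or sequence number) so readers know when an entry is complete.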
This is amazing.
I think you were just, I mean, it's fascinating to learn all these things.
Obviously, you're deep into this and you have a lot of experience.
For me, a lot of this is new, but it's great.
I also want to say, don't give everything away
because you want to make sure that more people now watch your talk
from the Neotys PAC event.
Also, an interesting comment that I watched on the recording
when you started the Q&A.
Kind of Henrik was saying, I don't have any questions right now because first I need to
read up everything. And then you said it seems you haven't done a good job if people now don't
know what to ask because you have this analogy of you need to, when you explain, when you teach,
you need to explain in very simple terms that people understand without them having to go back to the material.
But you did a phenomenal job.
I took a lot of notes.
Even I understood it.
See, even Brian understood it.
That's always a good benchmark.
I would love eventually, not eventually,
definitely, to have you back, because there are so many, I guess,
more topics
in that area. But also, as you know, with all the work we do with Keptn around continuous
delivery and automation, it would be fascinating to have a conversation with you
about that topic as well. I know you're very busy, but I would love to have you back to talk about
even more of this stuff, and especially about continuous delivery.
Yeah. Thank you so much for the kind words. I really appreciate it.
I think I pretty much haven't covered even a small fraction of the things
that I covered during the talk.
And I'm very sorry for being so scatterbrained.
It's just, there are so many awesome topics.
Like this is such a vast space that it's really easy to get off the topic
and just, you know, cover all sorts of things.
Just because I really love this area. I believe that it's super beneficial
to build this mechanical sympathy
and just learn about how things work.
I think it's probably one of the reasons
why people are interested in engineering in general,
just because they love to tinker with things
and nothing forces you
to understand the hardware and the software and their interactions as working on performance just
because they are extremely related like you cannot think about performance and just in pure software or in pure hardware.
You have to think about how they interact.
And I guess you mentioned where else you can find some of the patterns.
I occasionally post some of the patterns that I find useful on LinkedIn.
And also, I just recently created a Substack
where I publish some of my thoughts.
So if anyone is interested,
they are very welcome to subscribe.
Yeah, we'll do that.
And we definitely make sure to link your LinkedIn profile.
I also want to mention that you have hosted
the Performance Summit, right?
I think you did it the second time or third time.
I don't know how often, how many?
I think I hosted probably like five.
And the last one was hosted by my colleague here in London.
And we will probably have another one in a few months. But I am now also organizing the Scaling Continuous Delivery Summit.
So we got DevSecPerfOps.
Kidding.
So what's really interesting, Andy, too, is that people like you and I
who come from the performance testing and engineering background of that side,
this is like a whole new world of it.
And I think just that I was thinking about this quite often or quite a lot during the podcast today.
And at the very end, I forget which one of you mentioned it, but it was the idea of when you get the engineers to think about performance, where I took that is that's when you get a whole different scale of it.
We're always looking at response time.
We're looking at some things like some code efficiency and everything.
But when you talk about disk access
and block storage and things like that
that are way beyond the kind of areas
that we would probably think about,
we can't even observe that too well.
It's just fascinating how much further you can take all this.
And I think the biggest hurdle, and if we do have you back on,
I'd love to get some ideas from you,
because I know we heard some from Goranka when she was on.
I would love to get some ideas from you.
How do you get the engineers in places that are not like Facebook
to care about performance?
What incentivizes them?
Besides, hey, your code will run well? Most of them are, a lot of developers are just struggling to get their code out the
door.
So how is it that we can make this more attractive to them, more mutually beneficial for them
and make them care about it?
I think that's one of the biggest hurdles.
Because I think a lot of people have it in their mind.
I think some of it is around
the tool sets and what can be brought
to them to point out and help point out.
But just seeing, again,
for a future show, because we're out of time
anyway, how
we get developers to care about performance
besides whipping them.
Yeah,
I mean, I can, I guess,
give a very, very short answer. Obviously, it's a complex topic,
and I have a lot of interesting ideas about how we can do that. But if I had to give just one piece of advice,
then I would just say that the insights that we generate should speak the same language as developers.
Because instead of talking about some microseconds or allocations, you really want to explain what the actual impact is going to be in production. So let's say your page is going to load in one second
instead of 100 milliseconds,
or your service is going to load much slower,
and because of this, you're going to lose a certain amount of revenue.
So if you can boil things down to revenue, it's probably the easiest one. Like, oh, your change
is going to make us lose one million dollars. That's a good incentive to actually think about it. But
if you just talk about some abstract units, then it's like, I don't know if it's relevant or not, it's hard to tell.
Awesome. Good. Hey, again, thank you so much. We're looking forward... If you have any additional
links you want to send us over, then we'll do this after the recording.
Are there any other events coming up where you're speaking?
Sometimes... not yet. As I mentioned, I think that we will
hopefully have another Performance Summit, probably somewhere in June or July, so that's probably
going to be the next event. But I have been absolutely honored to be featured on your podcast.
So if you ever want to invite me again, I would love to.
It's been an absolute pleasure.
And I think there are so many more things to talk about,
especially how awesome tools like Keptn can help us
to drive a lot of performance and reliability efficiencies into our culture
and shape the culture in general.
Awesome.
Thank you so very much.
We will look forward to having you back on.
And thanks to all of our listeners for listening all the time.
And yeah, have a wonderful day.
Yeah, thank you so much. Thank you. Bye, Andy. Bye.