Signals and Threads - Compiler optimization with Greta Yorsh
Episode Date: September 30, 2020
It’s a software engineer’s dream: A compiler that can take idiomatic high-level code and output maximally efficient instructions. Ron’s guest this week is Greta Yorsh, who has worked on just that problem in a career spanning both industry and academia. Ron and Greta talk about some of the tricks that compilers use to make our software faster, ranging from feedback-directed optimization and super-optimization to formal analysis. You can find the transcript for this episode along with links to things we discussed on our website.
Transcript
Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street.
I'm Ron Minsky.
So it's my pleasure today to have a conversation with Greta Yorsh.
Greta is a compiler engineer in our compilers team in the London office,
and she also has a rich and interesting background in compilers and programming languages
and program analysis and verification. And we're going to talk about all that. She's also had a
lot of experience in both industry and academia. And since 2018, Greta has worked with us working
on the OCaml compiler. So Greta, to start with, could you talk about how you got interested in
compiler optimization? I studied in Israel. I did my undergraduate studies in computer science and economics.
I found that I don't cope well with uncertainty.
So the interesting mathematical bits of economics were not the right choice for me.
So I decided to do a PhD in computer science in the area of software verification, where
a lot of the concerns are about ensuring things, so that fits well with my approach to certainty.
And after my PhD, I worked for a few years at IBM Watson Research Center in New York,
working on program analysis and verification problems.
And after that, I moved to Cambridge in England,
and I joined an amazing company, ARM, that makes processors.
And I worked there in the compilers team,
working on the GCC compiler backend for ARM. And after that, I worked at Queen Mary University of London as an assistant professor, or lecturer as it's called in England,
where I did research and teaching in the area of compilers. And then I recently, a year and a half
ago, I joined Jane Street. And at Jane Street, I work on the compilers team on the OCaml compiler.
Tell me a little bit more about how that fits into the arc of your research.
I started my research working on the problem of software verification or certain aspects
of this problem, proving that software satisfies certain good properties,
correctness properties that the users expect from it.
And that research involved a lot of reasoning about the source code of the program,
rather than executing it, and then using certain formal methods and the logical reasoning
techniques, decision procedures,
in order to prove properties of the program. And I worked on that for a while. And there is this
ultimate goal of proving that the software behaves just the way it should, so that real
world executions of your program don't cause terrible failures. And I was very excited about
it for a while, but it is an undecidable problem,
a problem that is very hard to solve in general.
And that becomes a bit frustrating after a while.
And I was interested in what is it that real programmers
and users of software want to prove about their programs
and how I can help them using the techniques that I know
and the logic that I know.
And so I worked on that. And in the process of that, I've discovered that it's really fun to
work with real users and have people who care about certain things. And I don't know what they
care about, but if I talk to them, maybe I'll find out. And maybe one of the tools I have in my
toolbox will help me to give users a tool.
And I found it very exciting.
I think the first time it happened, it was at IBM.
And I was really excited about it.
But what I also realized is that not everyone really cares about correctness.
That sometimes it's okay to have bugs in their programs.
There are other important aspects of software, like performance. That made a big change to my understanding, because I could now put these very formal techniques together with something practical and try to combine the two.
And then when I had the chance to work at ARM on the GCC compiler backend for ARM,
I discovered how important performance is, and how exciting it is to work on these micro-optimizations, CPU-specific, machine-specific optimizations, and how much difference they make to the behavior of the program, to its performance characteristics.
And at the same time, working on a compiler, you just can't have bugs.
An important property of the compiler is that so much else in our systems is built on top of it. So it was a tool that combines correctness and performance, with an optimization objective that is very clear and measurable, as opposed to asking what properties users want to hold for their programs.
And that's how I came to work on compilers.
Yeah, it's interesting you're kind of starting off by focusing on verification
because I think verification is a very shiny and attractive goal
because there's this basic fact about programming,
which is we're all terrible at it.
Humans are bad at writing programs.
Programming languages are things that, at their best, do infuriatingly only the very
specific things you ask them to do and not what you intended.
And so trying to close that gap between the thing you try to do and the terrible thing
that you actually wrote down is an attractive goal. You were talking about one of the problems
of verification, and just to be clear, by verification, formal methods, lots of terms here,
we basically mean coming up with the moral equivalent of a proof that a program matches
some specification. So you write in some precise higher level way what the program is supposed to
do, and then there's like the
actual program that you've written and some kind of proof that the program does what it's intended
to do. And you mentioned undecidability, right? And undecidability roughly means an undecidable
problem is one where you can show that there is no general technique that solves it all the time.
And you mentioned that as like a sign of how difficult the problem is. But it's a funny thing.
In some sense, almost everything that we do is undecidable, right? You can't answer
almost any interesting question about a program in full generality, like the classic result being
the halting problem. You can't even tell whether a given program is going to ever finish executing
in a general way. But that normally doesn't stop us. The way we deal with this is we tend to provide people not with
fully automated systems that solve all of their problems, but instead tools that they can use
while they write programs. They can maybe do some extra work along the side to help ensure the
correctness of those programs. And those tools might be in the form of testing systems, and they
might be in the forms of something that looks more like formal methods, things that essentially, along with the program, develop some aspect of the proof of correctness
of the program. So I'm curious what you think about the kind of practicality and ergonomics
of formal methods as a thing that can be layered into the programming process to make it easier
for people to write programs that do the right thing.
The kind of verification and proving of properties of programs that I think of when I say software verification, the traditional methods, involve writing a program in
one language and then providing specification of its behavior, telling what properties should
hold about the program in a different language, in a logic of some kind that describes it, and then proving that the specification and the implementation conform.
That is very different from the type system approach that OCaml takes, where the types
are the specification of the program.
And that is something I'm learning as I become better at programming in OCaml.
But the kind of verification I'm talking about, the idea of specifying these properties on
the side somewhere, or even in the program, but in a different syntax, I think it's a
big problem for adopting this kind of technologies.
It's possible to specify something about the library, but keeping it up to date as the
code evolves is very hard.
Same problem with documentation.
But when you try to reason about it and you have a tool that runs together with the code,
takes the specification and the code and checks them against each other as you do the development,
then there is some hope to keep the code and the spec aligned.
But still, the programmer needs to write both of them.
So there is an opportunity for automation to save some work for the programmer and keep the two aligned.
I also wanted to mention, when you were talking about verification, that different people think of verification as different things.
To me, it is what you said.
It's the more formal proof about the source code of the program and its properties.
But for most of the development process, it is testing.
It's running the program on some inputs and checking that the outputs behave as we expect. So after
working a bit and trying to find the right logics that are expressive enough to describe the
properties we want, and at the same time, allow us to prove automatically that the program conforms
to these properties, you find, oh, this is very restrictive. And then you look, and testing is so much better.
I've been working on this for two years, but I could have found this bug with a simple
test.
So how do we find tests that are good tests?
Or do we have good tests?
So this problem has been worked on by many people.
And what I find exciting is that taking the testing approach, finding test cases, and
the formal proof approach and combining them
helps us build better tools. So there are many ways in which each side can help the other.
And I worked on exploring some of these ways, and I found it very interesting. And I think this
shows up in verification, but there are also similar combinations of static and dynamic
information or execution time and compile
time information that are useful for compilers during compilation.
It's interesting for you to say this thing about the way in which tests and formal proofs
mix together.
I feel like this is a thing that I've noticed informally on my own in programming, which
is the OCaml type system and in general type systems are a kind of lightweight formal method.
Types only capture
some fairly limited properties of what your program can do, and some type systems are better
at this. OCaml has a relatively capable one. Languages like Java have comparatively simpler
ones. But you can capture a bunch of useful properties in your type system, and the properties
you capture in your type system are universally quantified. You know those properties are ensured all the time for all executions without compromise. And then tests let you kind of test out what's going on at particular
points. You can say for this particular run, for that particular run, you can check that certain
properties hold. And there's a nice thing that happens, a kind of virtuous combination of the
two, which is when you have a well-typed program, one finds that
testing does more to help you bring it into place. It's easy to write a well-typed program that's
wrong, and it's easy to write a program that you've written a few tests for where the tests
pass, but it doesn't really work all the time. But there's a way in which when you have a well-typed
program and you test it in just a few places, it tends to snap into place, and you get much better results
in terms of the correctness of the result than either one alone. So that's like a very kind of
informal programmer experience point of view on this. I'm curious from a kind of more research
perspective, what are the ways that you found in your research to combine testing and formal
verification? So one of the ways, which I think is interesting in its own right, is that it gives understanding of the formal reasoning methods via examples of execution. So the way to think about it is: you execute your program, you get some program states, you collect some
information about them, they satisfy the property you need, and then you take and abstract these
states. So you find some more general way of describing that state and similar states.
And that describes more than the states that you've explored during execution.
So the idea is you execute the program and collect the states that you observed in that
execution, and then you apply some abstraction function that's somehow related to the property you want to prove. And this gives you a bigger set of states that you have explored
that satisfy the property. And then you can go back into the formal world and say,
what are all the possible states and how much of that have I covered? So you could use logical tools like decision procedures to find how much I have covered. If I have not covered everything, give me an example state that was not covered,
and then take that and see, can I cover it in my real execution? Can I get to it? What happens if
I start running from that state? So that's one combination. So let me see if I understand this
correctly. It sounds like what you're doing is you start with an example run.
That example run essentially explores some part of the shape of how the program might execute.
And then you kind of execute that in some sense symbolically, where you sort of aren't keeping exactly all the details, but you're taking that one run through the program and generalizing it
so that you think about it as covering not just a single run, but some bounded, but in some sense,
very large subset of
possible executions. Am I getting in the right direction of understanding how the technique
you're talking about works? It's not one technique. There are many techniques, but this is the general
way in which I think they help each other. Yes. So you execute down one path with one example,
and it says, okay, you've covered all the states, all possible states that go down that path,
but you have not covered different paths.
So let's now go down a different path
and try to find states that take us there.
And to find states, you can use more formal verification.
So that's one way in which formal verification
can help increase coverage of testing.
But there's also the other way that the tests help.
By running these tests,
you have a fast way to cover a certain area and it speeds up the verification of the rest.
Because one of the problems with verification is that if you want to prove expressive properties
about your program, it becomes very expensive, even if it succeeds at the end. There is a way to use concrete execution to speed up verification, or a way to use information gathered from testing to speed up the verification.
Got it. So these are essentially all ways of kind of popping back and forth between the formal
method style, larger scale analysis, and examples as a way of digging into the program. And the
tools can help you move back
and forth between these two and more efficiently do both tasks, essentially. And more precisely.
This combination, I think it's beautiful mathematically, but there are so many ways
in which it helps in practice. And that's why I find it really fun to work on that.
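To make that back-and-forth concrete, here is a minimal OCaml sketch with an illustrative program, property, and interval abstraction; none of it comes from the actual tools discussed. Concrete runs produce states, the states that satisfy the property are generalized by an abstraction function, and any input not covered by the abstraction is handed back to testing as the next thing to try.

```ocaml
(* A toy combination of testing and abstraction: run the program on a few
   concrete inputs, abstract the observed states into an interval, and ask
   what part of the input space is still unexplored.  Everything here
   (the program, the property, the interval abstraction) is illustrative. *)

let f x = abs x             (* program under test *)
let property y = y >= 0     (* property we want the outputs to satisfy *)

(* A small finite input space, so that "all states" is easy to enumerate. *)
let domain = List.init 21 (fun i -> i - 10)   (* -10 .. 10 *)

(* Step 1: concrete executions on a few sample inputs. *)
let samples = [ -3; 0; 4 ]
let observed = List.map (fun x -> (x, f x)) samples

(* Step 2: abstraction: generalize the sampled inputs whose outputs satisfied
   the property into an interval [lo, hi].  A real tool would also check,
   e.g. with a decision procedure, that the property holds on the whole
   abstract element, not just on the samples. *)
let abstraction =
  match List.filter (fun (_, y) -> property y) observed with
  | [] -> None
  | (x0, _) :: _ as good ->
    Some
      (List.fold_left
         (fun (lo, hi) (x, _) -> (min lo x, max hi x))
         (x0, x0) good)

(* Step 3: coverage check: which inputs are not explained by the abstraction? *)
let uncovered =
  match abstraction with
  | None -> domain
  | Some (lo, hi) -> List.filter (fun x -> x < lo || x > hi) domain

let () =
  match uncovered with
  | [] -> print_endline "property covered on the whole domain"
  | x :: _ -> Printf.printf "unexplored input to test next: %d\n" x
```

In a real system, the "which states are still uncovered" question is what a decision procedure answers, and the uncovered state it returns becomes the seed for the next concrete run.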
To step back for a moment, it sounds like some of your frustration
with working on this verification style approach, despite the fact that you clearly love a lot of
the techniques and ideas there, is that it felt to you like in practice, verification often wasn't
serving the actual needs of developers, in part because correctness wasn't quite as high on their
list of requirements for them to put in the amount of effort required to use these formal techniques. Yes.
This leads me to ask about something that you've worked on, which is super optimization,
which is an example of a compiler technique where correctness becomes incredibly critical.
So maybe you could tell us a little bit about what super optimization is and
the role that verification plays in it.
So let me tell you a story first. So when I worked at ARM on the ARM compiler backend, there is an interaction between the hardware and the compiler backend. And in order to generate efficient code for a particular architecture, you need to understand the costs of each instruction and of combinations of instructions, and which registers to use, how to use memory.
And then the compiler has some limited information about these properties of the hardware that
the program is going to run on.
And the compiler is trying to produce good code, but doesn't provide any guarantees about
how well the code will perform.
And there is a trade-off there.
There is a reason for that.
Even if we could provide guarantees, it would be very long compilation time.
It would be very costly to generate code that's efficient.
But often we can't provide the guarantees because we don't know everything that's going
on in the system.
So the way the compiler approaches it is that we have some internal representation in the system. So the way the compiler approaches it is that we have some internal
representation in the compiler of what the source program was, and the compiler transforms,
it applies transformations to the program. And each transformation has a particular goal.
For example, one transformation could take one intermediate representation and generate a
simpler intermediate representation that's closer to the instructions that hardware understands.
So instruction selection.
Another pass will choose which values to store where, whether to store values in registers or on the stack.
Another pass would choose in which order to execute instructions that are not dependent
on each other, and so on.
And these decisions, they are made as separate passes on the intermediate representation
that the compiler holds for the program. And even if you make each pass optimal with respect to the knowledge that that pass holds, invariably there are dependencies between choices made in one pass
and another pass. And you can repeat that on and on and maybe keep improving the code, maybe not,
depending on the properties of these passes. But it is very hard
to make an improvement to the compiler because you know what you want to generate, but you change
this pass and there are internal dependencies between them. And it's very hard to improve the
compiler in order to get the codes that you know will be better than what the compiler generates
currently. Again, changing the compiler is a difficult process
because of the correctness requirements on it.
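A rough sketch of that pass structure, with hypothetical types and passes rather than the real OCaml or GCC pipeline: each pass is a function over the intermediate representation, and the backend is their composition, so a decision taken in one pass quietly constrains the ones after it.

```ocaml
(* A toy backend where each pass is a function over one intermediate
   representation.  The instruction type and the passes are hypothetical. *)

type instr =
  | Add of string * string * string   (* dst, src1, src2 *)
  | Mov of string * string            (* dst, src *)
  | Load of string * string           (* dst, address *)

type ir = instr list

(* Instruction selection: rewrite toward what the hardware understands,
   here just a tiny peephole rule as a stand-in. *)
let instruction_selection : ir -> ir =
  List.map (function
    | Add (dst, src, "0") -> Mov (dst, src)   (* x + 0 is just a move *)
    | i -> i)

(* Register allocation: decide which values live in registers and which
   get spilled to the stack (left as the identity in this sketch). *)
let register_allocation (prog : ir) : ir = prog

(* Scheduling: reorder independent instructions for the pipeline
   (also left as the identity here). *)
let scheduling (prog : ir) : ir = prog

(* The backend is a composition of passes.  Each pass can be locally
   sensible, yet the choices interact: the schedule affects which values
   compete for registers, and the register choices constrain the schedule. *)
let backend (prog : ir) : ir =
  prog |> instruction_selection |> register_allocation |> scheduling

(* Hypothetical input program and its compiled form. *)
let example : ir = [ Load ("r1", "x"); Add ("r2", "r1", "0") ]
let _ = backend example   (* = [ Load ("r1", "x"); Mov ("r2", "r1") ] *)
```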
So when I worked at ARM
and I was doing this kind of transformation
and trying to change the compiler to emit better code
for a new microarchitecture,
to take advantage of some improved aspect
of the microarchitecture,
it is actually very hard to convince the compiler
to generate the code you want
because of this interaction between passes
and the limited way in which we can change them in compilation.
And so this made me think about the technique of superoptimization
where the technique says,
well, just give me a solution for the whole thing.
Don't apply passes one by one.
Give me the best solution possible for all of these constraints together.
Maybe that's how I think about it.
And that's why it's related to my research and verification.
So I should define the super optimization properly, though.
A little more mechanically, if you could just describe how super optimization works.
In super optimization, you have an instruction sequence and the goal is to find another instruction sequence that is optimal, but has the same behavior as the given instruction
sequence. So the goal of super optimization is to generate, for example, the shortest instruction
sequence that has the same behavior.
That was the original formulation, but it's been taken up over the years and extended. Now the word is applied in a more general context: you have an intermediate representation of the program, and you generate an instruction sequence for it that's the optimal one with respect to some measure of what optimal is.
So initially optimal meant the shortest instruction sequence because there was a close correspondence between the length of the instruction sequence and the time that it takes to execute.
But now it's not true anymore.
There are other metrics.
So sometimes you have more expensive instructions that take longer to execute that can be replaced with a longer instruction sequence that will end up faster.
So this was the problem. And the solution that was proposed has to do again with verification.
So you have two instruction sequences, the original and a candidate for what the optimum might be,
and then you need to show that they have the same behavior. Now, what does it mean to have
the same behavior? It means basically that the first instruction sequence, given an input, returns some output, so the second sequence needs to return the same outputs for the same inputs; it computes the same function. To do that, we now need to prove equivalence between two arbitrary programs, and of course that's very hard. We can't do it in general; it's undecidable again. And so what people did initially, they said, okay,
let's not look at arbitrary programs.
Let's just look at straight line instruction sequences, instruction sequences with no branches,
no loops. Can we do that then? And initially people also didn't know how to do that. So there
was a form of testing between the two sequences that was applied in the first paper in super
optimization, which was in 1987 by Massalin. So after that, people worked on the verification.
They said, oh, we have tools that can prove equivalence
of two straight line instruction sequences.
So let's now try to formulate these instruction sequences,
their meaning as a set of constraints
or as a logical formula,
and then prove that the constraints are satisfied
or that the formula is satisfied.
It's a very interesting connection from compilation to verification, but it brings in all the
costs of verification with it.
How expensive is it to run this decision procedure?
Can we run this kind of generation process every time for every basic block in our program?
That would be too expensive. Can we do it at compile time and prepare a few more of
these instruction sequences that we know are optimal and then use them whenever we need?
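As a sketch of that search-based view (a drastically simplified toy, not Massalin's tool or any production superoptimizer), the following OCaml enumerates short programs over a made-up three-instruction machine and keeps the shortest one that agrees with the reference sequence on a handful of test inputs. That testing-based notion of "same behavior" mirrors the earliest work; the later, verification-based superoptimizers replace it with a proof from a decision procedure.

```ocaml
(* A toy superoptimizer over straight-line programs on a single value.
   The instruction set, cost model, and test-based equivalence check are
   all illustrative simplifications. *)

type instr = Inc | Dec | Double

let step x = function Inc -> x + 1 | Dec -> x - 1 | Double -> 2 * x
let run prog x = List.fold_left step x prog

(* The given sequence we want to improve: computes 2*x + 2 in four steps. *)
let reference = [ Inc; Double; Dec; Inc ]

(* "Same behavior", checked only by testing on sample inputs. *)
let inputs = [ -5; 0; 1; 7 ]
let agrees candidate =
  List.for_all (fun x -> run candidate x = run reference x) inputs

(* Enumerate every program of length <= n over the instruction set. *)
let instrs = [ Inc; Dec; Double ]

let rec programs n =
  if n = 0 then [ [] ]
  else
    let shorter = programs (n - 1) in
    let longest = List.filter (fun p -> List.length p = n - 1) shorter in
    shorter @ List.concat_map (fun p -> List.map (fun i -> i :: p) instrs) longest

(* Search: keep the shortest candidate that agrees with the reference. *)
let superoptimize max_len =
  let candidates = List.filter agrees (programs max_len) in
  match List.sort (fun a b -> compare (List.length a) (List.length b)) candidates with
  | [] -> None
  | best :: _ -> Some best

(* superoptimize 3 should find [Inc; Double], a two-instruction sequence
   that computes the same function as the four-instruction reference. *)
```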
It sounds like this all connects to two very different models of how you think about building
a compiler, right? There's the classic, I can't remember who this is attributed to, but there's
an old comment of you end up with a seven-pass compiler when you have seven teams working on the compiler,
which is to say the traditional structure of a compiler, where you have a sequence of what are
called IRs or intermediate representations, essentially little languages that at each
layer represent some different way of viewing the program. And then you work on what you were
talking about called passes, which are transformations,
either moving from one IR to another IR
or transformations within the same IR.
And this is all in some sense,
a way of modularizing the program.
So you can have a way of thinking about what you're doing.
So you don't have to think about the entire compiler
all at the same time.
And you were highlighting,
it's not quite as modular as you might imagine,
because there are actually invariants that spread across the entire stack
and the modularity itself gets in the way of some things
in the sense that if you want to make certain changes
to the output, it's hard to know where to do it.
And the thing you might do in one phase
doesn't necessarily transfer the way through
to the other phase
and getting the actual outcome you want is hard.
And then super optimization treats compilers, I guess, kind of as a search problem where it says, well, let's just have
a criterion for what it means to be good and what it means to be right. And let's just search.
We'll do some kind of, in some sense, the stupidest possible thing. We'll look over all
the possible instruction sequences that look better and just find the ones that are provably
the same and we'll pick those. So it's interesting because it's both a concrete technique of how to make things faster,
but also just a very different mental frame to use for thinking about what a compiler is doing
and how a compiler is structured. So with all that about super optimization, I'm curious,
as a practical matter, what do you think is a good role for super-optimization to play in a modern compiler, given all the costs you mentioned about how expensive verification can be?
It's an interesting question. I would like to see it used more. I think it depends on how
fast the different engines get for checking and for searching through the space of possible candidate solutions
and finding the correct ones and there is a lot of research in that area so obviously right now
where it stands, we can't use it for compilation on a daily basis for our builds on the developer's desktop. And I don't think it will get there ever. Maybe. I don't know. I don't see how; there are inherent costs I don't know how to overcome.
But I think that the way it stands, the technology right now, is that it would be a very valuable tool to help build compilers and to help optimize code that is incredibly critical for performance.
That's, again, something that in the real world, as it's called in academia, is so important to understand.
We have a compiler that needs to compile vastly different styles of code, even in the same
language, with different performance requirements
and characteristics. Why do we use the same backend for all of them? Maybe the more critical
code should be compiled separately with different backend, different optimization, maybe even super
optimization when it goes onto production. So that's one way in which super optimization could
help. And the exciting aspect in what I believe is really the power of this super optimization
approach is that it takes as input a logical representation of what the semantics of the underlying hardware
architecture is.
And if you take that as a parameter of the system, then when you have new hardware design with different costs or maybe even different specification,
then you can just plug it into that system and get a very good code out of it.
After a while, after it finishes code generation, you get the new code without having to rewrite all these very intricate passes
and redesign them or without having to write a new backend.
Basically, it gives you a new backend for free, in some utopian world.
And this is very important in a world where hardware changes and develops in so many more ways than it used to.
That's, again, something I learned at Arm, where I got to work close to hardware designers.
There's such a close interaction between the way that hardware is designed and the way
the compilers are designed.
And with every new hardware, microarchitecture design, to get the best performance out of
it, to get the maximum benefit from it,
we would want to generate different code.
So why do we invest so much in hardware
and developing new hardware
when we can't extract the best performance out of it?
So is your point that super optimization
improves the collaboration cycle
between the hardware engineer and the compiler engineer
in the sense that like hardware engineer proposes
new change to the hardware to make things faster.
And then super optimization gives you a way
which is computationally expensive,
but cheap in terms of human resources to see,
well, how good is this hardware?
If we actually compiled in a way
that took advantage of the hardware,
what would that look like?
How good could we get at that?
Even if that doesn't lead to a compiler
that you can really ship to people
that's really useful in the end,
it lets you learn more efficiently
what are the possibilities
by finding essentially the best possibilities for you.
So that's one of the ways
in which super optimization already today can help, I think.
It can help compiler designs
and can help hardware designers.
But I think if these tools,
when they reach maturity,
maybe they're very close now,
they can also help users outside of the compiler
and hardware developer,
because it will make it easier for users
to just plug in new architectures that come in
and use them and take the best advantage of them.
The idea of super optimization as a way of helping people
who are building the core infrastructure seems to me very compelling, right? The idea of, among other things, being just a way
of checking how close you are to getting the goals that you want out of a compiler, right? You know,
you have your practical compilation stack that runs in a reasonable time and it generates some
code and you can now ask the question, okay, if I now apply a super optimizer to it, how much better
does it get? And that gives you a kind of loss function, a thing to optimize for.
The other thing you described I worry about just because in terms of like saying, oh,
you can have a cheap compiler for your domain specific language done this way.
A cheap but very slow compiler, which is to say, one of the things I think is critical
to making the development experience work well for people is having fast feedback time where they can make a change and see how it improves things. So like, I wonder how you
deal with the fact that the super optimization step would add a big temporal gap. Every time
you wanted to see how your program performs, you have to go through this maybe hours long
optimization process of going and trying to find the actual best code sequences. Do you think that compromises
the practicality of the approach? So I think once you decide on your DSL and your hardware, you need to make the expensive step once, but then perhaps there is a way to,
I don't want to call it partial evaluation, but partially evaluate that formula or that constraint system to get more efficient compilation.
One, you decide on the pair of source language and target architecture.
Once you have that pair, you can reduce the costs.
But so I agree, the most useful application of super optimization at the moment is the help
it gives compiler developers and hardware designers. But I think it also opens up the possibility
of a lot more flexible combinations
of which hardware to use
without losing the performance characteristics
that that hardware was designed for.
So let's switch gears.
I want to talk a little bit about
how you got to where you are now.
You have spent a lot of time
working in both industrial context
and in research context.
And when you joined us, you moved very much back into an industrial mode. And I'm curious about what
you think about the trade-off between these two and why you decided to make the move back
into a more industrial, practically focused role. So I feel like through the research,
I became familiar with a set of powerful tools that can be combined in different ways.
And in academia, there is time to think about and carefully design these tools without the time-to-market pressure that some industries have.
Even in academia, it always feels a bit like sitting in a tower and it's hard to get to
the users
who really care about these tools.
These people are very busy and they are not always available to test the tools or to try things out, on the one hand.
And on the other hand,
I feel that there are a lot of tools in this toolbox
that I can put to good use,
but maybe it's not very novel and exciting in terms of
a research paper, but it is exciting to me because I can help developers do their job better maybe,
or improve the compiler that they use. So I have these techniques that I am familiar with, but I don't know what it is that is needed. And I don't have any feedback about how suitable these tools are and what the problems are. It feels like in academia, there is more time, but there is more pressure to
get novel ideas and results out of the existing tools. And I feel that there are a lot of low
hanging fruit there that can be picked up with tools that we already have or techniques that
are close to what is needed. But the trade-offs are different.
The scale of code is different
when you work on it in industry.
I'm very excited about this aspect
of having people who actually care
about what I produce
and use it on a daily basis.
At Jane Street, I sit in the compilers room,
but the other people
who use the compiler are right there
and everyone communicates and talks about their problems and their ideas, what should be improved.
And you get feedback so quickly when you're there.
So part of what I'm hearing from here is that the thing that excites you about the industrial context is the connection with users, right?
You get to actually build things which get picked up and used, and you get feedback. And that feedback helps drive the work, decide what's important, build new things based on
what is actually useful.
So maybe you can talk about some of the projects that you've worked on at Jane Street.
One big project I was involved in at Jane Street is doing profile-guided optimizations
in OCaml.
We call them feedback-directed optimization these days. And the idea is
to collect execution profiles during program execution and then use them to improve code
generation. It was traditionally called profile-guided optimization. So profile-guided
optimization traditionally uses instrumentation. So it changes the program that executes by inserting special points
to keep track of the execution and then uses the profile collected by that. And more recently,
a sampling-based approach was used to reduce the overhead running the instrumentation and also to
improve the precision because the sampling-based approach, as opposed to the instrumentation approach, doesn't have to change the code that executes.
And by changing the code, as I mentioned earlier, it may have effects that are not related to the code you are trying to measure.
So feedback-directed optimizations were not used by the OCaml compiler, but they're actually traditionally used in other compilers,
C++ compilers in particular.
They're used a lot to improve performance.
One of the problems in static compilation
is that we don't know what inputs the program will get
when it is executed,
down which paths the program execution will go.
And so the compiler doesn't know where it is worth optimizing more
and where it's not.
But if we execute the program on inputs that are representative
of the way the program is used in production,
then we will get some idea of what paths in the program are hot,
what paths are important.
And if this information can be communicated to the
compiler, then the compiler can make better decisions or it can guide the compilation
decisions. For example, where to inline more or because the code is important, where to optimize
more. So the idea is to run some benchmarks and during the execution, collect some information
about the execution of where the
hot paths are, and then create this profile that is then used by the compiler statically.
And the way we applied it in the OCaml compiler is to optimize the code layout.
So the order in which the functions are arranged in the final executable and the order in which within a
function different basic blocks or different pieces of straight line code are arranged.
What it achieves is an improvement in performance, because when code that executes together, when two functions that call each other often and are hot, is laid out together in memory, close to each other, then the cost of accessing each one of them is lower. So the program runs faster.
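Here is an illustrative sketch of that idea in OCaml, with a made-up profile and a greedy strategy, not what the real FDO tooling does: given sampled call counts between functions, place the hottest caller/callee pairs next to each other in the layout, and push functions that never appear in the profile to the end.

```ocaml
(* A toy version of profile-guided function layout: place functions that
   call each other frequently next to each other.  The profile data and
   the greedy strategy are illustrative only. *)

(* Sampled profile: (caller, callee, call count). *)
let profile =
  [ ("parse", "lex", 9000)
  ; ("main", "parse", 120)
  ; ("main", "report", 5)
  ; ("parse", "error", 2)
  ]

let functions = [ "main"; "parse"; "lex"; "report"; "error" ]

(* Greedy layout: repeatedly take the hottest edge and append any of its
   endpoints that have not been placed yet. *)
let layout () =
  let hottest_first =
    List.sort (fun (_, _, a) (_, _, b) -> compare b a) profile
  in
  let placed =
    List.fold_left
      (fun acc (caller, callee, _) ->
        let add acc f = if List.mem f acc then acc else acc @ [ f ] in
        add (add acc caller) callee)
      [] hottest_first
  in
  (* Cold functions that never showed up in the profile go at the end. *)
  placed @ List.filter (fun f -> not (List.mem f placed)) functions

let () =
  print_endline (String.concat " | " (layout ()))
  (* prints: parse | lex | main | report | error *)
```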
I remember when we first started working on this and saw, especially when we saw some of
the early results, I was kind of shocked at the size of improvements, which were like 10, 15,
20% improvements to runtime of various critical programs just for moving memory around and not
even moving the memory of
the running program, which I'm already used to thinking of as a big deal, but the memory of just
the instructions. And so the idea that makes such a big impact on how fast your program runs,
I found pretty astonishing at the time. Why is it so slow to have things spread across your
executable? When the functions that call each other often are far away, when the
hardware executes the code, it needs to get the code from memory. And memory is arranged in layers.
So there's the cache memory that is very fast to access. And there are multiple levels of cache
that are each increasingly slower to access, but increasingly larger. When the code that we execute is not in cache,
then it takes longer to access.
So by laying the two functions that frequently call each other together,
they get into the cache faster.
It speeds up access, which means that the program can execute the code
without needing to wait for loading the code from some lower cache level
or from memory.
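A back-of-the-envelope way to see the effect: the average cost of fetching an instruction is roughly the hit latency plus the miss rate times the miss penalty, so even a modest improvement in instruction-cache hit rate from better layout shows up directly in run time. The numbers below are illustrative orders of magnitude, not measurements of any particular machine.

```ocaml
(* Average memory access time (AMAT) for instruction fetch, with
   illustrative latencies: ~4 cycles for an L1 hit, ~200 cycles to go
   all the way to main memory.  Not measured on real hardware. *)
let amat ~hit_latency ~miss_penalty ~miss_rate =
  hit_latency +. (miss_rate *. miss_penalty)

let before = amat ~hit_latency:4. ~miss_penalty:200. ~miss_rate:0.05  (* = 14 cycles *)
let after  = amat ~hit_latency:4. ~miss_penalty:200. ~miss_rate:0.02  (* =  8 cycles *)

let () = Printf.printf "average fetch cost: %.1f -> %.1f cycles\n" before after
```

In this toy model, cutting the instruction-cache miss rate from 5% to 2% takes the average fetch cost from 14 cycles to 8; how much of that appears as whole-program speedup depends on how often the program is actually waiting on instruction fetch.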
When working on the FDO tool, I spent a lot of time trying to get improved performance by reordering.
And then it turned out that one of the valuable uses of this is just to stabilize the performance. So one of the developers complained that their code suddenly got slower than before.
And the only change was a library that was doing some test manipulation that was not
related to the code that was executing.
And they were trying to understand what changed. So one of the people on the compilers team suggested running it with FDO.
And running it with FDO removed the differences in performance, because what the FDO tool did is that the code that was used was placed together.
So the changes in the library that were not affecting the executions that the benchmark was evaluating
were not affecting the layout of the code. So the benefit is not just improving the performance,
but making it more predictable. And maybe slower, but usually faster. Sometimes slower,
but predictable is better. Right, because if it's predictable and you know whether or not
your changes are real changes to the performance, that helps you guide your work and make changes that incrementally, like change after change,
drive the program overall to be better performing, you know, a few percent at a time.
And if every time you compile your program, it moves around by three percent,
just for no reason at all, then it's extremely hard to do this kind of sequence of small scale
improvements. And I assume the same problem hits in compiler development. Like if every time you modify the compiler
and it gets a little faster or a little slower,
you don't know whether it's really gotten faster
or it's just mucked with the alignment a tiny bit.
So is there a hope to more or less do this same thing
on the compiler side
when you're trying to evaluate compiler changes
as a matter of course to apply FDO
to reduce the amount of noise before and after?
So this in fact ties in with our previous discussion about benchmarks.
So there are some people in the tools and compilers team who developed a benchmarking
infrastructure to help us benchmark compiler versions and compiler improvements that we
make.
And one of the things that we've considered is to add FDO into that benchmarking infrastructure
to help us level the results.
And another way that this could be used is actually there are some interesting research
recently about introducing random changes to the layout and then re-measuring the performance
to see which transformations actually change performance
and which transformations do not. So that's interesting work by Emery Berger from UMass.
That's good. So these are almost the opposite approach. One of them is to say,
let's squish everything into a kind of canonical representation that has closer to the same
behavior. And the other is to say, let's give up on getting a single artifact that's predictable and instead try to understand the distribution. And so you're randomly creating a bunch of them. And I guess the disadvantage of the latter one is your benchmarks
are going to take a lot longer to run because every time you sample the distribution, you
basically have to perturb the executable and rerun the benchmark and perturb the executable and rerun
the benchmark. So they might be orders of magnitude different in terms of how long it takes the benchmarks to run.
It's interesting.
This research work I mentioned is a way to address this problem
in some sense by using sophisticated statistical aggregation sampling.
My understanding is that one of the approaches they have
is to change the layout every few seconds.
During the execution of the program?
During the execution, yes. Oh, that's awesome. Yes, I know.
Another thing to talk about here with FDO is FDO is an interesting kind of example on a spectrum,
right? There are techniques for optimization that are very dynamic, and there's techniques that are
very static ahead of time. And FDO is one way of trying to forge a compromise
between these two, right?
Where you mostly do ahead of time compilation,
but then you gather profile information
that lets you go and redo some of the work there
and further optimize.
But there are lots of other places that this comes up.
Like one that shows up more and more over time
is various forms of just-in-time compilation.
So already many years ago, Java had just-in-time compilation for its bytecode.
Languages like JavaScript do an enormous amount
of trace-based optimizations
where you are running the program,
monitoring it as it runs,
observing things about what paths are hot
and what are the likely paths to be taken
and what are things that need further optimization
and applying further compilation there.
And then it also happens in the actual hardware, right?
The hardware itself is constantly gathering statistics and making guesses about how your program is going to behave based on how it behaved just recently.
I wonder how much of the work that one does in FDO and similar kinds of techniques where you're trying to take profile information
and apply it to your static compilation process,
how much that depends on what else
is happening in the stream.
If you were to do something like
some kind of feedback directed optimization
for a simpler architecture that does less
in the way of branch prediction
and prefetching and so on and so forth,
would that change the amount of work
that would be profitable to do
in the form of feedback-directed optimization?
So there are many aspects of this, right?
So one is if you switch to a different hardware,
as you said before,
that say doesn't do as much prediction,
then you would need to do more work on the compiler side
to arrange the code in the best possible way for hardware to do it.
That's one aspect of who rearranges the code.
But then there is the aspect of the feedback directed or the profile.
Do we know what the hot paths in that program are?
And for that, when the hardware only knows the current execution
and it only knows the history
of the current execution, and I think it has a pretty small window on that particular program's
execution, it doesn't have enough information on where that execution came from and where it is
going to. Whereas the compiler statically has more information about what will happen or what are all
possible things that can happen to this execution in the future.
So there are some things that hardware probably does better, at a cost, again, perhaps an energy cost.
And some things that the compiler has more information about, which is a completely different point in this whole spectrum, but related to the just-in-time compilation that you said before,
because the just-in-time compilation
is connected to it,
is what is the overhead
of doing the optimization?
If you put this optimization
in the hardware,
then the hardware will have to do it,
but maybe there will be some overhead
for doing it in hardware.
If it's possible to do it statically
at compile time or at load time
or earlier,
then that will give us
better, faster code. The core trade-off being the compiler knows more about the structure and
therefore in some sense about some aspects of the future behavior of the program and the hardware
knows more about what's happening right now. I guess there's another trade-off, which is
any optimization you get in the hardware from the point of view of a hardware vendor has the
nice property of applying to all the compilers.
Whereas things that you do in a particular compiler benefits the particular community that uses that, but not everyone who uses the hardware in question.
Which, I guess, affects the economics of which side of the line you want to do the improvement on.
Intel surely wants their particular CPUs to be better than other people's CPUs, whereas someone
working on the OCaml compiler wants the OCaml compiler to work well in as broad a set of
architectures as possible.
But also the same trade-off, if you take it one step back to the source: again, there are different kinds of programs. For some of them, these optimizations matter more. For others, the cost of accessing memory and doing garbage collection just overpowers the micro-optimizations that we can get from reordering instructions and fitting them in, because it's so much more expensive to access memory. And then the data layout matters a lot, or whatever we can learn from the way that the program accesses its data or its containers will matter more for optimization, which is not something compilers currently do.
But I think there is some tooling, like the Spacetime and the memtrace tooling, that
people in the compilers team have been working on to identify access patterns and help that.
And it's so exciting to be part of this team and seeing this work happening.
And it's a community of people who are so technical, so strong in what they do. And they
sit right there and I'm in the same room with them and I can ask them questions or just listen to
their conversations and learn from it and see how the development goes. It's a great team. Yes.
I always love visiting and hanging out with the compiler team because this is a ton of exciting
stuff going on all the time. So we talked a little bit about what it was like moving to
Jane Street in terms of the difference in interaction and the kind of feedback that you get and some
about the concrete work that you've been doing. I wanted to ask about a different aspect, which is
what has it been like switching languages? And you switch languages really in two ways,
which is the optimization work that you did, say, at ARM
was all about C and C++ compilers.
And now you're working on a compiler
where the source language it's trying to optimize is different.
But also you're working in a different language, right?
The compiler that you're working on is written also in OCaml.
And I'm curious about both of those aspects. But to start with, can you tell me a little bit about
what it's been like starting to do compiler work in OCaml rather than in C and C++?
Before coming to Jane Street, I had never programmed in OCaml. I knew about the existence
of OCaml and I sort of knew that I should learn this language, but all the times I tried,
it didn't quite work out. I even tried F# once because there was a good book, but it still didn't help. And when I came to
Jane Street and I started writing in OCaml after getting used to the change in the thinking about
my programs, it was amazing. I couldn't believe I had been writing my program analyses in Java. OCaml is so much
better suited for it. The language is just perfect. The code that I need to express a
transformation is so small compared to multiple Java classes that I would need to write some
visitors and traversals and so on. So this has been great for me, for what I need. OCaml is just the perfect language
for writing compilers
and tools around compilation.
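To give a flavor of what that looks like (a hypothetical toy transformation, not code from the OCaml compiler): a small expression type and a constant-folding pass written as one recursive pattern match, where the same job in Java would typically be spread across visitor classes. And if a new constructor is added to the type, the type checker points at every match that needs updating.

```ocaml
(* A tiny expression language and a constant-folding transformation,
   written as one recursive pattern match.  Purely illustrative. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Mul of expr * expr

let rec fold (e : expr) : expr =
  match e with
  | Int _ -> e
  | Add (a, b) ->
    (match fold a, fold b with
     | Int x, Int y -> Int (x + y)
     | a', b' -> Add (a', b'))
  | Mul (a, b) ->
    (match fold a, fold b with
     | Int x, Int y -> Int (x * y)
     | Int 1, b' -> b'        (* algebraic simplification: 1 * e = e *)
     | a', b' -> Mul (a', b'))

let _ = fold (Mul (Int 1, Add (Int 2, Int 3)))   (* = Int 5 *)
```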
But also, it has this nice environment
around the ecosystem of OCaml
where there are build tools.
But yes, for the language,
I think one of the reasons
that it's convenient for me
is the way that I think about it as more a
mathematical description of what I want to do when I program rather than the imperative
approach with languages I used before, especially with C and C++, where not only do you need to think about what you're doing, but also about the memory and how it's allocated, which
in OCaml is not a concern
because of the garbage collector.
And the other aspect which I really enjoyed,
after getting some intuition about the type error messages,
after getting some idea of what the type checker
is trying to tell me,
it became a great tool
because when my program type checks,
I can see how many errors, how many silly mistakes in coding it caught.
And I was able to fix thanks to this.
So this is great, and Dune and Opam, of course, when I do things that go outside of Jane Street,
and also the type checker holding my hand
and helping me not make silly mistakes.
But then you always need to test.
The thing I think people who have not used OCaml or a similar language often don't appreciate is how much better languages in the OCaml-like part of the design space of programming languages are at catching errors at the type level. Like if you're used to programming
in a language like Java, you think, oh, I know what a type system is like. And it's like,
no, you really don't know what a type system is like. Like you've seen one particular kind
of type system with a lot of weird trade-offs and a lot of limitations. And it really is better in
this part of the pool. The type system is really much, much better at catching errors.
This is so true.
You know, I have a real story about how true it is. I remember a few years ago talking to someone and saying, no, what do you mean?
C++ has types.
It does have types.
And then we had a little conversation.
And I couldn't really appreciate it until after I started programming in OCaml and how
much more I get from the type system, how much
more reassurance and guarantees it provides.
Okay, so that was all about OCaml as a language to work in. How do you find
OCaml as a language to optimize? What challenges does that present?
I'm learning to appreciate the value of the choices that were made by OCaml developers
regarding the structure of the backend, and in particular not just the structure of the way that the compiler backend is organized, but the choice of: let's not make it overly complicated.
Let's make assembly code or generated code
that is easier to understand
and is as close as possible to the original program
because the hardware will optimize it
in ways that
will be far superior.
And we need to focus on generating the correct code, first of all.
Having said that, there are some optimizations, but they are not phrased in the way that I
am familiar with from the literature.
I think it's also because of the divide between the functional languages and imperative languages, and how things are sometimes called static single assignment, SSA, and sometimes they're called something else about continuations, but they are equivalent in some sense and
different compilers use different approaches to implement it.
So that was a big change for me, but I think I now appreciate the choices and how they
were made.
And I focus with what I do on the code generation for AMD64, so for the compiler backend.
But the compiler backend is structured to be very portable and independent of the target
hardware.
Most of the backend is independent.
And that's another reason why it is the way it is.
And it is not diverging.
The code that is specific to each hardware architecture is actually quite small.
But that restricts us in ways that I think would be good to lift the restrictions.
For example, the kind of optimizations and transformations that are performed right now
in the compiler backend, they're statically defined and there is basically no mechanism for changing
them or adding to them, which is one of the problems that we've encountered recently with
feedback-directed optimization work. So that's all about, in some sense, the structure of
how the compiler itself is built and how that affects the work. I'm wondering if there's
anything about the surface language that affects how you think about the cogeneration process. Do OCaml programmers
generate different patterns that require different optimizations than you might see in a language
like C or C++? I'm probably not the right person to ask about it, right? But on that subject,
I think that, again, the way that the OCaml compiler is structured, the most sophisticated parts of it are the type checking and the optimizations like inlining.
This is necessary in order to lower the higher order functional language that it is down
to assembly and produce efficient code.
And having that in place lets programmers write their code
in a more natural way than they could without that support, right?
And this is very powerful. And it has also been a subject of work at Jane Street, which I was not involved in, but I get to learn from it and benefit from it as a user of OCaml as well.
The work on improving the middle end, as it's called, the optimizations that are happening.
And actually, let's demystify that terminology.
Front end, middle end, back end is like standard ways people talk about different parts of a compiler.
Can you say roughly what each of those means? You can think of a compiler as applying a sequence of transformations, starting from
source code and changing it gradually into machine code.
These transformations are divided into three stages called front-end, middle-end, and back-end.
Usually front-end performs things like parsing and type checking,
and back end performs optimizations that are specific to a given hardware, and it knows about
the exact registers and the pipeline design. So the front end, which takes as input the source
code, is specific to the programming language that is being compiled, for example, OCaml. The backend, the third stage, generates machine code or assembly code targeting a specific hardware architecture, so it's specific to the target architecture.
For example, the OCaml compiler can generate code that runs on AMD64 Intel hardware using one backend and a different
backend to generate code for ARM hardware. And it has a few other backends for different
hardware architectures. So the same front and middle end of the compiler can be used with
different backends for the same source language. And similarly, there are compilers that can take as input different source languages.
For example, the LLVM framework can take as input C++ and C using one frontend.
It can also take as input Swift or Rust using different frontends.
And then reuse the other two stages as they are. So this enables the compiler infrastructure to reuse a lot of the source code
while supporting different source and target languages.
The second stage of compilation, the transformations,
can be independent of both the source language and the target language.
And different analysis can be performed and reused.
So because the first part is called
front end and the third part is called back end then the middle part needs to also have end in
its name, so we get the middle end of the compiler, which I can never say without laughing.
but yes right and the middle end is where things, at least in OCaml, where things like inlining occur. And so
that's what you're talking about, of taking all of this complex, higher-ordered functions,
calling functions, being passed functions, and all of that, and getting it flattened out
to a simpler representation that's more suitable for code generation. And the code generation is
the part you've been focusing on, and that's the back end of the compiler.
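Schematically, the three stages compose like this (placeholder types and signatures, much simpler than the real compiler), which is also what makes it possible to pair one front end with several back ends.

```ocaml
(* A schematic compiler pipeline; all types here are placeholders. *)
type source_code = string        (* e.g. OCaml text *)
type typed_ast                   (* output of the front end *)
type middle_ir                   (* language- and target-independent IR *)
type assembly = string           (* target-specific output *)

module type Frontend = sig
  val parse_and_typecheck : source_code -> typed_ast
end

module type Middle_end = sig
  val optimize : typed_ast -> middle_ir   (* e.g. inlining happens here *)
end

module type Backend = sig
  (* One implementation per target: AMD64, ARM, ... *)
  val emit : middle_ir -> assembly
end

module Make_compiler (F : Frontend) (M : Middle_end) (B : Backend) = struct
  let compile (src : source_code) : assembly =
    src |> F.parse_and_typecheck |> M.optimize |> B.emit
end
```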
So one of the things, by the way, you mentioned is that there are relatively small
amounts of code dedicated to particular architectures. I'm curious actually how well
adapted you think OCaml is to ARM, right? You have a lot of experience working at ARM, and ARM
is an increasingly important architecture. It's a very good architecture for low power utilization. In
fact, I think Apple recently announced that they're going to switch to using ARM for their
desktop machines. So that's a big step. How good of a compiler do you think OCaml is for ARM at
this point? I have not used OCaml on ARM. I have no experience with it. I've seen the code in the ARM backend to some extent, and to me, it looked like
it would not generate code as fast as, say, GCC does.
But that is a very difficult target, and also not a fair comparison, given the garbage collector.
But even without any allocation involved, I think that the ARM backend of OCaml has not received as much attention, from what it
looks like from the code.
So it is missing things that are specific to ARM.
Well, again, it depends on what kind of ARM microarchitecture is being targeted.
Some ARM microarchitectures, the low-power designs, use in-order execution.
So there's less reordering that the hardware does.
So it's more important, for example, to do good scheduling of instructions at the block level.
And there is a pass for doing scheduling of instructions in the OCaml compiler backend.
It's not used for AMD64.
It seems to be generic for ARM,
so all ARMv8 targets get the same scheduling decisions, and all ARMv7 target CPUs will get the
same scheduling decisions, which is not always the best. The use of registers in the ARM backend
would be very important, so improving register allocation, because using different registers
with different instructions has restrictions
and sometimes different costs.
And there is a way to encode that in the compiler, but I think it's not as flexible
as one would like in order to get good performance.
Those are just the things I've noticed and came to mind now.
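To illustrate why block-level scheduling matters on an in-order core, here is a toy example. The instruction type, register names, and the idea of a "few-cycle" load latency are made up for the illustration; they are not real ARM timings or the OCaml compiler's backend representation.

```ocaml
(* A toy illustration of block-level instruction scheduling for an
   in-order core. *)
type instr =
  | Load of string * string         (* dst, address register: result ready a few cycles later *)
  | Add of string * string * string (* dst = src1 + src2: ready on the next cycle *)

(* Unscheduled: each add consumes a load result immediately, so an in-order
   core stalls waiting for the load to complete before it can issue the add. *)
let unscheduled =
  [ Load ("r0", "r4"); Add ("r1", "r0", "r0");
    Load ("r2", "r5"); Add ("r3", "r2", "r2") ]

(* Scheduled: issuing both loads before the adds gives each load time to
   complete, reducing the stalls. An out-of-order core would do this
   reordering in hardware; an in-order core relies on the compiler's
   scheduler to do it at the block level. *)
let scheduled =
  [ Load ("r0", "r4"); Load ("r2", "r5");
    Add ("r1", "r0", "r0"); Add ("r3", "r2", "r2") ]
```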
I wonder if this is an area we'll start to care about more as the CPU ecosystem changes.
Because power matters to us too, it turns out.
And it's interesting to think about, like, Intel has had a kind of monopoly on servers
for quite a long time.
And we care much more about servers.
You know, at Jane Street, we don't really write much code that's targeted at, say, mobile
phones.
But it's possible that we'll see the server world move as well.
And that would probably change our priorities a fair amount.
I think the comments about the register allocator also apply to the AMD64 backend,
where we know that the code generation is particularly simple-minded when it comes
to registers; even though there is a good register allocation algorithm, the end result
often can be improved. But this is something that I think is a big investment.
And I think in order to know that we will get the benefits we expect from it,
we need to have some examples or some idea of why we expect this to be better
in terms of faster code at the end for our systems.
And that's what I really like about being in a place like Jane Street,
because I have access to these programs
and applications that matter, that I know matter.
And people who care about these applications
write good benchmarking infrastructure for them.
And to me, as someone who works on the compiler,
that makes it easier to test different ideas
and just get feedback straight away
and maybe get impact faster when I improve my things
or cause more trouble faster. So recently I made a small change to the compiler, which was
an optimization, but because I misunderstood something about the internals of the OCaml
compiler, it introduced a miscompilation bug, and it was only noticed
because one of the tests in one of the
systems failed. It's not a very common situation, so I don't think there was any damage done to systems running in production, and there was a test that caught it, but it was a test that we didn't routinely run. And it was very scary, because I broke the compiler.
And then I had to watch all these people who have very important things to do,
spending their time rolling back their applications and talking and communicating
and synchronizing on how to avoid hitting that problem.
So this just shows how important compiler work is and how much impact any correctness mistake makes.
Even if it didn't break any running system in production or cause problems, it took
these people's time to move on from it.
And I don't want that to happen.
Obviously, I did my best to test it, but I didn't test it enough.
And I think one of the things about the culture at Jane Street and what I really like is that
there is a process or I think it's not a process,
it's individuals.
And as a culture,
the mistakes are taken as,
OK, we'll fix it.
And then how can we learn from it?
Why did we not run this test?
And then two weeks later,
we have a system
to run these tests as well.
And also something about the compilers team:
How come we didn't catch this before?
It's happened before to other people. How do we change? How can we improve the compiler intermediate representation,
or how can we improve the compiler, to prevent this kind of mistake from happening? Then we
sit down and go, oh, that's what would need to be done to do it properly. Okay. So now we know this is something
that would be nice to be done. When do we do it? We'll figure it out separately. We need to
prioritize. But this whole process, I was so relieved to be part of this process, even though I
made this mistake and released a broken compiler.
But it was a learning experience and a positive one.
And I think reassuring.
I'm glad that came across as a positive experience because it's something that we try and keep
baked into the culture kind of throughout the organization. So I want to jump back to a thing that you said a moment ago. You were talking
about feedback and particularly getting feedback from benchmarks and being able to kind of see the
performance impact of the changes you make and let that guide the optimization decisions that you're
making. So one of the challenges that we have with working on OCaml is we do not own the language.
The language exists outside of our walls. It was initially developed by the research team at Inria that created it.
And they continue to have a very deep role there.
And in the end, we have to convince upstream maintainers about changes that we want to make.
And one of the challenges there is there's lots of things that we see internally that are easy to see from the inside and harder to see from the outside.
And I'm curious, what are the things that we can do to try and explain to upstream more effectively
why we're making the changes that we make and being able to kind of surface the motivations
in a way that is compelling to the core team that maintains the language.
We see many examples internally where we see a way to improve things.
We see that, for example, like with FDO or with register allocation, there are ways to
improve.
We would like this to be accepted upstream, but the motivation for it is very internal
to Jane Street.
So for these examples, the relationship that Jane Street developers
and the compiler core developers have,
and their participation
in the upstream OCaml developers community,
being part of it,
really helps,
because there is some trust
that's been developed between the two.
And a very technical conversation happens with people outside of Jane Street.
And there is a lot of collaboration also outside of Jane Street in the core OCaml developer
community. But there is no automatic acceptance of anything that comes from Jane Street. If anything,
it's the other way around. If you only need it for Jane Street, then it's probably not going to be a good feature for the general
purpose language and the community compiler.
So I think the best thing that could happen is finding other users of OCaml and sharing
our tools with them and developing tools that help us measure code performance and evaluate different
metrics and then sharing these tools upstream externally.
They can be open source, just as the compiler and the other libraries that Jane Street open
sources are, and then we can give these tools to other users of OCaml.
And I think there is a growing community of high-scale users of OCaml, even outside of Jane Street.
And with these tools, this will encourage use in the community and will get us feedback
from the other users of OCaml, for example, at Facebook and other companies that use OCaml
industrially, but also from the researchers who use it in academia and perhaps are more
used to trying out new experimental
features. Now, to make that happen, there is already opam and there is already Dune, which
work externally and make the process easier. But there is, to my knowledge, no place to share
results. I think it happens a bit on OCaml Discuss, but maybe a different channel
for communication of this would help and a more open conversation between the core developers of
OCaml and the users who are not part of that small group. Yeah, I suspect this is a place where you
probably just need to build some better tooling among other things to kind of share benchmarks. And there is some work in this direction. So OCaml Labs has done a bunch
of work preparing for Multicore OCaml, which is now being pushed very aggressively, to the point
where, for the first time, I feel like it's kind of in sight that we'll actually have a version of
the compiler that's released with a multicore garbage collector. We have this repo called
Sandmark, which is a place where they have a bunch of benchmarks
all together and they use that for evaluating.
And the idea that we can try and leverage that
for other kinds of optimizations that we're doing,
I think is a promising one.
But yeah, I think the communication is critical and hard.
And it's especially hard when some of the problems
just have to do with scale.
And it's a scale that other people may not see, right?
I think we release a surprising amount of open source code.
We have like half a million lines of our internal code released on GitHub, kind
of updated a couple of times a week.
And that helps because any problem that you can isolate to that code base, you can actually
demonstrate it externally.
But FDO is a good example where some of the strongest results
for FDO improving the performance of things have come from very large executables with lots of
dependencies. And those aren't things we tend to open source, in part because it's internal stuff
that's not of any interest, and in part because we have some proprietary intellectual
property reasons to not want to release it. And so it's those things that operate at larger scale that I feel
are some of the most challenging to demonstrate effectively. And I take your point that
if it looks like an optimization which really just affects stuff inside a company
like Jane Street and has no other use elsewhere, upstream is going to be a lot less excited about taking it.
I think the core of this objection
is in the approach to code generation for OCaml.
The backend, it is important for it to perform well,
but it is perhaps not the top priority.
Portability is maybe a higher priority than that,
portability between different systems,
or being able to perform well
on a range of different input programs. So it's something that also needs to change, not just in
the communication, but also in what is valued by the community. And I understand the resistance to introducing new, obscure features. I mean,
if you look at what happened to LLVM, where it's a lot easier to contribute code and add passes,
then it's a lot harder to keep the compiler uniformly reliable and predictable and correct.
Or on the other hand, maybe these two are the extremes,
and maybe there's some point in between. So if there were more infrastructure in the
OCaml backend that would enable people to add their own passes, optimizations, transformations
that benefit particular code patterns that they care about, and more flexibility in
enabling such passes,
then perhaps it would be easier to do. Yeah. This is maybe a case where modularity
is just a thing that can help. I think one of the things that upstream is afraid of is they're
worried about a lot of complexity being added to the compiler and then it being their responsibility
to maintain it. Because I think both the people at Jane Street and the other members of the
core team feel an enormous amount of responsibility for the language.
Everything that's in there, they think of themselves as responsible for.
And they think of it as a collective tax on the overall community to add new functionality.
And so making it so that you can maintain some pieces of functionality on the side where
it doesn't become part of the shared ownership, that's, I think, a valuable move.
Another thing that I think can also help is that having more modularity makes it easier to do experiments
where maybe in the long term, you do want to contribute this back upstream. But in the
meantime, you'd like to make it easy to maintain a patch set that modifies just this one part
so that you can try it out and see how effective it is and iterate quickly on the technology so you can
get to a point where you can propose it up to them and say, hey, we've done this work
and look, we can demonstrate a lot of benefits from it.
It's really worth taking on the complexity, but making it easier to do that kind of experimentation,
I think, is critical to basically being able to do that at a larger scale.
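As a hypothetical sketch of what that kind of modularity could look like, here is a pluggable-pass interface for a backend. None of these names (Ir, Make_pass_manager, register, run_all, or a flag like -extra-passes) exist in the OCaml compiler; the point is only that an experimental, out-of-tree pass could be registered and kept off by default, leaving the shared pipeline untouched.

```ocaml
(* A hypothetical sketch of a pluggable-pass interface for a compiler
   backend, not an existing OCaml compiler API. *)

module type Ir = sig
  type t
end

module Make_pass_manager (I : Ir) = struct
  type pass = {
    name : string;
    (* Experimental passes stay off unless explicitly enabled, so the
       default pipeline is unchanged for everyone else. *)
    enabled_by_default : bool;
    run : I.t -> I.t;
  }

  let passes : pass list ref = ref []

  (* An out-of-tree experiment registers its pass without touching the
     rest of the backend, e.g.
       register { name = "my-arm-sched"; enabled_by_default = false; run = my_pass } *)
  let register (p : pass) = passes := !passes @ [ p ]

  (* Run the registered passes in order, honoring an explicit enable list,
     for example coming from a hypothetical -extra-passes flag. *)
  let run_all ?(enable = []) (ir : I.t) : I.t =
    List.fold_left
      (fun ir p ->
        if p.enabled_by_default || List.mem p.name enable then p.run ir
        else ir)
      ir !passes
end
```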
Indeed.
And also getting feedback, a thumbs up, from other users of OCaml would be easier if it did not require them to change
their build system in order to try a new feature. Here we're talking about feedback-directed
compiler construction. There we go. Different kinds of feedback. Okay. Well, I think we're
at this point out of time, but thank you so much
for taking the time to talk with me about this. It's been a lot of fun.
Thank you.
You can find links to some of the things that we talked about, including Greta's work on
feedback-directed optimization, as well as a full transcript of the episode at
signalsandthreads.com. Thanks for joining us and see you next week.