Signals and Threads - Compiler optimization with Greta Yorsh
Episode Date: September 30, 2020
It’s a software engineer’s dream: A compiler that can take idiomatic high-level code and output maximally efficient instructions. Ron’s guest this week is Greta Yorsh, who has worked on just that problem in a career spanning both industry and academia. Ron and Greta talk about some of the tricks that compilers use to make our software faster, ranging from feedback-directed optimization and super-optimization to formal analysis. You can find the transcript for this episode along with links to things we discussed on our website.
Transcript
Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street.
I'm Ron Minsky.
So it's my pleasure today to have a conversation with Greta Yorsh.
Greta is a compiler engineer in our compilers team in the London office,
and she also has a rich and interesting background in compilers and programming languages
and program analysis and verification. And we're going to talk about all that. She's also had a
lot of experience in both industry and academia. And since 2018, Greta has worked with us working
on the OCaml compiler. So Greta, to start with, could you talk about how you got interested in
compiler optimization? I studied in Israel. I did my undergraduate studies in computer science and economics.
I found that I don't cope well with uncertainty.
So the interesting mathematical bits of economics were not the right choice for me.
So I decided to do a PhD in computer science in the area of software verification, where
a lot of the concerns are about ensuring things, so that fits well with my approach to certainty.
And after my PhD, I worked for a few years at IBM Watson Research Center in New York,
working on program analysis and verification problems.
And after that, I moved to Cambridge in England,
and I joined an amazing company, ARM, that makes processors.
And I worked there in the compilers team,
working on the GCC compiler backend for ARM. And after that, I worked at Queen Mary University of London as an assistant professor, or lecturer as it's called in England,
where I did research and teaching in the area of compilers. And then I recently, a year and a half
ago, I joined Jane Street. And at Jane Street, I work on the compilers team on the OCaml compiler.
Tell me a little bit more about how that fits into the arc of your research.
I started my research working on the problem of software verification or certain aspects
of this problem, proving that software satisfies certain good properties,
correctness properties that the users expect from it.
And that research involved a lot of reasoning about the source code of the program,
rather than executing it, and then using certain formal methods and the logical reasoning
techniques, decision procedures,
in order to prove properties of the program. And I worked on that for a while. And there is this
ultimate goal of proving that the software behaves just the way it should, so that real
world executions of your program don't cause terrible failures. And I was very excited about
it for a while, but it is an undecidable problem,
a problem that is very hard to solve in general.
And that becomes a bit frustrating after a while.
And I was interested in what is it that real programmers
and users of software want to prove about their programs
and how I can help them using the techniques that I know
and the logic that I know.
And so I worked on that. And in the process of that, I've discovered that it's really fun to
work with real users and have people who care about certain things. And I don't know what they
care about, but if I talk to them, maybe I'll find out. And maybe one of the tools I have in my
toolbox will help me to give users a tool.
And I found it very exciting.
I think the first time it happened, it was at IBM.
And I was really excited about it.
But what I also realized is that not everyone really cares about correctness.
That sometimes it's okay to have bugs in their programs.
There are other important aspects of software, like performance. That made a big change to my understanding, because I could now put these very formal techniques together with something practical and try to combine the two.
And then when I had the chance to work at ARM on the GCC compiler backend for ARM,
I discovered how important performance is, and how exciting it is to work on these micro-optimizations, CPU-specific, machine-specific optimizations, and how much difference they make to the behavior of the program, to its performance characteristics.
And at the same time, working on a compiler, you just can't have bugs.
An important property of the compiler is that so much else in our systems is built on top of it. So it was a tool that combines correctness and performance, with an optimization objective that is very clear and measurable, as opposed to asking what properties users want to hold for their programs.
And that's how I came to work on compilers.
Yeah, it's interesting you're kind of starting off by focusing on verification
because I think verification is a very shiny and attractive goal
because there's this basic fact about programming,
which is we're all terrible at it.
Humans are bad at writing programs.
Programming languages are things that, at their best, do infuriatingly only the very
specific things you ask them to do and not what you intended.
And so trying to close that gap between the thing you try to do and the terrible thing
that you actually wrote down is an attractive goal. You were talking about one of the problems
of verification, and just to be clear, by verification, formal methods, lots of terms here,
we basically mean coming up with the moral equivalent of a proof that a program matches
some specification. So you write in some precise higher level way what the program is supposed to
do, and then there's like the
actual program that you've written and some kind of proof that the program does what it's intended
to do. And you mentioned undecidability, right? And undecidability roughly means an undecidable
problem is one where you can show that there is no general technique that solves it all the time.
And you mentioned that as like a sign of how difficult the problem is. But it's a funny thing.
In some sense, almost everything that we do is undecidable, right? You can't answer
almost any interesting question about a program in full generality, like the classic result being
the halting problem. You can't even tell whether a given program is going to ever finish executing
in a general way. But that normally doesn't stop us. The way we deal with this is we tend to provide people not with
fully automated systems that solve all of their problems, but instead tools that they can use
while they write programs. They can maybe do some extra work along the side to help ensure the
correctness of those programs. And those tools might be in the form of testing systems, and they
might be in the forms of something that looks more like formal methods, things that essentially, along with the program, develop some aspect of the proof of correctness
of the program. So I'm curious what you think about the kind of practicality and ergonomics
of formal methods as a thing that can be layered into the programming process to make it easier
for people to write programs that do the right thing.
The kind of verification and proving of properties of programs that I think of when I say software verification, the traditional methods, involve writing a program in
one language and then providing specification of its behavior, telling what properties should
hold about the program in a different language, in a logic of some kind that describes it, and then proving that the specification and the implementation conform.
That is very different from the type system approach that OCaml takes, where the types
are the specification of the program.
And that is something I'm learning as I become better at programming in OCaml.
But the kind of verification I'm talking about, the idea of specifying these properties on
the side somewhere, or even in the program, but in a different syntax, I think it's a
big problem for adopting this kind of technologies.
It's possible to specify something about the library, but keeping it up to date as the
code evolves is very hard.
Same problem with documentation.
But when you try to reason about it and you have a tool that runs together with the code,
takes the specification and the code and checks them against each other as you do the development,
then there is some hope to keep the code and the spec aligned.
But still, the programmer needs to write both of them.
So there is an opportunity for automation to save some work for the programmer and keep the two aligned.
I also wanted to mention, when you were talking about verification, that different people think of verification as different things.
To me, it is what you said.
It's the more formal proof about the source code of the program and its properties.
But for most of the development process, it is testing.
It's running the program on some inputs and checking that the outputs behave as we expect. So after
working a bit and trying to find the right logics that are expressive enough to describe the
properties we want, and at the same time, allow us to prove automatically that the program conforms
to these properties, you find, oh, this is very restrictive. And then you look, and testing is so much better.
I've been working on this for two years, but I could have found this bug with a simple
test.
So how do we find tests that are good tests?
Or do we have good tests?
So this problem has been worked on by many people.
And what I find exciting is that taking the testing approach, finding test cases, and
the formal proof approach and combining them
helps us build better tools. So there are many ways in which each side can help the other.
And I worked on exploring some of these ways, and I found it very interesting. And I think this
shows up in verification, but there are also similar combinations of static and dynamic
information or execution time and compile
time information that are useful for compilers during compilation.
It's interesting for you to say this thing about the way in which tests and formal proofs
mix together.
I feel like this is a thing that I've noticed informally on my own in programming, which
is the OCaml type system and in general type systems are a kind of lightweight formal method.
Types only capture
some fairly limited properties of what your program can do, and some type systems are better
at this. OCaml has a relatively capable one. Languages like Java have comparatively simpler
ones. But you can capture a bunch of useful properties in your type system, and the properties
you capture in your type system are universally quantified. You know those properties are ensured all the time for all executions without compromise. And then tests let you kind of test out what's going on at particular
points. You can say for this particular run, for that particular run, you can check that certain
properties hold. And there's a nice thing that happens, a kind of virtuous combination of the
two, which is when you have a well-typed program, one finds that
testing does more to help you bring it into place. It's easy to write a well-typed program that's
wrong, and it's easy to write a program that you've written a few tests for where the tests
pass, but it doesn't really work all the time. But there's a way in which when you have a well-typed
program and you test it in just a few places, it tends to snap into place, and you get much better results
in terms of the correctness of the result than either one alone. So that's like a very kind of
informal programmer experience point of view on this. I'm curious from a kind of more research
perspective, what are the ways that you found in your research to combine testing and formal
verification? So one of the ways, which I think is interesting in its own right, is that it gives understanding of the formal reasoning methods via examples of execution. So the way to think about it is: you execute your program, you get some program states, you collect some
information about them, they satisfy the property you need, and then you take and abstract these
states. So you find some more general way of describing that state and similar states.
And that describes more than the states that you've explored during execution.
So the idea is you execute the program and collect the states that you observed in that
execution, and then you apply some abstraction function that's somehow related to the property you want to prove. And this gives you a bigger set of states that you have explored
that satisfy the property. And then you can go back into the formal world and say,
what are all the possible states and how much of that have I covered? So you could use logical tools like decision procedures to find how much I have covered. If I have not covered everything, give me an example state that was not covered,
and then take that and see, can I cover it in my real execution? Can I get to it? What happens if
I start running from that state? So that's one combination. So let me see if I understand this
correctly. It sounds like what you're doing is you start with an example run.
That example run essentially explores some part of the shape of how the program might execute.
And then you kind of execute that in some sense symbolically, where you sort of aren't keeping exactly all the details, but you're taking that one run through the program and generalizing it
so that you think about it as covering not just a single run, but some bounded, but in some sense,
very large subset of
possible executions. Am I getting in the right direction of understanding how the technique
you're talking about works? It's not one technique. There are many techniques, but this is the general
way in which I think they help each other. Yes. So you execute down one path with one example,
and it says, okay, you've covered all the states, all possible states that go down that path,
but you have not covered different paths.
So let's now go down a different path
and try to find states that take us there.
And to find states, you can use more formal verification.
So that's one way in which formal verification
can help increase coverage of testing.
But there's also the other way that the tests help.
By running these tests,
you have a fast way to cover a certain area and it speeds up the verification of the rest.
Because one of the problems with verification is that if you want to prove expressive properties
about your program, it becomes very expensive, even if it succeeds at the end. There is a way to use concrete execution to speed up verification, or a way to use information gathered from testing to speed up the verification.
Got it. So these are essentially all ways of kind of popping back and forth between the formal
method style, larger scale analysis, and examples as a way of digging into the program. And the
tools can help you move back
and forth between these two and more efficiently do both tasks, essentially. And more precisely.
This combination, I think it's beautiful mathematically, but there are so many ways
in which it helps in practice. And that's why I find it really fun to work on that.
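To make that back-and-forth concrete, here is a minimal OCaml sketch with an illustrative program, property, and interval abstraction; none of it comes from the actual tools discussed. Concrete runs produce states, the states that satisfy the property are generalized by an abstraction function, and any input not covered by the abstraction is handed back to testing as the next thing to try.

```ocaml
(* A toy combination of testing and abstraction: run the program on a few
   concrete inputs, abstract the observed states into an interval, and ask
   what part of the input space is still unexplored.  Everything here
   (the program, the property, the interval abstraction) is illustrative. *)

let f x = abs x             (* program under test *)
let property y = y >= 0     (* property we want the outputs to satisfy *)

(* A small finite input space, so that "all states" is easy to enumerate. *)
let domain = List.init 21 (fun i -> i - 10)   (* -10 .. 10 *)

(* Step 1: concrete executions on a few sample inputs. *)
let samples = [ -3; 0; 4 ]
let observed = List.map (fun x -> (x, f x)) samples

(* Step 2: abstraction: generalize the sampled inputs whose outputs satisfied
   the property into an interval [lo, hi].  A real tool would also check,
   e.g. with a decision procedure, that the property holds on the whole
   abstract element, not just on the samples. *)
let abstraction =
  match List.filter (fun (_, y) -> property y) observed with
  | [] -> None
  | (x0, _) :: _ as good ->
    Some
      (List.fold_left
         (fun (lo, hi) (x, _) -> (min lo x, max hi x))
         (x0, x0) good)

(* Step 3: coverage check: which inputs are not explained by the abstraction? *)
let uncovered =
  match abstraction with
  | None -> domain
  | Some (lo, hi) -> List.filter (fun x -> x < lo || x > hi) domain

let () =
  match uncovered with
  | [] -> print_endline "property covered on the whole domain"
  | x :: _ -> Printf.printf "unexplored input to test next: %d\n" x
```

In a real system, the "which states are still uncovered" question is what a decision procedure answers, and the uncovered state it returns becomes the seed for the next concrete run.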
To step back for a moment, it sounds like some of your frustration
with working on this verification style approach, despite the fact that you clearly love a lot of
the techniques and ideas there, is that it felt to you like in practice, verification often wasn't
serving the actual needs of developers, in part because correctness wasn't quite as high on their
list of requirements for them to put in the amount of effort required to use these formal techniques. Yes.
This leads me to ask about something that you've worked on, which is super optimization,
which is an example of a compiler technique where correctness becomes incredibly critical.
So maybe you could tell us a little bit about what super optimization is and
the role that verification plays in it.
So let me tell you a story first. So when I worked at ARM on the ARM compiler backend, there is an interaction between the hardware and the compiler backend. And in order to generate efficient code for a particular architecture, you need to understand the costs of each instruction and of combinations of instructions, and which registers to use, how to use memory.
And then the compiler has some limited information about these properties of the hardware that
the program is going to run on.
And the compiler is trying to produce good code, but doesn't provide any guarantees about
how well the code will perform.
And there is a trade-off there.
There is a reason for that.
Even if we could provide guarantees, it would be very long compilation time.
It would be very costly to generate code that's efficient.
But often we can't provide the guarantees because we don't know everything that's going
on in the system.
So the way the compiler approaches it is that we have some internal representation in the system. So the way the compiler approaches it is that we have some internal
representation in the compiler of what the source program was, and the compiler transforms,
it applies transformations to the program. And each transformation has a particular goal.
For example, one transformation could take one intermediate representation and generate a
simpler intermediate representation that's closer to the instructions that hardware understands.
So instruction selection.
Another pass will choose which values to store where, whether to store values in registers or on the stack.
Another pass would choose in which order to execute instructions that are not dependent
on each other, and so on.
And these decisions, they are made as separate passes on the intermediate representation
that the compiler holds for the program. And even if you make each pass optimal with respect to the knowledge that that pass holds, invariably there are dependencies between choices made in one pass
and another pass. And you can repeat that on and on and maybe keep improving the code, maybe not,
depending on the properties of these passes. But it is very hard
to make an improvement to the compiler because you know what you want to generate, but you change
this pass and there are internal dependencies between them. And it's very hard to improve the
compiler in order to get the codes that you know will be better than what the compiler generates
currently. Again, changing the compiler is a difficult process
because of the correctness requirements on it.
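A rough sketch of that pass structure, with hypothetical types and passes rather than the real OCaml or GCC pipeline: each pass is a function over the intermediate representation, and the backend is their composition, so a decision taken in one pass quietly constrains the ones after it.

```ocaml
(* A toy backend where each pass is a function over one intermediate
   representation.  The instruction type and the passes are hypothetical. *)

type instr =
  | Add of string * string * string   (* dst, src1, src2 *)
  | Mov of string * string            (* dst, src *)
  | Load of string * string           (* dst, address *)

type ir = instr list

(* Instruction selection: rewrite toward what the hardware understands,
   here just a tiny peephole rule as a stand-in. *)
let instruction_selection : ir -> ir =
  List.map (function
    | Add (dst, src, "0") -> Mov (dst, src)   (* x + 0 is just a move *)
    | i -> i)

(* Register allocation: decide which values live in registers and which
   get spilled to the stack (left as the identity in this sketch). *)
let register_allocation (prog : ir) : ir = prog

(* Scheduling: reorder independent instructions for the pipeline
   (also left as the identity here). *)
let scheduling (prog : ir) : ir = prog

(* The backend is a composition of passes.  Each pass can be locally
   sensible, yet the choices interact: the schedule affects which values
   compete for registers, and the register choices constrain the schedule. *)
let backend (prog : ir) : ir =
  prog |> instruction_selection |> register_allocation |> scheduling

(* Hypothetical input program and its compiled form. *)
let example : ir = [ Load ("r1", "x"); Add ("r2", "r1", "0") ]
let _ = backend example   (* = [ Load ("r1", "x"); Mov ("r2", "r1") ] *)
```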
So when I worked at ARM
and I was doing this kind of transformation
and trying to change the compiler to emit better code
for a new microarchitecture,
to take advantage of some improved aspect
of the microarchitecture,
it is actually very hard to convince the compiler
to generate the code you want
because of this interaction between passes
and the limited way in which we can change them in compilation.
And so this made me think about the technique of superoptimization
where the technique says,
well, just give me a solution for the whole thing.
Don't apply passes one by one.
Give me the best solution possible for all of these constraints together.
Maybe that's how I think about it.
And that's why it's related to my research and verification.
So I should define the super optimization properly, though.
A little more mechanically, if you could just describe how super optimization works.
In super optimization, you have an instruction sequence and the goal is to find another instruction sequence that is optimal, but has the same behavior as the given instruction
sequence. So the goal of super optimization is to generate, for example, the shortest instruction
sequence that has the same behavior.
That was the original formulation, but it's been taken up over the years and extended. Now the word is applied in a more general context: you have an intermediate representation of the program, and you generate an instruction sequence for it that's the optimal one with respect to some measure of what optimal is.
So initially optimal meant the shortest instruction sequence because there was a close correspondence between the length of the instruction sequence and the time that it takes to execute.
But now it's not true anymore.
There are other metrics.
So sometimes you have more expensive instructions that take longer to execute that can be replaced with a longer instruction sequence that will end up faster.
So this was the problem. And the solution that was proposed has to do again with verification.
So you have two instruction sequences, the original and a candidate for what the optimum might be,
and then you need to show that they have the same behavior. Now, what does it mean to have
the same behavior? It means basically that the first instruction sequence, given an input, returns some output, so the second sequence needs to return the same outputs for the same inputs; it computes the same function. To do that, we now need to prove equivalence between two arbitrary programs, and of course that's very hard. We can't do it in general; it's undecidable again. And so what people did initially, they said, okay,
let's not look at arbitrary programs.
Let's just look at straight line instruction sequences, instruction sequences with no branches,
no loops. Can we do that then? And initially people also didn't know how to do that. So there
was a form of testing between the two sequences that was applied in the first paper in super
optimization, which was in 1987 by Massalin. So after that, people worked on the verification.
They said, oh, we have tools that can prove equivalence
of two straight line instruction sequences.
So let's now try to formulate these instruction sequences,
their meaning as a set of constraints
or as a logical formula,
and then prove that the constraints are satisfied
or that the formula is satisfied.
It's a very interesting connection from compilation to verification, but it brings in all the
costs of verification with it.
How expensive is it to run this decision procedure?
Can we run this kind of generation process every time for every basic block in our program?
That would be too expensive. Can we do it at compile time and prepare a few more of
these instruction sequences that we know are optimal and then use them whenever we need?
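As a sketch of that search-based view (a drastically simplified toy, not Massalin's tool or any production superoptimizer), the following OCaml enumerates short programs over a made-up three-instruction machine and keeps the shortest one that agrees with the reference sequence on a handful of test inputs. That testing-based notion of "same behavior" mirrors the earliest work; the later, verification-based superoptimizers replace it with a proof from a decision procedure.

```ocaml
(* A toy superoptimizer over straight-line programs on a single value.
   The instruction set, cost model, and test-based equivalence check are
   all illustrative simplifications. *)

type instr = Inc | Dec | Double

let step x = function Inc -> x + 1 | Dec -> x - 1 | Double -> 2 * x
let run prog x = List.fold_left step x prog

(* The given sequence we want to improve: computes 2*x + 2 in four steps. *)
let reference = [ Inc; Double; Dec; Inc ]

(* "Same behavior", checked only by testing on sample inputs. *)
let inputs = [ -5; 0; 1; 7 ]
let agrees candidate =
  List.for_all (fun x -> run candidate x = run reference x) inputs

(* Enumerate every program of length <= n over the instruction set. *)
let instrs = [ Inc; Dec; Double ]

let rec programs n =
  if n = 0 then [ [] ]
  else
    let shorter = programs (n - 1) in
    let longest = List.filter (fun p -> List.length p = n - 1) shorter in
    shorter @ List.concat_map (fun p -> List.map (fun i -> i :: p) instrs) longest

(* Search: keep the shortest candidate that agrees with the reference. *)
let superoptimize max_len =
  let candidates = List.filter agrees (programs max_len) in
  match List.sort (fun a b -> compare (List.length a) (List.length b)) candidates with
  | [] -> None
  | best :: _ -> Some best

(* superoptimize 3 should find [Inc; Double], a two-instruction sequence
   that computes the same function as the four-instruction reference. *)
```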
It sounds like this all connects to two very different models of how you think about building
a compiler, right? There's the classic, I can't remember who this is attributed to, but there's
an old comment of you end up with a seven-pass compiler when you have seven teams working on the compiler,
which is to say the traditional structure of a compiler, where you have a sequence of what are
called IRs or intermediate representations, essentially little languages that at each
layer represent some different way of viewing the program. And then you work on what you were
talking about called passes, which are transformations,
either moving from one IR to another IR
or transformations within the same IR.
And this is all in some sense,
a way of modularizing the program.
So you can have a way of thinking about what you're doing.
So you don't have to think about the entire compiler
all at the same time.
And you were highlighting,
it's not quite as modular as you might imagine,
because there are actually invariants that spread across the entire stack
and the modularity itself gets in the way of some things
in the sense that if you want to make certain changes
to the output, it's hard to know where to do it.
And the thing you might do in one phase
doesn't necessarily transfer the way through
to the other phase
and getting the actual outcome you want is hard.
And then super optimization treats compilers, I guess, kind of as a search problem where it says, well, let's just have
a criterion for what it means to be good and what it means to be right. And let's just search.
We'll do some kind of, in some sense, the stupidest possible thing. We'll look over all
the possible instruction sequences that look better and just find the ones that are provably
the same and we'll pick those. So it's interesting because it's both a concrete technique of how to make things faster,
but also just a very different mental frame to use for thinking about what a compiler is doing
and how a compiler is structured. So with all that about super optimization, I'm curious,
as a practical matter, what do you think is a good role for super-optimization to play in a modern compiler, given all the costs you mentioned about how expensive verification can be?
It's an interesting question. I would like to see it used more. I think it depends on how
fast the different engines get for checking and for searching through the space of possible candidate solutions
and finding the correct ones and there is a lot of research in that area so obviously right now
where it stands, we can't use it for compilation on a daily basis for our builds on the developer's desktop. And I don't think it will get there ever. Maybe. I don't know. I don't see how; there are inherent costs I don't know how to overcome.
But I think that the way it stands, the technology right now, is that it would be a very valuable tool to help build compilers and to help optimize code that is incredibly critical for performance.
That's, again, something that in the real world, as it's called in academia, is so important to understand.
We have a compiler that needs to compile vastly different styles of code, even in the same
language, with different performance requirements
and characteristics. Why do we use the same backend for all of them? Maybe the more critical
code should be compiled separately with different backend, different optimization, maybe even super
optimization when it goes onto production. So that's one way in which super optimization could
help. And the exciting aspect in what I believe is really the power of this super optimization
approach is that it takes as input a logical representation of what the semantics of the underlying hardware
architecture is.
And if you take that as a parameter of the system, then when you have new hardware design with different costs or maybe even different specification,
then you can just plug it into that system and get a very good code out of it.
After a while, after it finishes code generation, you get the new code without having to rewrite all these very intricate passes
and redesign them or without having to write a new backend.
Basically, it gives you a new backend for free, in some utopian world.
And this is very important in a world where hardware changes and develops in so many more ways than it used to.
That's, again, something I learned at Arm, where I got to work close to hardware designers.
There's such a close interaction between the way that hardware is designed and the way
the compilers are designed.
And with every new hardware, microarchitecture design, to get the best performance out of
it, to get the maximum benefit from it,
we would want to generate different code.
So why do we invest so much in hardware
and developing new hardware
when we can't extract the best performance out of it?
So is your point that super optimization
improves the collaboration cycle
between the hardware engineer and the compiler engineer
in the sense that like hardware engineer proposes
new change to the hardware to make things faster.
And then super optimization gives you a way
which is computationally expensive,
but cheap in terms of human resources to see,
well, how good is this hardware?
If we actually compiled in a way
that took advantage of the hardware,
what would that look like?
How good could we get at that?
Even if that doesn't lead to a compiler
that you can really ship to people
that's really useful in the end,
it lets you learn more efficiently
what are the possibilities
by finding essentially the best possibilities for you.
So that's one of the ways
in which super optimization already today can help, I think.
It can help compiler designs
and can help hardware designers.
But I think if these tools,
when they reach maturity,
maybe they're very close now,
they can also help users outside of the compiler
and hardware developer,
because it will make it easier for users
to just plug in new architectures that come in
and use them and take the best advantage of them.
The idea of super optimization as a way of helping people
who are building the core infrastructure seems to me very compelling, right? The idea of, among other things, being just a way
of checking how close you are to getting the goals that you want out of a compiler, right? You know,
you have your practical compilation stack that runs in a reasonable time and it generates some
code and you can now ask the question, okay, if I now apply a super optimizer to it, how much better
does it get? And that gives you a kind of loss function, a thing to optimize for.
The other thing you described I worry about just because in terms of like saying, oh,
you can have a cheap compiler for your domain specific language done this way.
A cheap but very slow compiler, which is to say, one of the things I think is critical
to making the development experience work well for people is having fast feedback time where they can make a change and see how it improves things. So like, I wonder how you
deal with the fact that the super optimization step would add a big temporal gap. Every time
you wanted to see how your program performs, you have to go through this maybe hours long
optimization process of going and trying to find the actual best code sequences. Do you think that compromises
the practicality of the approach? So I think once you decide on your DSL and your hardware, you need to make the expensive step once, but then perhaps there is a way to,
I don't want to call it partial evaluation, but partially evaluate that formula or that constraint system to get more efficient compilation.
One, you decide on the pair of source language and target architecture.
Once you have that pair, you can reduce the costs.
But so I agree, the most useful application of super optimization at the moment is the help
it gives compiler developers and hardware designers. But I think it also opens up the possibility
of a lot more flexible combinations
of which hardware to use
without losing the performance characteristics
that that hardware was designed for.
So let's switch gears.
I want to talk a little bit about
how you got to where you are now.
You have spent a lot of time
working in both industrial context
and in research context.
And when you joined us, you moved very much back into an industrial mode. And I'm curious about what
you think about the trade-off between these two and why you decided to make the move back
into a more industrial, practically focused role. So I feel like through the research,
I became familiar with a set of powerful tools that can be combined in different ways.
And in academia, there is time to think about and carefully design these tools without the time-to-market pressure that some industries have.
Even in academia, it always feels a bit like sitting in a tower and it's hard to get to
the users
who really care about these tools.
These people are very busy and they are not always available to test the tools or to try things out, on the one hand.
And on the other hand,
I feel that there are a lot of tools in this toolbox
that I can put to good use,
but maybe it's not very novel and exciting in terms of
a research paper, but it is exciting to me because I can help developers do their job better maybe,
or improve the compiler that they use. So I have these techniques that I am familiar with, but I don't know what it is that is needed. And I don't have any feedback about how suitable these tools are and what the problems are. It feels like in academia, there is more time, but there is more pressure to
get novel ideas and results out of the existing tools. And I feel that there are a lot of low
hanging fruit there that can be picked up with tools that we already have or techniques that
are close to what is needed. But the trade-offs are different.
The scale of code is different
when you work on it in industry.
I'm very excited about this aspect
of having people who actually care
about what I produce
and use it on a daily basis.
At Jane Street, I sit in the compilers room,
but the other people
who use the compiler are right there
and everyone communicates and talks about their problems and their ideas, what should be improved.
And you get feedback so quickly when you're there.
So part of what I'm hearing from here is that the thing that excites you about the industrial context is the connection with users, right?
You get to actually build things which get picked up and used, and you get feedback. And that feedback helps drive the work, decide what's important, build new things based on
what is actually useful.
So maybe you can talk about some of the projects that you've worked on at Jane Street.
One big project I was involved in at Jane Street is doing profile-guided optimizations
in OCaml.
We call them feedback-directed optimization these days. And the idea is
to collect execution profiles during program execution and then use them to improve code
generation. It was traditionally called profile-guided optimization. So profile-guided
optimization traditionally uses instrumentation. So it changes the program that executes by inserting special points
to keep track of the execution and then uses the profile collected by that. And more recently,
a sampling-based approach was used to reduce the overhead running the instrumentation and also to
improve the precision because the sampling-based approach, as opposed to the instrumentation approach, doesn't have to change the code that executes.
And by changing the code, as I mentioned earlier, it may have effects that are not related to the code you are trying to measure.
So feedback-directed optimizations were not used by the OCaml compiler, but they're actually traditionally used in other compilers,
C++ compilers in particular.
They're used a lot to improve performance.
One of the problems in static compilation
is that we don't know what inputs the program will get
when it is executed,
down which paths the program execution will go.
And so the compiler doesn't know where it is worth optimizing more
and where it's not.
But if we execute the program on inputs that are representative
of the way the program is used in production,
then we will get some idea of what paths in the program are hot,
what paths are important.
And if this information can be communicated to the
compiler, then the compiler can make better decisions or it can guide the compilation
decisions. For example, where to inline more or because the code is important, where to optimize
more. So the idea is to run some benchmarks and during the execution, collect some information
about the execution of where the
hot paths are, and then create this profile that is then used by the compiler statically.
And the way we applied it in the OCaml compiler is to optimize the code layout.
So the order in which the functions are arranged in the final executable and the order in which within a
function different basic blocks or different pieces of straight line code are arranged.
What it achieves is an improvement in performance, because when code that executes together, when two functions that call each other often and are hot, is laid out together in memory, close to each other, then the cost of accessing each one of them is lower. So the program runs faster.
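Here is an illustrative sketch of that idea in OCaml, with a made-up profile and a greedy strategy, not what the real FDO tooling does: given sampled call counts between functions, place the hottest caller/callee pairs next to each other in the layout, and push functions that never appear in the profile to the end.

```ocaml
(* A toy version of profile-guided function layout: place functions that
   call each other frequently next to each other.  The profile data and
   the greedy strategy are illustrative only. *)

(* Sampled profile: (caller, callee, call count). *)
let profile =
  [ ("parse", "lex", 9000)
  ; ("main", "parse", 120)
  ; ("main", "report", 5)
  ; ("parse", "error", 2)
  ]

let functions = [ "main"; "parse"; "lex"; "report"; "error" ]

(* Greedy layout: repeatedly take the hottest edge and append any of its
   endpoints that have not been placed yet. *)
let layout () =
  let hottest_first =
    List.sort (fun (_, _, a) (_, _, b) -> compare b a) profile
  in
  let placed =
    List.fold_left
      (fun acc (caller, callee, _) ->
        let add acc f = if List.mem f acc then acc else acc @ [ f ] in
        add (add acc caller) callee)
      [] hottest_first
  in
  (* Cold functions that never showed up in the profile go at the end. *)
  placed @ List.filter (fun f -> not (List.mem f placed)) functions

let () =
  print_endline (String.concat " | " (layout ()))
  (* prints: parse | lex | main | report | error *)
```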
I remember when we first started working on this and saw, especially when we saw some of
the early results, I was kind of shocked at the size of improvements, which were like 10, 15,
20% improvements to runtime of various critical programs just for moving memory around and not
even moving the memory of
the running program, which I'm already used to thinking of as a big deal, but the memory of just
the instructions. And so the idea that makes such a big impact on how fast your program runs,
I found pretty astonishing at the time. Why is it so slow to have things spread across your
executable? When the functions that call each other often are far away, when the
hardware executes the code, it needs to get the code from memory. And memory is arranged in layers.
So there's the cache memory that is very fast to access. And there are multiple levels of cache
that are each increasingly slower to access, but increasingly larger. When the code that we execute is not in cache,
then it takes longer to access.
So by laying the two functions that frequently call each other together,
they get into the cache faster.
It speeds up access, which means that the program can execute the code
without needing to wait for loading the code from some lower cache level
or from memory.
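A back-of-the-envelope way to see the effect: the average cost of fetching an instruction is roughly the hit latency plus the miss rate times the miss penalty, so even a modest improvement in instruction-cache hit rate from better layout shows up directly in run time. The numbers below are illustrative orders of magnitude, not measurements of any particular machine.

```ocaml
(* Average memory access time (AMAT) for instruction fetch, with
   illustrative latencies: ~4 cycles for an L1 hit, ~200 cycles to go
   all the way to main memory.  Not measured on real hardware. *)
let amat ~hit_latency ~miss_penalty ~miss_rate =
  hit_latency +. (miss_rate *. miss_penalty)

let before = amat ~hit_latency:4. ~miss_penalty:200. ~miss_rate:0.05  (* = 14 cycles *)
let after  = amat ~hit_latency:4. ~miss_penalty:200. ~miss_rate:0.02  (* =  8 cycles *)

let () = Printf.printf "average fetch cost: %.1f -> %.1f cycles\n" before after
```

In this toy model, cutting the instruction-cache miss rate from 5% to 2% takes the average fetch cost from 14 cycles to 8; how much of that appears as whole-program speedup depends on how often the program is actually waiting on instruction fetch.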
When working on the FDO tool, I spent a lot of time trying to get improved performance by reordering.
And then it turned out that one of the valuable uses of this is just to stabilize the performance. So one of the developers complained that their code suddenly got slower than before.
And the only change was a library that was doing some test manipulation that was not
related to the code that was executing.
And they were trying to understand what changed. So one of the people on the compilers team suggested running it with FDO.
And running it with FDO removed the differences in performance, because what the FDO tool did is that the code that was used was placed together.
So the changes in the library that were not affecting the executions that the benchmark was evaluating
were not affecting the layout of the code. So the benefit is not just improving the performance,
but making it more predictable. And maybe slower, but usually faster. Sometimes slower,
but predictable is better. Right, because if it's predictable and you know whether or not
your changes are real changes to the performance, that helps you guide your work and make changes that incrementally, like change after change,
drive the program overall to be better performing, you know, a few percent at a time.
And if every time you compile your program, it moves around by three percent,
just for no reason at all, then it's extremely hard to do this kind of sequence of small scale
improvements. And I assume the same problem hits in compiler development. Like if every time you modify the compiler
and it gets a little faster or a little slower,
you don't know whether it's really gotten faster
or it's just mucked with the alignment a tiny bit.
So is there a hope to more or less do this same thing
on the compiler side
when you're trying to evaluate compiler changes
as a matter of course to apply FDO
to reduce the amount of noise before and after?
So this in fact ties in with our previous discussion about benchmarks.
So there are some people in the tools and compilers team who developed a benchmarking
infrastructure to help us benchmark compiler versions and compiler improvements that we
make.
And one of the things that we've considered is to add FDO into that benchmarking infrastructure
to help us level the results.
And another way that this could be used is actually there are some interesting research
recently about introducing random changes to the layout and then re-measuring the performance
to see which transformations actually change performance
and which transformations do not. So that's interesting work by Emery Berger from UMass.
That's good. So these are almost the opposite approach. One of them is to say,
let's squish everything into a kind of canonical representation that has closer to the same
behavior. And the other is to say, let's give up on getting a single artifact that's predictable and instead try to understand the distribution. And so you're randomly creating a bunch of them. And I guess the disadvantage of the latter one is your benchmarks
are going to take a lot longer to run because every time you sample the distribution, you
basically have to perturb the executable and rerun the benchmark and perturb the executable and rerun
the benchmark. So they might be orders of magnitude different in terms of how long it takes the benchmarks to run.
It's interesting.
This research work I mentioned is a way to address this problem
in some sense by using sophisticated statistical aggregation sampling.
My understanding is that one of the approaches they have
is to change the layout every few seconds.
During the execution of the program?
During the execution, yes. Oh, that's awesome. Yes, I know.
Another thing to talk about here with FDO is FDO is an interesting kind of example on a spectrum,
right? There are techniques for optimization that are very dynamic, and there's techniques that are
very static ahead of time. And FDO is one way of trying to forge a compromise
between these two, right?
Where you mostly do ahead of time compilation,
but then you gather profile information
that lets you go and redo some of the work there
and further optimize.
But there are lots of other places that this comes up.
Like one that shows up more and more over time
is various forms of just-in-time compilation.
So already many years ago, Java had just-in-time compilation for its bytecode.
Languages like JavaScript do an enormous amount
of trace-based optimizations
where you are running the program,
monitoring it as it runs,
observing things about what paths are hot
and what are the likely paths to be taken
and what are things that need further optimization
and applying further compilation there.
And then it also happens in the actual hardware, right?
The hardware itself is constantly gathering statistics and making guesses about how your program is going to behave based on how it behaved just recently.
I wonder how much of the work that one does in FDO and similar kinds of techniques where you're trying to take profile information
and apply it to your static compilation process,
how much that depends on what else
is happening in the stream.
If you were to do something like
some kind of feedback directed optimization
for a simpler architecture that does less
in the way of branch prediction
and prefetching and so on and so forth,
would that change the amount of work
that would be profitable to do
in the form of feedback-directed optimization?
So there are many aspects of this, right?
So one is if you switch to a different hardware,
as you said before,
that say doesn't do as much prediction,
then you would need to do more work on the compiler side
to arrange the code in the best possible way for hardware to do it.
That's one aspect of who rearranges the code.
But then there is the aspect of the feedback directed or the profile.
Do we know what the hot paths in that program are?
And for that, when the hardware only knows the current execution
and it only knows the history
of the current execution, and I think it has a pretty small window on that particular program's
execution, it doesn't have enough information on where that execution came from and where it is
going to. Whereas the compiler statically has more information about what will happen or what are all
possible things that can happen to this execution in the future.
So there are some things that hardware probably does better, at a cost, again, perhaps an energy cost.
And some things that the compiler has more information about, which is a completely different point in this whole spectrum, but related to the just-in-time compilation that you said before,
because the just-in-time compilation
is connected to it,
is what is the overhead
of doing the optimization?
If you put this optimization
in the hardware,
then the hardware will have to do it,
but maybe there will be some overhead
for doing it in hardware.
If it's possible to do it statically
at compile time or at load time
or earlier,
then that will give us
better, faster code. The core trade-off being the compiler knows more about the structure and
therefore in some sense about some aspects of the future behavior of the program and the hardware
knows more about what's happening right now. I guess there's another trade-off, which is
any optimization you get in the hardware from the point of view of a hardware vendor has the
nice property of applying to all the compilers.
Whereas things that you do in a particular compiler benefits the particular community that uses that, but not everyone who uses the hardware in question.
Which, I guess, affects the economics of which side of the line you want to do the improvement on.
Intel surely wants their particular CPUs to be better than other people's CPUs, whereas someone
working on the OCaml compiler wants the OCaml compiler to work well in as broad a set of
architectures as possible.
But also the same trade-off, if you take it one step back to the source: again, there are different kinds of programs. For some of them, these optimizations matter more. For others, the cost of accessing memory and doing garbage collection just overpowers the micro-optimizations that we can get from reordering instructions and fitting them in, because it's so much more expensive to access memory. And then the data layout matters a lot, or whatever we can learn from the way that the program accesses its data or its containers will matter more for optimization, which is not something compilers currently do.
But I think there is some tooling, like the Spacetime and the memtrace tooling, that
people in the compilers team have been working on to identify access patterns and help that.
And it's so exciting to be part of this team and seeing this work happening.
And it's a community of people who are so technical, so strong in what they do. And they
sit right there and I'm in the same room with them and I can ask them questions or just listen to
their conversations and learn from it and see how the development goes. It's a great team. Yes.
I always love visiting and hanging out with the compiler team because this is a ton of exciting
stuff going on all the time. So we talked a little bit about what it was like moving to
Jane Street in terms of the difference in interaction and the kind of feedback that you get and some
about the concrete work that you've been doing. I wanted to ask about a different aspect, which is
what has it been like switching languages? And you switch languages really in two ways,
which is the optimization work that you did, say, at ARM
was all about C and C++ compilers.
And now you're working on a compiler
where the source language it's trying to optimize is different.
But also you're working in a different language, right?
The compiler that you're working on is written also in OCaml.
And I'm curious about both of those aspects. But to start with, can you tell me a little bit about
what it's been like starting to do compiler work in OCaml rather than in C and C++?
Before coming to Jane Street, I had never programmed in OCaml. I knew about the existence
of OCaml and I sort of knew that I should learn this language, but all the times I tried,
it didn't quite work out. I even tried F# once because there was a good book, but it still didn't help. And when I came to
Jane Street and I started writing in OCaml after getting used to the change in the thinking about
my programs, it was amazing. I couldn't believe I had been writing my program analyses in Java. OCaml is so much
better suited for it. The language is just perfect. The code that I need to express a
transformation is so small compared to multiple Java classes that I would need to write some
visitors and traversals and so on. So this has been great for me, for what I need. OCaml is just the perfect language
for writing compilers
and tools around compilation.
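To give a flavor of what that looks like (a hypothetical toy transformation, not code from the OCaml compiler): a small expression type and a constant-folding pass written as one recursive pattern match, where the same job in Java would typically be spread across visitor classes. And if a new constructor is added to the type, the type checker points at every match that needs updating.

```ocaml
(* A tiny expression language and a constant-folding transformation,
   written as one recursive pattern match.  Purely illustrative. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Mul of expr * expr

let rec fold (e : expr) : expr =
  match e with
  | Int _ -> e
  | Add (a, b) ->
    (match fold a, fold b with
     | Int x, Int y -> Int (x + y)
     | a', b' -> Add (a', b'))
  | Mul (a, b) ->
    (match fold a, fold b with
     | Int x, Int y -> Int (x * y)
     | Int 1, b' -> b'        (* algebraic simplification: 1 * e = e *)
     | a', b' -> Mul (a', b'))

let _ = fold (Mul (Int 1, Add (Int 2, Int 3)))   (* = Int 5 *)
```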
But also, it has this nice environment
around the ecosystem of OCaml
where there are build tools.
But yes, for the language,
I think one of the reasons
that it's convenient for me
is the way that I think about it as more a
mathematical description of what I want to do when I program rather than the imperative
approach with languages I used before, especially with C and C++, where not only do you need to think about what you're doing, but also about the memory and how it's allocated, which
in OCaml is not a concern
because of the garbage collector.
And the other aspect which I really enjoyed,
after getting some intuition about the type error messages,
after getting some idea of what the type checker
is trying to tell me,
it became a great tool
because when my program type checks,
I can see how many errors, how many silly mistakes in coding it caught.
And I was able to fix thanks to this.
So this is great, and Dune and Opam, of course, when I do things that go outside of Jane Street,
and also the type checker holding my hand
and helping me not make silly mistakes.
But then you always need to test.
The thing I think people who have not used OCaml or a similar language often don't appreciate is how much better languages in the OCaml-like part of the design space of programming languages are at catching errors at the type level. Like if you're used to programming
in a language like Java, you think, oh, I know what a type system is like. And it's like,
no, you really don't know what a type system is like. Like you've seen one particular kind
of type system with a lot of weird trade-offs and a lot of limitations. And it really is better in
this part of the pool. The type system is really much, much better at catching errors.
This is so true.
You know, I have a real story about how true it is. I remember a few years ago talking to someone and saying, no, what do you mean?
C++ has types.
It does have types.
And then we had a little conversation.
And I couldn't really appreciate it until after I started programming in OCaml and how
much more I get from the type system, how much
more reassurance and guarantees it provides.
Okay, so that was all about OCaml as a language to work in. How do you find
OCaml as a language to optimize? What challenges does that present?
I'm learning to appreciate the value of the choices that were made by OCaml developers
regarding the structure of the backend, and in particular not just the structure of the way that the compiler backend is organized, but the choice of: let's not make it overly complicated.
Let's make assembly code or generated code
that is easier to understand
and is as close as possible to the original program
because the hardware will optimize it
in ways that
will be far superior.
And we need to focus on generating the correct code, first of all.
Having said that, there are some optimizations, but they are not phrased in the way that I
am familiar with from the literature.
I think it's also because of the divide between the functional languages and imperative languages, and how things are sometimes called static single assignment, SSA, and sometimes they're called something else about continuations, but they are equivalent in some sense and
different compilers use different approaches to implement it.
So that was a big change for me, but I think I now appreciate the choices and how they
were made.
And I focus with what I do on the code generation for AMD64, so for the compiler backend.
But the compiler backend is structured to be very portable and independent of the target
hardware.
Most of the backend is independent.
And that's another reason why it is the way it is.
And it is not diverging.
The code that is specific to each hardware architecture is actually quite small.
But that restricts us in ways that I think would be good to lift the restrictions.
For example, the kind of optimizations and transformations that are performed right now
in the compiler backend, they're statically defined and there is basically no mechanism for changing
them or adding to them, which is one of the problems that we've encountered recently with
feedback-directed optimization work. So that's all about, in some sense, the structure of
how the compiler itself is built and how that affects the work. I'm wondering if there's
anything about the surface language that affects how you think about the cogeneration process. Do OCaml programmers
generate different patterns that require different optimizations than you might see in a language
like C or C++? I'm probably not the right person to ask about it, right? But on that subject,
I think that, again, the way that the OCaml compiler is structured, the most sophisticated parts of it are the type checking and the optimizations like inlining.
This is necessary in order to lower the higher order functional language that it is down
to assembly and produce efficient code.
And having that in place lets programmers write their code
in a more natural way than they could without that support, right?
And this is very powerful. And it has also been a subject of work at Jane Street, which I was not involved in, but I get to learn from it and benefit from it as a user of OCaml as well.
The work on improving the middle end, as it's called, the optimizations that are happening.
And actually, let's demystify that terminology.
Front end, middle end, back end is like standard ways people talk about different parts of a compiler.
Can you say roughly what each of those means? You can think of a compiler as applying a sequence of transformations, starting from
source code and changing it gradually into machine code.
These transformations are divided into three stages called front-end, middle-end, and back-end.
Usually front-end performs things like parsing and type checking,
and back end performs optimizations that are specific to a given hardware, and it knows about
the exact registers and the pipeline design. So the front end, which takes as input the source
code, is specific to the programming language that is being compiled, for example, OCaml. The backend, the third stage, generates machine code or assembly code targeting a specific hardware architecture, so it's specific to the target architecture.
For example, the OCaml compiler can generate code that runs on AMD64 Intel hardware using one backend and a different
backend to generate code for ARM hardware. And it has a few other backends for different
hardware architectures. So the same front and middle end of the compiler can be used with
different backends for the same source language. And similarly, there are compilers that can take as input different source languages.
For example, the LLVM framework can take as input C++ and C using one frontend.
It can also take as input Swift or Rust using different frontends.
And then reuse the other two stages as they are. So this enables the compiler infrastructure to reuse a lot of the source code
while supporting different source and target languages.
The second stage of compilation, the transformations,
can be independent of both the source language and the target language.
And different analysis can be performed and reused.
So because the first part is called
front end and the third part is called back end then the middle part needs to also have end in
its name, so we get the middle end of the compiler, which I can never say without laughing.
but yes right and the middle end is where things, at least in OCaml, where things like inlining occur. And so
that's what you're talking about, of taking all of this complex, higher-ordered functions,
calling functions, being passed functions, and all of that, and getting it flattened out
to a simpler representation that's more suitable for code generation. And the code generation is
the part you've been focusing on, and that's the back end of the compiler.
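Schematically, the three stages compose like this (placeholder types and signatures, much simpler than the real compiler), which is also what makes it possible to pair one front end with several back ends.

```ocaml
(* A schematic compiler pipeline; all types here are placeholders. *)
type source_code = string        (* e.g. OCaml text *)
type typed_ast                   (* output of the front end *)
type middle_ir                   (* language- and target-independent IR *)
type assembly = string           (* target-specific output *)

module type Frontend = sig
  val parse_and_typecheck : source_code -> typed_ast
end

module type Middle_end = sig
  val optimize : typed_ast -> middle_ir   (* e.g. inlining happens here *)
end

module type Backend = sig
  (* One implementation per target: AMD64, ARM, ... *)
  val emit : middle_ir -> assembly
end

module Make_compiler (F : Frontend) (M : Middle_end) (B : Backend) = struct
  let compile (src : source_code) : assembly =
    src |> F.parse_and_typecheck |> M.optimize |> B.emit
end
```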
So one of the things, by the way, you mentioned is that there are relatively small
amounts of code dedicated to particular architectures. I'm curious actually how well
adapted you think OCaml is to ARM, right? You have a lot of experience working at ARM, and ARM
is an increasingly important architecture. It's a very good architecture for low power utilization. In
fact, I think Apple recently announced that they're going to switch to using ARM for their
desktop machines. So that's a big step. How good of a compiler do you think OCaml is for ARM at
this point? I have not used OCaml on ARM. I have no experience with it. I've seen the code in the ARM backend to some extent, and to me, it looked like
it would not generate code as fast as, say, GCC does.
But that is a very difficult target, and also not a fair comparison, given the garbage collector.
But even without any allocation involved, I think that the ARM backend of OCaml has not received as much attention, from what it
looks like from the code.
So it is missing things that are specific to ARM.
Well, again, it depends on what kind of ARM microarchitecture is being targeted.
Some ARM microarchitectures, the low-power designs, use in-order execution.
So there's less reordering that the hardware does.
So it's more important, for example, to do good scheduling of instructions at the block level.
And there is a pass for doing scheduling of instructions in the OCaml compiler backend.
It's not used for AMD64.
It seems to be generic for ARM,
so all ARMv8 targets get the same scheduling decisions, and all ARMv7 target CPUs will get the
same scheduling decisions, which is not always the best. The use of registers in the ARM backend
would be very important, so improving register allocation, because using different registers
with different instructions has restrictions
and sometimes different costs.
And there is a way to encode that in the compiler, but I think it's not as flexible
as one would like in order to get good performance.
Those are just the things I've noticed and came to mind now.
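To illustrate why block-level scheduling matters on an in-order core, here is a toy example. The instruction type, register names, and the idea of a "few-cycle" load latency are made up for the illustration; they are not real ARM timings or the OCaml compiler's backend representation.

```ocaml
(* A toy illustration of block-level instruction scheduling for an
   in-order core. *)
type instr =
  | Load of string * string         (* dst, address register: result ready a few cycles later *)
  | Add of string * string * string (* dst = src1 + src2: ready on the next cycle *)

(* Unscheduled: each add consumes a load result immediately, so an in-order
   core stalls waiting for the load to complete before it can issue the add. *)
let unscheduled =
  [ Load ("r0", "r4"); Add ("r1", "r0", "r0");
    Load ("r2", "r5"); Add ("r3", "r2", "r2") ]

(* Scheduled: issuing both loads before the adds gives each load time to
   complete, reducing the stalls. An out-of-order core would do this
   reordering in hardware; an in-order core relies on the compiler's
   scheduler to do it at the block level. *)
let scheduled =
  [ Load ("r0", "r4"); Load ("r2", "r5");
    Add ("r1", "r0", "r0"); Add ("r3", "r2", "r2") ]
```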
I wonder if this is an area we'll start to care about more as the CPU ecosystem changes.
Because power matters to us too, it turns out.
And it's interesting to think about, like, Intel has had a kind of monopoly on servers
for quite a long time.
And we care much more about servers.
You know, at Jane Street, we don't really write much code that's targeted at, say, mobile
phones.
But it's possible that we'll see the server world move as well.
And that would probably change our priorities a fair amount.
I think the comments about the register allocator also apply to the AMD64 backend,
where we know that the code generation is particularly simple-minded when it comes
to registers; even though there is a good register allocation algorithm, the end result
often can be improved. But this is something that I think is a big investment.
And I think in order to know that we will get the benefits we expect from it,
we need to have some examples or some idea of why we expect this to be better
in terms of faster code at the end for our systems.
And that's what I really like about being in a place like Jane Street,
because I have access to these programs
and applications that matter, that I know matter.
And people who care about these applications
write good benchmarking infrastructure for them.
And to me, as someone who works on the compiler,
that makes it easier to test different ideas
and just get feedback straight away
and maybe get impact faster when I improve my things
or cause more trouble faster. So recently I made a small change to the compiler, which was
an optimization, but because I misunderstood something about the internals of the OCaml
compiler, it introduced a miscompilation bug, and it was only noticed
because one of the tests in one of the
systems failed. It's not a very common situation, so I don't think there was any damage done to systems running in production, and there was a test that caught it, but it was a test that we didn't routinely run. And it was very scary, because I broke the compiler.
And then I had to watch all these people who have very important things to do,
spending their time rolling back their applications and talking and communicating
and synchronizing on how to avoid hitting that problem.
So this just shows how important compiler work is and how much impact any correctness mistake makes.
Even if it didn't break any running system in production or cause problems, it took
these people's time to move on from it.
And I don't want that to happen.
Obviously, I did my best to test it, but I didn't test it enough.
And I think one of the things about the culture at Jane Street and what I really like is that
there is a process or I think it's not a process,
it's individuals.
And as a culture,
the mistakes are taken as,
OK, we'll fix it.
And then how can we learn from it?
Why did we not run this test?
And then two weeks later,
we have a system
to run these tests as well.
And also something about the compilers team:
How come we didn't catch this before?
It's happened before to other people. How do we change? How can we improve the compiler intermediate representation,
or how can we improve the compiler, to prevent this kind of mistake from happening? Then we
sit down and go, oh, that's what would need to be done to do it properly. Okay. So now we know this is something
that would be nice to be done. When do we do it? We'll figure it out separately. We need to
prioritize. But this whole process, I was so relieved to be part of this process, even though I
made this mistake and released a broken compiler.
But it was a learning experience and a positive one.
And I think reassuring.
I'm glad that came across as a positive experience because it's something that we try and keep
baked into the culture kind of throughout the organization. So I want to jump back to a thing that you said a moment ago. You were talking
about feedback and particularly getting feedback from benchmarks and being able to kind of see the
performance impact of the changes you make and let that guide the optimization decisions that you're
making. So one of the challenges that we have with working on OCaml is we do not own the language.
The language exists outside of our walls. It was initially developed by the research team at Inria that created it.
And they continue to have a very deep role there.
And in the end, we have to convince upstream maintainers about changes that we want to make.
And one of the challenges there is there's lots of things that we see internally that are easy to see from the inside and harder to see from the outside.
And I'm curious, what are the things that we can do to try and explain to upstream more effectively
why we're making the changes that we make and being able to kind of surface the motivations
in a way that is compelling to the core team that maintains the language.
We see many examples internally where we see a way to improve things.
We see that, for example, like with FDO or with register allocation, there are ways to
improve.
We would like this to be accepted upstream, but the motivation for it is very internal
to Jane Street.
So for these examples, the relationship that Jane Street developers
and the compiler core developers have,
and their participation
in the upstream OCaml developers community,
being part of it,
really helps,
because there is some trust
that's been developed between the two.
And a very technical conversation happens with people outside of Jane Street.
And there is a lot of collaboration also outside of Jane Street in the core OCaml developer
community. But there is no automatic acceptance of anything that comes from Jane Street. If anything,
it's the other way around. If you only need it for Jane Street, then it's probably not going to be a good feature for the general
purpose language and the community compiler.
So I think the best thing that could happen is finding other users of OCaml and sharing
our tools with them and developing tools that help us measure code performance and evaluate different
metrics and then sharing these tools upstream externally.
They can be open source, just as the compiler and the other libraries that Jane Street open
sources are, and then we can give these tools to other users of OCaml.
And I think there is a growing community of high-scale users of OCaml, even outside of Jane Street.
And with these tools, this will encourage use in the community and will get us feedback
from the other users of OCaml, for example, at Facebook and other companies that use OCaml
industrially, but also from the researchers who use it in academia and perhaps are more
used to trying out new experimental
features. Now, to make that happen, there is already opam and there is already Dune, which
work externally and make the process easier. But there is, to my knowledge, no place to share
results. I think it happens a bit on OCaml Discuss, but maybe a different channel
for communication of this would help and a more open conversation between the core developers of
OCaml and the users who are not part of that small group. Yeah, I suspect this is a place where you
probably just need to build some better tooling among other things to kind of share benchmarks. And there is some work in this direction. So OCaml Labs has done a bunch
of work preparing for Multicore OCaml, which is now being pushed very aggressively, to the point
where, for the first time, I feel like it's kind of in sight that we'll actually have a version of
the compiler that's released with a multicore garbage collector. We have this repo called
Sandmark, which is a place where they have a bunch of benchmarks
all together and they use that for evaluating.
And the idea that we can try and leverage that
for other kinds of optimizations that we're doing,
I think is a promising one.
But yeah, I think the communication is critical and hard.
And it's especially hard when some of the problems
just have to do with scale.
And it's a scale that other people may not see, right?
I think we release a surprising amount of open source code.
We have like half a million lines of our internal code released on GitHub, kind
of updated a couple of times a week.
And that helps because any problem that you can isolate to that code base, you can actually
demonstrate it externally.
But FDO is a good example where some of the strongest results
for FDO improving the performance of things have come from very large executables with lots of
dependencies. And those aren't things we tend to open source, in part because it's internal stuff
that's not of any interest, and in part because we have some proprietary intellectual
property reasons to not want to release it. And so it's those things that operate at larger scale that I feel
are some of the most challenging to demonstrate effectively. And I take your point that
if it looks like an optimization which really just affects stuff inside a company
like Jane Street and has no other use elsewhere, upstream is going to be a lot less excited about taking it.
I think the core of this objection
is in the approach to code generation for OCaml.
The backend, it is important for it to perform well,
but it is perhaps not the top priority.
Portability is maybe a higher priority than that,
portability between different systems,
or being able to perform well
on a range of different input programs. So it's something that also needs to change, not just in
the communication, but also in what is valued by the community. And I understand the resistance to introducing new, obscure features. I mean,
if you look at what happened to LLVM, where it's a lot easier to contribute code and add passes,
then it's a lot harder to keep the compiler uniformly reliable and predictable and correct.
Or on the other hand, maybe these two are the extremes,
and maybe there's some point in between. So if there were more infrastructure in the
OCaml backend that would enable people to add their own passes, optimizations, transformations
that benefit particular code patterns that they care about, and more flexibility in
enabling such passes,
then perhaps it would be easier to do. Yeah. This is maybe a case where modularity
is just a thing that can help. I think one of the things that upstream is afraid of is they're
worried about a lot of complexity being added to the compiler and then it being their responsibility
to maintain it. Because I think both the people at Jane Street and the other members of the
core team feel an enormous amount of responsibility for the language.
Everything that's in there, they think of themselves as responsible for.
And they think of it as a collective tax on the overall community to add new functionality.
And so making it so that you can maintain some pieces of functionality on the side where
it doesn't become part of the shared ownership, that's, I think, a valuable move.
Another thing that I think can also help is that having more modularity makes it easier to do experiments
where maybe in the long term, you do want to contribute this back upstream. But in the
meantime, you'd like to make it easy to maintain a patch set that modifies just this one part
so that you can try it out and see how effective it is and iterate quickly on the technology so you can
get to a point where you can propose it up to them and say, hey, we've done this work
and look, we can demonstrate a lot of benefits from it.
It's really worth taking on the complexity, but making it easier to do that kind of experimentation,
I think, is critical to basically being able to do that at a larger scale.
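As a hypothetical sketch of what that kind of modularity could look like, here is a pluggable-pass interface for a backend. None of these names (Ir, Make_pass_manager, register, run_all, or a flag like -extra-passes) exist in the OCaml compiler; the point is only that an experimental, out-of-tree pass could be registered and kept off by default, leaving the shared pipeline untouched.

```ocaml
(* A hypothetical sketch of a pluggable-pass interface for a compiler
   backend, not an existing OCaml compiler API. *)

module type Ir = sig
  type t
end

module Make_pass_manager (I : Ir) = struct
  type pass = {
    name : string;
    (* Experimental passes stay off unless explicitly enabled, so the
       default pipeline is unchanged for everyone else. *)
    enabled_by_default : bool;
    run : I.t -> I.t;
  }

  let passes : pass list ref = ref []

  (* An out-of-tree experiment registers its pass without touching the
     rest of the backend, e.g.
       register { name = "my-arm-sched"; enabled_by_default = false; run = my_pass } *)
  let register (p : pass) = passes := !passes @ [ p ]

  (* Run the registered passes in order, honoring an explicit enable list,
     for example coming from a hypothetical -extra-passes flag. *)
  let run_all ?(enable = []) (ir : I.t) : I.t =
    List.fold_left
      (fun ir p ->
        if p.enabled_by_default || List.mem p.name enable then p.run ir
        else ir)
      ir !passes
end
```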
Indeed.
And also getting feedback, a thumbs up, from other users of OCaml would be easier if it did not require them to change
their build system in order to try a new feature. Here we're talking about feedback-directed
compiler construction. There we go. Different kinds of feedback. Okay. Well, I think we're
at this point out of time, but thank you so much
for taking the time to talk with me about this. It's been a lot of fun.
Thank you.
You can find links to some of the things that we talked about, including Greta's work on
feedback-directed optimization, as well as a full transcript of the episode at
signalsandthreads.com. Thanks for joining us and see you next week.