Embedded - 193: Axiomatically Did Not Happen

Episode Date: March 29, 2017

Owen Anderson (@OwenResistor) joined us to talk about how compilers are written. We discussed LLVM, intermediate representation, clang, and GPUs. As mentioned at the end of the show, Owen's current employer is hiring. If you are interested and would like to get the brownie points that come with being a friend of a friend, contact us and we'll connect you to Owen and he'll submit your resume. Recent books Owen mentioned: The Manager's Path, Feminist Fight Club, and The Laundry Files series by Charles Stross. Links mentioned: LLVM Language Reference, Teardown of what goes into rasterization, and What Every C Programmer Should Know About Undefined Behavior.

Transcript
Starting point is 00:00:00 Welcome to Embedded. I'm Elecia White with Christopher White. Remember how we keep mentioning Clang? And that time we tried to explain what happens to your source code before it runs on your hardware? And when I dumbly asked, what is LLVM? And everybody who'd been nodding knowingly looked a little grateful that someone finally asked. Well, it is time for an expert. Our guest this week is Owen Anderson. Hi, Owen. Thanks for being on the show this week.
Starting point is 00:00:35 Hi, Chris. Hi, Elecia. Could you tell us a bit about yourself? So I am a compiler engineer by background. I got involved in the LLVM project from the open source side when it was a very, very young project. I think that was 2006. And that was while I was still in school and ended up becoming a professional compiler engineer where I worked on LLVM and the Clang C and C++ compiler, sort of through the birth and then lifecycle of that, porting it to different architectures,
Starting point is 00:01:12 and then eventually started working on it, targeting it towards graphics processors. I also studied computer architecture in school, and while I'm not a computer architect, I have worked with a lot of computer architects. All right. That leads to many, many questions. Hopefully many answers.
Starting point is 00:01:36 Before we get to those questions, Lightning Round is finally back from vacation. And so we're going to ask you a bunch of short questions that are relatively random and fairly meaningless. And we want short answers, which could be random or meaningless, whichever, you know, it's fine. Christopher, do you want to start? No. All right. So, Owen, do you start many projects and only finish a few, or do you only work on a few projects and finish most of them? For personal projects, my house is littered with unfinished projects. All right. Tabs or spaces? Spaces. Two spaces. Oh, anticipated the next question. There's only one right answer. Preferred voltage?
Starting point is 00:02:22 I, I love, I'm obsessed with the idea of everything being low power, so I want to say some very low voltage, but practically all the stuff I have is five volts. Favorite animal? Um, no idea. My daughter's obsessed with bats right now. Oh, that's a good one. Favorite by proxy. Yeah. Well, we talk about them a lot, so. Was there a movie or book that you encountered for the first time in the last year that now is a favorite? Fiction or nonfiction? Whatever. It's open.
Starting point is 00:03:02 Oh, okay. I can do one of each. So I listen to almost all my books on audiobook. I also listened to The Laundry Files in the past year, which was new to me. Oh, yes, by Stross. So probably not. Yes. Charles Stross, yes. Which was incredibly nerdy and awesome. In nonfiction, I read Feminist Fight Club,
Starting point is 00:03:21 which was really great about fighting sexism in the workplace. And I actually just read The Manager's Path, which is the first book about management I've ever read that wasn't complete garbage. So. And while you've been a compiler engineer for a long time, you are now a manager, right? Yeah, I've been a tech lead or engineering manager for a few years now. That grew directly out of my compiler engineering. I was an engineer and then a senior engineer and then sort of the tech lead on the project
Starting point is 00:03:51 and eventually became the engineering manager. Do you want to continue? Yes, sir. No, that's fine. If you could have dinner with anyone in the history of the world, who would it be? I'm going to cop out on that one. I'm actually, I have an interest in history because my father was a classicist and a historian, but I'm more interested in like social history
Starting point is 00:04:12 rather than a sort of great man view of history. So I tend to be interested in all these like periods of history that you never learn about in school. So I don't know if I could pick a person. There's plenty of time periods I'm interested in. I'm interested in the dark ages and all these things that you just don't read about in a history book very much. Fair enough. All right.
Starting point is 00:04:33 Last one. This one should be easy. Have you ever touched a penguin? I don't think I've touched one. I've seen them, including not behind glass, but I don't think I've touched one. Are you expecting anyone to answer this question in the affirmative? I'm hoping someday. For the record, they stink. I liked that last week Terry asked stuffed or live, and then the answer was no to both. So I don't know why we needed the clarification. Thus ends lightning round. Yes.
Starting point is 00:05:08 So let's get on to compiler engineer and what that means and what compilers really are. How many of you are there? Way more than you would think. Really? Really. It seems like a fairly rare thing. I mean, you make a compiler
Starting point is 00:05:22 and everybody uses it and then it's done. It's one of those tasks that's never really done. You can always make it better in some way. And it has certain characteristics as a sort of niche within engineering that appeal to people with certain mindsets, where it's a very sort of well-specified problem, right? You have to take in a particular input and produce a particular output. There's no non-determinism. There's no timing aspect. There's a lot of math
Starting point is 00:05:47 involved in certain parts of it. You're right that it is a very niche topic within engineering and people specialize into that. Some of it may be just that because I've done it a lot, I now know lots of compiler people.
Starting point is 00:06:06 There's lots of us sprinkled around. Okay. So I write something in C, C++, Python. No, no, no, C or C++, the compiled language. And what happens to it before I download my binary, my .out or my .srec, to my hardware? A huge amount of stuff. First, you build the car, and then you drive it. No, it's... I mean, generally, we break a compiler down
Starting point is 00:06:36 into sort of three phases, which are not very meaningfully named, like front-end, optimizer, and back-end. The front-end would generally be the portion of the compiler that deals with the language-specific portions of your compiler. So you have a C front-end or a C++ front-end or a Rust front-end or whatever language you're writing in. It's the part that's responsible for giving you error messages
Starting point is 00:07:05 and working out all the types in your code and things like that. And the preprocessing, right? Sorry. The preprocessing? Not just preprocessing. So, for instance, C++ has a fairly complex semantic analysis system once you take into account templates, right? Something has to expand out all those templates,
Starting point is 00:07:29 plug in all the types, figure out does your program type check correctly? Did you leave out a semicolon somewhere? Okay. So front-end runs, it generates a representation of your program, usually a representation that's somewhat reflective of what the code you wrote looked like. And the second phase is the optimizer. Optimizers
Starting point is 00:07:51 generally perform sort of language and machine independent optimization. So things along the lines of, you know, you're computing some values and never using them. Oh, let's just not compute those at all. And there's, you know, a giant laundry list of all these sort of machine agnostic optimizations. And then finally, the backend component maps your program onto the processor you're actually targeting. So there are some intermediate things that happen. There's IR, the intermediate language that goes from C into a completely different language. Why? Why wouldn't it go straight to assembly? Well, so very old compilers did. All the sort of original 70s compilers are what we would now today call single-pass compilers or splat compilers,
Starting point is 00:08:45 where they read in one line of code and immediately wrote out the assembly to implement that line of code. Oh, no, that's clearly inefficient. You definitely need an optimizer. Yeah. The reason is that code, or even a sort of syntax tree, a data structure that represents code directly, is a fine way for humans to understand what's going on. It's not a particularly great representation for the compiler to actually do transformations on. It makes things harder than they need to be, in terms of the mechanics: the compiler wants to make these changes, so how hard is it to actually make that change? And also in terms of analysis: a lot of compilation, and optimization in particular, is the compiler having to demonstrate, is this fact definitely true?
Starting point is 00:09:37 And programming languages that are built for humans tend to make that much harder than it needs to be. Like, what fact might be true? So I can give you a very specific example of an intermediate representation aspect from LLVM, which is pretty widely used in a lot of compilers now, which is called SSA form. It stands for single static assignment form. I always get the two S's backwards.
Starting point is 00:10:10 In C, you can have a variable I. You can assign to it multiple times. You can say I equals 1, I equals 2, I equals 3. You can have, you can assign to it in one side of an if and then assign to it again in the else side. And this actually makes it really complicated
Starting point is 00:10:26 for the compiler to reason about, well, maybe one of those assignments wasn't needed because you never read it again after writing it. But if they're all called i, it's kind of tricky to reason about that. So most compilers use a representation where they would actually break those up into different variables. You would have sort of I0 and an I1 and an I2 that were all sort of the different assignments of I, so that each one only gets assigned to once you don't have this
Starting point is 00:10:57 overwriting, which then makes it very much easier to reason about, because you can say I2 is equal to 2 and nothing ever uses I2, so I can get rid of it. So the main value of the intermediate representation, the IR is the acronym people use, the main value is in optimization? Is that right? Generally speaking, yes. I mean, it also enables a significant amount of reuse across different compilers, different languages, and different machine backends. So it turns out that writing an optimizer is fairly complicated. It's an ongoing process of many years, right?
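To make the renaming idea concrete, here is a small sketch (an illustration, not code from the show); the SSA names in the comments are just labels:

```cpp
// Original C-style code: one variable, assigned several times.
int pick(int flag) {
    int i = 1;       // in SSA form this becomes something like: i0 = 1
    i = 2;           // i1 = 2  (i0 is never read, so it can be deleted)
    if (flag) {
        i = 3;       // i2 = 3
    }
    // At the join point the compiler inserts a "phi" node, i3 = phi(i1, i2),
    // which picks whichever definition actually reached this point.
    return i;        // return i3
}
```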
Starting point is 00:11:41 And it's fairly detail-oriented, right? You have to write a lot of little pattern recognizers that say, oh, I recognize this little pattern of you do X, we should simplify that into Y. And that accumulated knowledge and that collection of pattern recognizers is very valuable and you don't want to waste it. It seems like it would also be a big advantage when you're targeting a vast array of different processors to have that intermediate step because then it's a shorter distance
Starting point is 00:12:12 between that and assembly necessarily. Generally, yes. So this sort of bleeds into an analogy I often make, which I can't take credit for. A co-worker originally came up with this, which is that you can think of the process of compilation as sort of this mountain. And you start at the beginning going up the mountain in a process that we call canonicalization, which is there's many ways to write the same program in terms of what the program does. If you're writing a factorial function, you can write it recursively or non-recursively and all these things, right?
Starting point is 00:12:46 Canonicalization is the process of trying to factor out all those differences. Trying to collapse all those programs that are equivalent in terms of what they do so that they all become the same representation. And then you go up the mountain, you're canonicalizing, you're canonicalizing, you reach enlightenment somewhere at the top
Starting point is 00:13:04 where it's perfectly all-factored. And then you start going down the other side of the mountain, which is the process called lowering, where now that you have sort of your ideal representation of the program, you want to lower it towards the way that it is best expressed for the machine you're targeting. And real compilation pipelines are not one mountain. It's actually like, you know, peaks and valleys. Well, yes, because if I want, I mean, determining canonical representation, I mean, what you're doing there is trying to read my mind through my code, which is an imperfect representation anyway, and now you're making assumptions about it. Not assumptions. So compilers have sort of this core mandate
Starting point is 00:13:52 that they have to always be safe, right? The compiler can't break your code. And, of course, what that actually means is specified in the C or C++ standard, which nobody actually reads and nobody actually understands, which then leads to many compiler engineers spending many hours having to explain it to people when their code doesn't do what they think it should do. The compiler, one of the reasons people often wonder why do compilers take a long time to run, especially in high optimization levels, and that's because the compiler doesn't get to make very many assumptions about what you mean. If it wants to do something, it has to prove that it can do it, which means it
Starting point is 00:14:29 has to consider all these different cases, has to reason about, oh, if it took exactly this pattern of control flow, could it violate some assumption I want to make? And that kind of theorem-proving aspect is usually what ends up being the big time sinks. Wouldn't it be more efficient for everybody if I just wrote my code in the intermediate representation? In principle, maybe. In practice, it's not generally very pleasant to write in. You can actually look up LLVM's intermediate representation. It's pretty well documented.
Starting point is 00:15:07 There's a document called the language reference. It's not too bad to read. It's a pain to write, is what I would say. You can also play with it on Compiler Explorer. If you use one of the Clang compilers and you add the command line option -emit-llvm, it will actually show you the intermediate representation. And it's fairly readable. It looks kind of like an abstract assembly language.
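For example, here is a tiny function and roughly what its IR looks like; the exact output depends on the Clang version and optimization level:

```cpp
// sum.cpp -- compile with something like:
//   clang++ -O1 -S -emit-llvm sum.cpp -o sum.ll
// and open sum.ll to see the textual IR.
int sum(int a, int b) {
    return a + b;
}
// The IR is roughly (names and attributes vary):
//   define i32 @_Z3sumii(i32 %a, i32 %b) {
//     %add = add nsw i32 %a, %b
//     ret i32 %add
//   }
```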
Starting point is 00:15:29 It looks kind of like an abstract assembly language. And that's completely different from what GCC does, for example? GCC has analogous things inside of it. They don't have a nice readable form that they can easily print out. The general architectures have a lot in common. You've mentioned LLVM,
Starting point is 00:15:55 which used to stand for Low Level Virtual Machine, but now it stands for LLVM because that acronym doesn't have any real application to what it is anymore. Yep. We used to get a lot of questions about what is this LLVM thing and how is it related to a Java virtual machine? And everyone got fed up with that.
Starting point is 00:16:19 So what is LLVM? LLVM is an umbrella project for a toolkit for making compilers. One of the big differences between LLVM and GCC is that LLVM is structured as a set of libraries that you can mix and match, that you can build into your own software, and you can put them together to build compilers. The most well-known example is the Clang C and C++ compiler, but it's also used in lots of other compilers. Rust uses it. There's a D compiler based on it. Apple's Swift compiler uses it.
Starting point is 00:16:59 Lots of GPU compilers are based on LLVM. And it provides a lot of the components you need at various parts of the compilation pipeline, front-end optimizer and back-end. And one more time, wouldn't it be more efficient to write the assembly from the C code? I'm just really stuck on this.
Starting point is 00:17:22 We go to somewhere else. It depends on what you want. It'll certainly get you faster compile times, but the problem is it will certainly get you much worse code quality. So it's really all about what you're optimizing for.
Starting point is 00:17:39 Some people want really fast compilers, particularly for their debug builds. I mean, I guess the thing is, you have to do this even if it's implicit, because you can't directly optimize C code. You have to keep a bunch of state around.
Starting point is 00:17:56 So if you're going to keep a bunch of state around, why not just make a separate language which you can use as a scratch pad to keep all of that state around and then optimize that? It seems inefficient. So maybe C is a little bit of an example that makes it a little more confusing, because C is a pretty low-level language to start with.
Starting point is 00:18:16 I would pick on something like Haskell, which is a compiled language as well. It's a functional language. It's pretty high-level. Haskell is the Haskell compiler, the Glasgow Haskell compiler, which is sort of the standard one, has an LLVM-based backend. And they do
Starting point is 00:18:33 multiple levels of you know, to think of that mountain that I described, right, they have sort of multiple of them, right, where they have a representation that's closer to the Haskell language, and they perform optimizations that are sort of derived from Haskell-y kind of knowledge. And then they translate it into LLVM, and they let LLVM worry about all the low-level, you know, simplify my divides into shifts kinds of things.
Starting point is 00:18:59 you have a lot of sort of implicit knowledge where you know that everything is a pure function. You know there are no side effects, or you know there are only side effects within the IO monad, and things like that. But it's hard to do little fiddly, bit-twiddling optimizations directly on a representation that's very close to Haskell. When you convert to LLVM, you're trading benefits, right?
Starting point is 00:19:40 It's no longer as easy to reason about those sort of mathematical properties of Haskell, but you've opened up the ability to optimize at a finer granularity. It's all about trade-offs, really. Different representations of the program may be easier or harder to perform different kinds of analysis and different kinds of transformation on. And so most real compilers go through several different representations of the program. And I guess the thing that might have been bouncing around your head, Elecia,
Starting point is 00:20:14 and came to my mind is, oh, okay, well, why not just optimize the assembly and use that as the IR? It's because you've got tons of different architectures, and you'd have to rewrite your general-purpose optimizer for every single architecture. Exactly. So one of the things that people are very interested in with C++, for instance, is optimizers that can eliminate virtual calls and turn them into direct calls. And you really don't want to have to re-implement that for x86 and ARM and MIPS and every other target you could possibly care about. Are the compilers, or are the instruction sets for the different processors, all that different?
Starting point is 00:21:00 I mean, I look at the assemblies and some of them are quite different, but does it really matter? It matters, but yeah, that's actually a really complicated question. It definitely matters, but it matters both more and less than people think, and it depends on who you ask. I think the way I think of it is there's sort of a sweet spot in instruction set designs, this instruction set design space, where you want your instruction set to be sort of complete enough and dense enough that you're not just wasting instruction size or decoding bandwidth, and you want it to be sort of sane enough that the compiler doesn't have to struggle with generating good code for it. And this is where we start getting into very long instruction words. Yes.
Starting point is 00:21:52 Whether we have lots of assembly instructions with a few modifiers, or if we have a few assembly instructions and you just have to do a lot of them in order to get to where you want to go. Yeah, or going in the reverse, you go very RISC-y, where you have only very, very simple instructions, but then you have a lot of them. Like I said, I think there's sort of the sweet spot in the middle, and as long as your architecture is somewhere in that sort of middle region, it doesn't make a huge amount of difference. Obviously there will be differences within the compiler, because it needs to worry about whatever little quirks your processor has. But in terms of will the compiler be able to generate something
Starting point is 00:22:32 reasonably good for my processor, you know, if you stick to that middle region, you're probably going to be fine. And so many do stick to that region, because it is sort of the most obvious, it's the one you fall into unless you have a good reason not to. Yeah. So I will, so like to give some examples, right? So VLIW,
Starting point is 00:22:53 I think everyone knows why VLIW, very long instruction word is kind of a pain. Well, I guess I don't know if everyone listening knows about VLIW. VLIW is this idea of having instructions that are grouped by the programmer or by the compiler that are guaranteed to be independent of each other so that they can be run in parallel.
Starting point is 00:23:14 So if you have an ALU, or you have three arithmetic units, and you have maybe a floating point unit and a memory thing. And a memory accessor thing. Now you can have a very long instruction that says, add these three separate, entirely not dependent things as they need to be added and access this memory
Starting point is 00:23:41 and do some floating point operation. And that's all one instruction word because we are in very long instruction word land. And so if you wanted to do something else, you're going to end up with an entirely different word. I like to think of it as German where you end up with 29 consonants all in a row. I like that analogy. Yeah, VLIW is painful for everybody who has to program it. VLIW is fundamentally a trade-off to make your hardware simpler and make your programmers' lives more difficult. And often the programmer ends up being really your compiler engineer.
Starting point is 00:24:17 VLIW is great for certain application domains where you really need to get peak performance out of the minimum number of transistors. But you are making a trade-off that you're going to make the compiler's life very, very difficult. And then on the opposite end, right, we have very RISC-y designs like MIPS or, more recently, RISC-V. You have a different set of problems where the instructions are so simple that there's almost nothing the compiler can do to help you out. Like, there's no difference between good code and not good code. So you have to make sure you're building very good CPUs or else you're just going to end up with very poor performance.
Starting point is 00:24:58 I forgot what the question was that started me off. I mean, it was risk versus very long instruction words. And why does it matter? I mean, you're in this intermediate representation. Well, and there's the other axis, which is complex instruction set. The Intel side of things where you have the Windows instruction.
Starting point is 00:25:19 I'll pick on RISC-V a little bit. And I don't want to be too mean because I actually know a bunch of people who work on RISC-V and they're all very smart people. But I'm going to pick on it a little bit anyways, right? RISC-V is a very,
Starting point is 00:25:32 if people don't know about it, it's a new open source instruction set architecture out of Berkeley. And they're building open source processors based on it. It's very, very RISC-ish. You can see the sort of heritage through MIPS in it. You know, it doesn't have any kind of complicated addressing modes. It has a very simple instruction set,
Starting point is 00:25:57 which is great for teaching and very easy to learn. It also means the compiler can't do a lot to help you out in terms of code quality. And it gives rise to this sort of conundrum, right? If you look at performance of applications, most applications are going to be performance limited by some small number of relatively small loops in the program. And if your loop expressed in RISC-V is two times as many instructions because every load has to have an accompanying add
Starting point is 00:26:28 in order to do the addressing mode, then now your processor needs to have twice as many instructions per clock just to hold pace with one that could represent that loop in half the number of instructions. And a 2x increase in instructions per clock is a big increase for a CPU designer. Yes, yes.
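As a rough illustration of the addressing-mode point (an example of mine, not from the show): a machine with register-offset addressing can fold the index arithmetic into the load, while a base-RISC-V-style ISA needs separate instructions. The assembly in the comments is approximate.

```cpp
// Sum an array; the hot loop is dominated by the indexed load a[i].
long sum(const int* a, long n) {
    long total = 0;
    for (long i = 0; i < n; ++i) {
        total += a[i];
        // With register-offset addressing (x86-64 / ARM64), roughly one instruction:
        //   mov  eax, [rdi + rsi*4]
        //   ldr  w2, [x0, x1, lsl #2]
        // With a base RISC-V-style ISA (no register-offset addressing), roughly three:
        //   slli t0, a1, 2      # scale the index
        //   add  t0, a0, t0     # form the address
        //   lw   t2, 0(t0)      # then do the load
    }
    return total;
}
```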
Starting point is 00:26:51 We need you to go twice as fast. So there's sort of a sweet spot in there. You look at like ARM or PowerPC or even x86 to a degree, right? Modern x86. Most of the instructions are relatively simple. And then you have complicated instructions only when they're going to provide significant value.
Starting point is 00:27:13 So, you know, everyone sort of agrees that having a reasonable set of addressing modes is pretty good. Everyone agrees that having a pretty rich set of shifting and bitwise operators is good. Things like that. You know, there's little islands where you really want to have a few sort of complicated instructions just to make sure that you're not going to be quadrupling the size of some tight loops. But otherwise, being RISC-y is just fine. Okay.
Starting point is 00:27:46 So RISC, did we already say it stands for Reduced Instruction Set? Yeah, RISC stands for Reduced Instruction Set Computer. And it was, I think it originated in the 80s. And ARM Cortex-M, which I know that's not what everybody uses, but it certainly is what I've been using a lot of, are considered RISC processors. ARM is another one of those that started as an acronym. It's no longer, as far as I know. It's the Acorn RISC Machine. Exactly.
Starting point is 00:28:18 So do you work much with the embedded processors? Not specifically. I mean, I worked on ARM a significant amount, and I was one of the two engineers on the first 64-bit ARM functional compiler. That's a little fuzzy, because there were multiple compilers coming up at once, but we were the first one to boot a kernel. But that was all actually application processors, not MCUs.
Starting point is 00:28:48 I have programmed MCUs a little bit on my own time. That's not a requirement. I just was wondering into what parts of LLVM did you work on? So I've worked on the optimizer, particularly sort of earlier in my career, and then I moved more into the backend part of the compiler, working specifically on retargeting LLVM to new architectures, to 64-bit ARM, and then later to GPU architectures.
Starting point is 00:29:16 The GPU architectures. I usually think of CUDA. Or OpenCL. Yeah. So is that what you're compiling, or is this a general C compiler? I guess, are you working on the back end for that? Yeah, so I worked on the back ends that would be sort of the backing for all of those. Okay. And not just those, actually. Our primary clients were not any of those. They were actually games and graphics programs where you write small programs called shaders in a shading language like GLSL or HLSL.
Starting point is 00:29:57 And you did this for Apple or Google or one of the big giant names. Yeah, I worked for Apple. Why do for-profit companies support LLVM? It seems like, aren't they giving things away that they should be charging for? It turns out that there's actually no money in compilers, generally speaking, or no money that's big enough for Apple or Google or anyone else to care about. Even Microsoft has moved towards giving more and more of their tools away for free because the value to all these big companies is in getting developers to build applications for their platforms, for Mac or iOS or Android or Windows or whatever. And even for the hardware vendors,
Starting point is 00:30:49 their motivation is to get people to write applications for ARM or PowerPC or x86 or whatever, right? The amount of money they could make from charging people for a compiler is a pittance compared to selling, you know, 10 times more iPhones or Android phones or x86 processors or whatever else their goal is. So it'd be real nice if ARM came out with a free... Well, that's a different...
Starting point is 00:31:18 They have a compiler they charge for. So why... This is a different discussion. Why are we still paying anybody for compilers? That's the question. I mean, that's a different discussion. Why are we still paying anybody for compilers? I mean, that's a great question. Obviously, I come from the application processor domain where most people don't pay for compilers. There's very limited circumstances where people do.
Starting point is 00:31:38 And it's usually, I mean, there are still some domains where people pay for compilers. In scientific computing, people do still pay for specialty compilers that have very sort of domain-specific capabilities. And I remember Intel still sells theirs. Yeah. Yeah, they sell theirs. I don't know anyone who has actually paid for it.
Starting point is 00:32:03 I think they actually sell that primarily to sort of scientific and HPC users. I believe like Cray charges for theirs similarly. So I would guess that the reason is sort of similar where there's sort of domain-specific knowledge that needs to be applied to build a compiler specifically for that domain. And that's why there are embedded compilers for sale
Starting point is 00:32:28 because there are lots of embedded chips, and more than paying for even the compiler or the optimizer, you're paying for being able to deal with the 9,000 different variations. Yeah, that's what I would think. It would be like having this complete index of memory maps and register layouts, which is less about the compiler and more about a very small part of the back end. Yeah, I mean, it's probably maybe not even part of the back end. It may actually just be a header file somewhere. Five thousand dollar header file. So does LLVM work for ARMs, for Cortex-Ms? I mean, you mentioned ARM64s.
Starting point is 00:33:06 Yeah. So in principle, yes. But there's no... In principle, yes, but you would have to supply your own memory map and register layout. And LLVM historically did not have its own linker, though they're actually working on writing one. And I'm not sure how mature it is for ARM right now. But yes, in principle, you could use Clang to target ARM.
Starting point is 00:33:31 I actually have. I have one of those, the DIP packaged LPC parts. And I have actually written a little program in Clang and then downloaded it onto it. But there's definitely a DIY aspect where you're missing a bunch of the pieces that would make up sort of a full tool chain, and you have to supply those yourself. And it's kind of sad because Clang is so good.
Starting point is 00:33:59 Yeah, you found Clang was nice to you recently, wasn't it, Christopher? Oh, it was just today I was playing with something, not in the embedded space, where it was just like, oh, you misspelled this. You misspelled a type name. It wasn't, I don't know what this type is. It was, I think you misspelled this.
Starting point is 00:34:17 Yeah, I think you meant this. And I actually did that for a variable also, because I had a variable somewhere upstream, and I spelled it correctly, and then later spelled it differently. It's like, I think you meant this. That's so weird. It's just really smart. And some of the static analysis stuff it does, too.
Starting point is 00:34:29 It's just really amazing. So I would love for it to be. We tend to use Clang in our work for Embedded not to target the processor, but just to run it against the code and then throw it away and take its static analysis because it's that good at that. Yeah, I mean, if you're running it on host too,
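For anyone who hasn't seen Clang's typo correction, a rough illustration (the exact diagnostic wording varies by version):

```cpp
// typo.cpp -- the variable name is misspelled on purpose.
int main() {
    int counter = 0;
    countre += 1;        // oops
    return counter;
}
// Clang reports something along the lines of:
//   typo.cpp:4:5: error: use of undeclared identifier 'countre';
//                 did you mean 'counter'?
```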
Starting point is 00:34:48 Yeah, I mean, if you're running it on the host too, Clang also has a number of dynamic analysis utilities that are very useful, address sanitizer and undefined behavior sanitizer, which have saved me a lot of pain in my life. So you mentioned undefined behavior, and I know that there are some points in the C spec where it says this is undefined behavior. Why is that so important? That is one of the most contentious design points of the C family of languages and the source of
Starting point is 00:35:22 both a lot of optimization and a lot of pain for everybody. Undefined behavior basically is the standard throwing the compiler a bone and saying, you can assume that this thing never happens. This is sort of the exception where I was saying earlier that the compiler doesn't get to assume anything about your code. Anything it wants to know, it has to prove it. This is sort of the escape valve there. It's sort of the standard provides almost, if you think of it in mathematical terms, it's providing an axiom that says, whatever this thing is, axiomatically, it did not happen. And so the way this comes up, people think that compilers go looking for undefined behavior
Starting point is 00:36:07 and then decide to format your hard drive when they find it. That's not how it works, right? The compiler is trying to prove some fact. It says, I want to make this transformation. In order to make this transformation, I need to know that this thing is true. And in order to prove that this thing is true, it needs to consider, I don't know, some handful of
Starting point is 00:36:25 cases. And then it'll look at one case and it'll say, ah, this case, if it were true, we would be in undefined behavior, and so I can just assume that this case is not true. If you ever did a proof-oriented mathematics class, it's sort of saying any situation that would give rise to undefined behavior is just an automatic contradiction, can't happen. You've stunned us into silence. A little bit, a little bit, because that seems, I used to hate those proofs because it seemed like you were cheating. I'm going to assume this isn't true. And therefore, if I can prove that what you just told me leads to this, then what you just told me isn't true. Well, there's more to it than that.
Starting point is 00:37:21 But there's some logical connections that, yes. But it is proof by contradiction. So like a very standard example here would be related to null checking. That in a lot of places, the compiler may want to ask, can I prove that this pointer wasn't null? And the C and C++ standards provide various ways that the compiler can determine that. For instance, if you've already dereferenced the pointer, then it can't have been null.
Starting point is 00:37:51 And in fact, that is actually an example of undefined behavior right there. So if you dereference A and then dereference A again, if A was null, the first one was undefined behavior. And so when you're talking about optimizing the second one, you can assume it was not null, because if it was null, you were already in undefined behavior. Let me ask you kind of a weird question.
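A minimal C++ sketch of the null-check case just described (an illustration, not code from the show):

```cpp
// At -O2 a compiler is allowed to delete the null check below entirely:
// if p were null, the first dereference was already undefined behavior,
// so from that point on the optimizer may assume p is not null.
int read_twice(int* p) {
    int first = *p;          // undefined behavior if p == nullptr
    if (p == nullptr) {      // provably "can't happen", so it may be removed
        return -1;
    }
    return first + *p;
}
```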
Starting point is 00:38:17 Talking about a weird question, and then I have a follow-up. And then I probably have a follow-up to that. What does the structure of the optimizer look like from a code perspective? Because it sounds like a collection of heuristics, and it seems like that could get into just a mess really quickly, especially if you're reusing them all the time. Yeah. So most compilers are broken down into, you keep using the phrase pipeline, and that's actually pretty accurate. They're broken down into a series of phases.
Starting point is 00:38:49 And I already sort of said there were three phases from a coarse granularity, and each of those phases is then subdivided into smaller phases. LLVM makes this actually very explicit. So LLVM in the optimizer has this concept of a pass, which is, it's just an object with a runOnFunction method. So it runs on each function, it processes each function in your compilation unit, it maybe
Starting point is 00:39:12 does some analysis, does some transformation. And those passes generally implement some sort of logical optimization. So examples there might be dead store elimination or de-virtualization or global value numbering, which is a horribly indescriptive name, but it's basically redundancy elimination. If you're computing the same thing twice, don't compute it twice,
Starting point is 00:39:42 only compute it once. And that's also a nice abstraction, and a large part of the compiler fits into that model. There are places where that model sort of breaks down. There's a pass in LLVM called InstCombine, which is what we, compiler engineers,
Starting point is 00:40:00 would call a peephole optimizer. That is the place where you just sort of pile in all the little heuristics. Say, like, hey, if you're dividing by a power of two, just turn it into a shift kind of stuff. And, you know, the last time I looked, InstCombine was many tens of thousands of lines of code. What language is LLVM written in?
Starting point is 00:40:24 LLVM is written in C++. Fortran. C++. Fortran. C++. Okay. You had follow-ons to that. I did have follow-ons. You were going to go on for like an hour. I heard that. So.
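To make the pass idea concrete, here is a minimal sketch in the style of LLVM's classic legacy-pass-manager tutorial; the pass name and what it does are illustrative, not anything discussed on the show:

```cpp
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
// A do-nothing analysis pass: visit each function and count its instructions.
struct CountInsts : public FunctionPass {
  static char ID;
  CountInsts() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    unsigned Count = 0;
    for (BasicBlock &BB : F)   // every basic block in the function
      Count += BB.size();      // every instruction in the block
    errs() << F.getName() << ": " << Count << " instructions\n";
    return false;              // false means we did not modify the IR
  }
};
} // namespace

char CountInsts::ID = 0;
static RegisterPass<CountInsts> X("count-insts", "Count instructions per function");
```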
Starting point is 00:40:35 So. So the optimizer is a collection of heuristics and phases. And that's very structured. It looks for certain things. Is there a place for machine learning and sort of deep learning? Has anybody looked at doing that? I know that sounds pop science-y.
Starting point is 00:40:56 Many, many PhD theses have been written about trying to get machine learning into compilers. And lots of them are about using machine learning to tune the thresholds for different things, right? Like the decision for should I unroll this loop or not? Okay.
Starting point is 00:41:16 It turns out that in general, compilers is not a particularly great application domain for machine learning. Generally speaking, the way I think about it is that machine learning tends to work well in domains where you have a lot of unstructured data. Where you need a lot of flexibility because the data might change. It might change and, like I said, sort of not structured. Compilers are a very structured domain.
Starting point is 00:41:42 So, like, it's, I'm trying to think of a way to sort of explain this, right? We don't... A compiler doesn't have, like, big buffers full of data. A compiler has a very complex web of objects that are all interlinked and have, you know, connections to each other. You have, like, this instruction, which uses the output of that instruction. And it's much more related to graph theory than it is to, say, numerical analysis or something like that.
Starting point is 00:42:08 Okay. And generally speaking, my experience has been that for highly structured problems, humans are much better at developing solutions than learning algorithms. And the reverse is sometimes true for more what I would call dense domains or numerical domains. I mean, you mentioned that the output needed to be deterministic. Yes, absolutely. People get very unhappy if they run their compiler twice and get different results. And machine learning, I usually think it applies more to things that aren't as deterministic, that require more flexibility.
Starting point is 00:42:44 I mean, machine learning can be deterministic in the sense that once you have a training set, you're going to hope, and as long as you don't change any of the inputs, you're still going to get the same output. But it's harder to understand why. And yeah. No, I'm done.
Starting point is 00:43:02 Okay. I wanted to go back to GPUs. Sure. How are they different from regular CPUs? Oh boy, that is a very broad question. So the first thing I would say is that a CPU is not even necessarily the right comparison for a GPU off the start. A GPU is much more than just a compute core. It's a batch-oriented bandwidth processor. And I'm going to pull out my favorite analogy, which is that mainframes and GPUs have a lot of things in common, right? They're both systems that are designed
Starting point is 00:43:45 to process huge amounts of data in a batch-oriented fashion where you don't care so much about how long it takes to process any individual record in the case of a mainframe or pixel in the case of a GPU, but you care about the aggregate throughput of the whole system.
Starting point is 00:44:06 And part, you know, there's sort of, I think of there being sort of four parts of that, which is a high bandwidth computational core, accelerated input, accelerated output, and conflict resolution. And both of those are systems that have built all four of those into hardware. So I've wanted to use GPUs on some projects. And it's usually when I need to optimize signal processing, maybe for images, maybe for lots of very similar but not quite the same audio channels, lots. Those are potentially good domains for GPUs. They're also potentially good domains for a DSP. Well, and that was what we finally went to was DSPs
Starting point is 00:44:54 because they were simpler to explain to everybody, but also because they were more power efficient for what we were doing. So it made sense. When we think about GPUs, when we do VR or computer graphics or games and there are shaders and lots of things moving and physics engines,
Starting point is 00:45:15 what else can GPUs do? I mean, where are they interesting when the G isn't really for graphics? This is an interesting question, right? So, I mean, GPUs are originally designed to do one algorithm really, really well, which is the rasterization algorithm, which is a total hack that is used for rendering pixels on the screen. It just happens to work really well, so all computer graphics uses it. And GPGPU, or
Starting point is 00:45:50 compute uses of GPU, are all about finding ways to take existing problems and sort of poke them until they happen to look enough like rasterization that they then run well on a GPU. Yeah. So like I said, I mentioned sort of four core elements of what makes a GPU a GPU. The one that's easiest for people to sort of grasp is the high bandwidth compute core. Yes. And that's the part that's most, you can explain how that aligns with what's in a CPU, right?
Starting point is 00:46:17 So a GPU's compute core is actually not that different from a CPU, but with certain elements sort of turned up to 11. If you're familiar with vector processing or SIMD processing on a CPU, GPU compute cores are even more so. Your typical GPU is a vector processor with a vector width of maybe 32 or 64 in contrast to most CPUs, which might have 4 or 8.
Starting point is 00:46:50 They're also highly hyper-threaded, or simultaneously multi-threaded. They have multiple hardware threads running on the same core. So you can have one thread doing a memory access while another thread is using the ALU. And then finally, they're also multi-core. Extremely multi-core, right? Extremely.
Starting point is 00:47:11 Yes, though the technical marketing for GPU vendors tends to obscure that. Or they make the numbers seem bigger than they really are. When you look at the way marketing for GPU vendors works, when they say, like, we have so many thousand cores, often what they're saying is they're referring to each lane of the vector as being a separate core. So it's 32 times the number. Yeah, yeah. And then they are probably also multiplying by the number of hardware threads per core. Marketing lies. It's all, yeah, marketing is, GPU marketing is interesting.
Starting point is 00:47:56 I mean, it's actually, you know, it's not that complicated a concept once you sort of break it down into what it's made up of. And a lot of these sort of things that people think of as being weird on GPUs make sense once you sort of understand what it's built up out of. For instance, people talk about control flow being expensive on a GPU. That's because you're doing everything on this vector processor. And if you want to have an if statement, the way you do it is you have predication per lane of the vector, and you do both sides of the if statement with the predicate mask inverted. And so what seemed like an if to choose a shorter path now becomes a way to go both ways, a much longer path.
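A rough scalar sketch (mine, not from the show) of what per-lane predication means: both sides of the if run for every lane, and a mask picks which result each lane keeps. The lane width of 32 is just a typical value.

```cpp
#include <array>

constexpr int kLanes = 32;  // a typical warp/wavefront width

// Scalar model of: out = (x > 0) ? x * 2 : x - 1, executed per lane.
// Both sides are computed for every lane; a mask selects each lane's result.
std::array<float, kLanes> select_both_sides(const std::array<float, kLanes>& x) {
    std::array<bool, kLanes> mask{};
    std::array<float, kLanes> then_val{}, else_val{}, out{};

    for (int lane = 0; lane < kLanes; ++lane) mask[lane] = x[lane] > 0.0f;      // the "if" condition
    for (int lane = 0; lane < kLanes; ++lane) then_val[lane] = x[lane] * 2.0f;  // "then" side, every lane
    for (int lane = 0; lane < kLanes; ++lane) else_val[lane] = x[lane] - 1.0f;  // "else" side, every lane
    for (int lane = 0; lane < kLanes; ++lane)
        out[lane] = mask[lane] ? then_val[lane] : else_val[lane];               // predicated select
    return out;
}
```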
Starting point is 00:48:34 Exactly. I've done a little bit of GPU stuff for a medical device company, and it was in OpenCL. And the hardest thing initially was just getting my brain to switch away from that kind of thinking to, okay, no, I don't have this list of sequential steps, I have this extremely wide list of maybe very short things I want to do all at once to, you know, one grid square of the
Starting point is 00:49:01 screen, and divide it up that way. The parallelism is such a different way of thinking. And some other people I was working with, they would read that and go, I don't understand how this works. It's like, well, imagine this splatted across, you know, hundreds of whatever, maybe not hundreds, but marketing hundreds. And they all run at the same time.
Starting point is 00:49:21 And then we get it over a high bandwidth memory interface. But it's still really tricky, and there were times, like you said, with the conditionals, that I was thinking, oh, I need to do an if here. No, no, I don't need to do that at all. But it was hard to turn your head around. Yeah, I mean, the other thing that people sometimes run into
Starting point is 00:49:39 that's a little strange, I mentioned the term bandwidth-oriented processor, right? So their GPUs are all about running, you hand them a batch of work and they get that batch done quickly, but they don't really try to make any one thread run particularly quickly. What that means is that they are trying
Starting point is 00:49:58 to keep every unit on the chip occupied all the time. They're doing all kinds of crazy swapping threads around, letting one thread run on the ALU while another one is running on the memory unit or whatever, right? And so people will sometimes write programs in a style where maybe you have a kernel that fetches something from memory and then uses that to do another fetch, and they wonder why their program's really slow.
Starting point is 00:50:30 It's because your thread stopped and waited for that memory, and it couldn't run again, and was sitting there tying up space on the processor that it couldn't then use for other threads. It gets kind of complicated. It sounds like the back-end problem for a compiler for a GPU is much different than for a CPU, because it seems like you have to have more architectural knowledge of how the
Starting point is 00:50:57 structure of the thing works rather than just the instruction set. Does that make sense? Or am I misrepresenting it? There's some of that. I mean, generally, when you're compiling for a GPU, you're sort of transforming the... You know, programmers, when they program a GPU, they program in what is sometimes called a SPMD style, which is a really terrible name,
Starting point is 00:51:19 but it stands for single-program multiple data. And the idea is you write your program as if you were programming one lane of the vector machine, and then the compiler sort of implicitly undoes that for you and turns it into a normal vector program. And so
Starting point is 00:51:35 there's sort of that extra transformation that the compiler is doing for you behind the scenes. Also, in contrast to your typical CPUs, GPUs generally don't have publicly documented assembly formats, and they generally change much more frequently than CPUs would. So your compiler can be a lot more tied to the specifics of the piece of hardware you're targeting. That makes it challenging. Yeah, it makes it fun and exciting, right?
Starting point is 00:52:07 As a compiler engineer. Good. You know, it's in contrast to a CPU, right, where it may take years and years and years to add three instructions. On a GPU, they can add new instructions and remove old ones in any new generation they want, because no one,
Starting point is 00:52:25 no, there's no binaries that have to keep working. Okay, that makes sense. And it makes it fun and interesting if you're interested in architecture, then, because you can do all kinds of wild and crazy things. Okay, so I have a question, switching subjects, that I'm asking you because you sort of asked me to ask you. What's the worst bug you've seen that has been caused by a difference between C and C++? And I am really looking forward to this story. This one is great. And I can't take all the credit for it because I was not the one who debugged this, but I did sit down the hall from the guy who debugged this. And it involves everyone's favorite feature of C as well, which is volatile.
Starting point is 00:53:11 Oh, yes, yes, yes. Let's talk about this. Volatile actually has a very specific meaning in both C and C++. Everyone thinks that it means basically don't touch my code, which is true to a first order approximation. Sometimes true. Most of the time, if that is your mental model of it, you will be correct. And this is a story about one of those times when it was not correct. So we got a bug report about a device driver that was not functioning properly.
Starting point is 00:53:49 And I don't really know the specifics of what this thing was doing, other than it was something very, very deeply embedded that had been programmed in C++ for no good reason that I understand. It had been waking up from sleep. And for some reason that I do not understand, it woke up before its associated, like, the controller on its DRAM had come up, and so it was in a little busy loop trying to manually strobe its RAM. And the way... We've all been there.
Starting point is 00:54:26 I have not. And the way it did this was it had a pointer, which it dereferenced, and then cast the result to void, which is sort of the standard way you avoid getting a warning about an unused variable. And they'd been smart, right? The pointer was volatile. And yet, when it generated the code, there was no load in the generated code. And it had taken them months and months and months to track this down because it was, you know, sporadic memory corruption in the DRAM, right? And it boiled down to a difference
Starting point is 00:55:06 between C and C++ in terms of the rules of what volatile actually means. Volatile, what volatile means in C and C++ is that, you know, there's some sort of sequence of events that the standard says must happen. And volatile says basically
Starting point is 00:55:23 the compiler can't elide or remove or add any events to that list. And in C++, the definition of when those events occur is different. And in particular, for dereferencing a volatile value, dereferencing it and casting it to void doesn't trigger what's called an L-value to R-value conversion, and therefore doesn't actually cause an event, a volatile event, to happen.
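A hedged sketch of the pattern from the story; the register address and function names are made up, and the exact rules depend on the language and standard revision:

```cpp
#include <cstdint>

// Hypothetical memory-mapped register address, purely for illustration.
volatile uint32_t* const kStrobeReg =
    reinterpret_cast<volatile uint32_t*>(0x40000000);

void strobe_as_in_the_story() {
    // Dereference, then cast to void to silence the unused-result warning.
    // In C this is a volatile read; under the older C++ rules the discarded
    // value did not require the lvalue-to-rvalue conversion, so the load
    // could legally disappear.
    (void)*kStrobeReg;
}

void strobe_that_always_reads() {
    // Actually consuming the value forces the read in both languages.
    uint32_t value = *kStrobeReg;
    (void)value;
}
```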
Starting point is 00:55:51 You needed to cast it to volatile void. No, you had to actually assign it to something. Oh. And it was, like I said, it was one of these deep, dark corners of C++. That seems like something that they would harmonize.
Starting point is 00:56:10 I believe they did in C++ 11 or 14. Jeez. But it was just horrible, right? And because the rules of volatile say that the compiler can't add or remove volatile accesses, the standard actually required us not to do the dereference there because we can't add them where they're not supposed to be. And we'll say we made a special one-off compiler to fix that bug. So this wasn't actually a compiler bug? This was not a compiler bug.
Starting point is 00:56:45 We were implementing the specification exactly as written. So how often do people, I know you live in Silicon Valley, how often at dinner parties do people say, well, I found a compiler bug, and you just put your head in your hands and cry? Well, maybe not at dinner parties, but it's, those would be very interesting dinner parties. The life of compiler engineers is often spent debugging other people's code, which does not work because they upgraded their compiler, and then finding all the places where they have
Starting point is 00:57:19 undefined behavior, and telling them, and then telling them again when they don't believe you, and then telling them again when they don't believe you and then telling them again when they don't believe you and then writing the patch for them that fixes their code. I remember interviewing a company and they told me there was a compiler bug and that was why they couldn't do X, Y, or Z. And I sat there and tried not to say, I doubt it. I doubt that it's a compiler bug. What about when it says internal error?
Starting point is 00:57:49 Look, those are legit. I mean, there are real compiler bugs, obviously. Like, I've written my fair share of bugs. And, you know, when you encounter a bug in, actually, it's not the compiler bugs that scare me, right? It's the assembler bugs. You hit an assembler bug. Oh, God.
Starting point is 00:58:03 And you're like, how does anything ever work? But realistically, if you look at the bugs we would get coming in from a, if we deployed a new version of the compiler and all the teams across the company updated, the fraction that were real, honest to God, compiler bugs was
Starting point is 00:58:22 less than 10% probably. It's hard because you don't want to believe it's your code. Yeah. And yet as a compiler engineer, you have to believe it's your code. You're kind of the last defense. Yeah, I mean, it's a tricky place to be. And you get pretty good at identifying sort of common failure modes. This is also a great motivator for compilers to have really good warnings and diagnostics so that you can tell people, yes, the compiler is doing weird things to your code because you
Starting point is 00:58:55 have undefined behavior and look, the compiler tried to tell you. Right. Yes, that's a good reason to document what you do. And we were discussing undefined behavior earlier. Like I was saying, it's sort of the way undefined behavior creeps in in sort of these insidious ways where, you know, it's not, things that happened to work previously could sometimes stop working in a new version of the compiler because, you know, the theorem prover in the compiler got smarter
Starting point is 00:59:25 and suddenly it's able to make use of these axioms to, you know, prove some transformation safe. And now it's deleted all your code. Which is better. It's faster and more power efficient now. I was about to say, we optimized it to return zero. Yep. Well, we've kept you for a while. I had one more question, but now I've forgotten it. I should have written it down. That's what I should do with these things.
Starting point is 01:00:09 Well, we've kept you for a while. Do you have any more questions? I had one, but now I've forgotten it. I should have written it down. That's what I should do with these things. Or type it at me in the little screen we share. Well, that was too late. Shockingly useful. Yes. All right. Well, then I will ask you about our new toy. Oh? We got an NVIDIA board sent to us from a friend who works at NVIDIA. And it's the Jetson TX2.
Starting point is 01:00:30 And I gather it has more processing power in its little board than my entire garage full of dev kits. It's got, gosh, I don't even know. It seemed like it could run a whole college's worth of computers, at least if you went to school in the 90s and that's what you think of as computers. Well, to be fair, I think your phone could do that now too. That's true. What should I do with it? I'm a little afraid of it because it clearly is smarter than I am.
Starting point is 01:01:08 Well, you can build an AI that takes over the world. And now let's maybe not do that one. You know, I'm not super familiar with the Jetson boards from NVIDIA, but I'm sure it would make a great, you know, sort of small console if you wanted to try to build, like, I don't know, the modern equivalent of an Atari 2600 or whatever. Those kinds of boards are often targeted for computer vision applications.
Starting point is 01:01:37 Yeah. So, you know, you can start building your own self-driving car. And they tend to be really good for any kind of numerical application, which generally means doing matrix multiplies. Having worked with people who are into numerical systems,
Starting point is 01:01:52 I learned that everything in the world can be done with a matrix multiply. So just find any problem that can be done with a matrix multiply and it'll be great at it. Yeah, but it does it so much faster. I have a lot of matrix multiplies in my current code
Starting point is 01:02:07 and I'm still... You can do bigger matrices. Yes, exactly. The biggest. Yeah, I do. I think computer vision is one of the big things and maybe a little bit of robot control.
Starting point is 01:02:22 Yeah, we saw that a lot when I was working on GPUs. Our primary client was, of course, traditional graphics rasterization, putting pixels on the screen. But even when you looked at compute applications, the biggest clients of compute were still applications that were sort of graphics-y, that just didn't happen to fit within the exact confines of the rasterization algorithm. So computer vision type algorithms were a big application domain there. That makes sense. And every time I have tried to apply these things to non-computer vision, non-graphics,
Starting point is 01:02:57 it never quite works out, because the DSP just makes a lot of sense and it's simpler and easier, and you don't have Chris's, oh my god, why am I trying to run everything through one thing when I don't have this many vectors to go in. Yeah, I mean, one of those. So one of the things I learned from the architects I worked with was that, um, the area, like the square millimeters of a processor, is a pretty good proxy for the power that it's going to draw. And if you think about a GPU, like if you're only using the vector processor, just because, I don't know, you wanted to do some random computation on it and you wanted wide vectors, you're not really using a large fraction of the transistors that are on there, right? Your GPU,
Starting point is 01:03:43 like I said, it has this high bandwidth compute core, but it also has these IO accelerators for reading and writing images very fast. It does all the format conversions. It can do gamma correction and all this stuff. And it also has this conflict detection engine in it. There's a lot of stuff in it besides just this vector core. And so the larger the fraction of the stuff
Starting point is 01:04:07 in the GPU that you're using, sort of the better bang for buck you're getting in terms of value per watt, I guess. That makes sense. All right. You know, this is why, like, for processing deep learning applications on GPUs, there's a huge amount of...
Starting point is 01:04:27 I mean, GPUs are already a decent thing for it just because they have a lot of floating point flops. But they're even better for it when you can take advantage of the texture units. You can pretend that your neural network weights are the pixels of an image and you use the texture input and output units to do the format conversions for you. Suddenly you're getting value out of a much larger fraction of the chip and your performance per watt goes way up.
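A back-of-envelope sketch of that "use more of the chip" argument. Every number here is invented purely for illustration; the point is only the shape of the trade-off when power roughly tracks active silicon area.

```cpp
#include <iostream>

int main() {
    // Hypothetical power draw per block when active (watts), invented numbers.
    const double vector_core_watts  = 6.0; // programmable SIMD core
    const double texture_unit_watts = 2.0; // fixed-function sampling / format conversion
    const double baseline_watts     = 2.0; // memory interfaces, scheduling, etc.

    const double items_per_second = 1.0e9; // same workload either way

    // Case 1: do the format conversions in shader code on the vector core.
    // The texture units sit idle, and the core spends cycles on conversions.
    double watts_shader_only = vector_core_watts + baseline_watts;
    double rate_shader_only  = items_per_second * 0.7; // hypothetical slowdown

    // Case 2: let the texture units handle the conversions, so the vector
    // core spends all of its cycles on the actual math.
    double watts_with_textures = vector_core_watts + texture_unit_watts + baseline_watts;
    double rate_with_textures  = items_per_second;

    std::cout << "shader-only:      " << rate_shader_only / watts_shader_only
              << " items per joule\n";
    std::cout << "texture-assisted: " << rate_with_textures / watts_with_textures
              << " items per joule\n";
    return 0;
}
```

With these made-up numbers the texture-assisted case draws more total power but delivers noticeably more useful work per joule, which is the performance-per-watt effect being described.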
Starting point is 01:04:54 I like this metric. Now I'm going to be thinking about what to do with this board, trying to utilize as much of it as I can, instead of my previous, well, maybe I'll dip my toe in. Now I have to use the whole thing. Well, then you got to write some real graphics code. We'll see. How many different programming languages are you comfortable programming in?
Starting point is 01:05:17 I mean, you've mentioned many. I have dabbled in many. The only ones that I've written large things in are C++ and Python. I have passing familiarity with many others. And there's sort of a Dunning-Kruger kind of effect where I've worked on C++ compilers. I have fixed bugs in C++ compilers.
Starting point is 01:05:39 I have ported C++ compilers to new architectures. And I absolutely do not consider myself an expert in C++. But you're an expert in Haskell because you've never actually used it. Maybe. I can pontificate about Haskell, certainly. What new language would you want to use? If you could choose all the languages that are out there. Oh, geez. I mean, I'm very interested in Rust. I've never written anything beyond, like, a little baby program in it. Um, and certainly if I were, like, starting some significant new project from scratch, I would
at least consider that as an option. Um, I don't know. Like I said, I have sort of diverse tastes in languages, and, you know, going along with your lightning round question, I like to try them. I write a little tiny program and then I'm like, this is cool, I should use it for more stuff, and then never end up revisiting it.
Starting point is 01:06:34 Yes. I understand. I understand quite a lot. Well, Owen, thank you so much for being with us. Do you have any thoughts you'd like to leave us with? Yeah, I would encourage anyone who has any interest in compilers or processors or anything else,
Starting point is 01:06:54 just take a look at what your compiler is generating. And in particular, if you're a student, look into compilers, and compilers in open source in particular. I think compilers are not something that gets talked about a lot in school. It's actually really interesting. There's a lot, particularly if you have
Starting point is 01:07:14 a little bit more of a mathematical bent. It's really interesting. There's a lot of fun challenges. There's a lot of different sub-niches where you can find your area of comfort. And it's something that, because a lot of it's done in open source these days, it's very easy to get into as a student. Google runs the Google Summer of Code program,
Starting point is 01:07:35 which is actually exactly how I got started. I did a Google Summer of Code project in 2006, which feels like forever ago. So, you know, I encourage all students to look into things like that. And you answered the question that I forgot, which was how would you get into this if you were interested in playing around with compilers? I got really bored one summer is the answer.
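For the "take a look at what your compiler is generating" suggestion above, a minimal way to start, assuming clang is installed; any small function will do.

```cpp
// square.cpp -- a tiny function to inspect.
//
//   clang++ -O2 -S square.cpp -o -              # print the target assembly
//   clang++ -O2 -S -emit-llvm square.cpp -o -   # print the LLVM IR instead
//
// Compiler Explorer (https://godbolt.org) does the same thing in a browser
// and makes it easy to compare optimization levels and target architectures.
int square(int x) {
    return x * x;
}
```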
Starting point is 01:08:02 And you have a job now at Oculus, is that right? That's correct. I work at Oculus now, not particularly on compilers. And we are hiring for all kinds of different things, from hardware engineers, electrical, mechanical, optical. Don't ask me anything about optics, please. Um, firmware engineers, embedded systems engineers, um, CPU system engineers, all, you know, the whole stack all the way up. Lots of fun and interesting challenges. Um, and building virtual reality, which is a fun and exciting, challenging area with a lot of, uh, interactions between, you know, embedded systems and graphics, and very demanding performance.
Starting point is 01:08:46 And so people will go to the Oculus website and we'll link in a careers page? Yeah, it's oculus.com slash careers. Or they can reach out to me directly and I'll help people get connected with the right folks. I guess the way to reach out to you is either through Twitter or more likely send email to show at embedded.fm
Starting point is 01:09:06 and we will connect you. Awesome. Our guest has been Owen Anderson, an engineering manager at Oculus, an LLVM old timer and self-proclaimed GPU crazy person. You have to be pretty crazy. Links to everything will be in the show notes. You can find the show notes on embedded.fm along with the current March Madness rankings, the blog contact link, and assorted other goodies.
Starting point is 01:09:35 Owen, it's been great talking to you. Thank you very much. It's been great. Thank you also to my Christopher for producing and co-hosting. And of course, thank you for listening. I have a thought to leave you with from Donald Knuth. The best programs are written so that computing machines can perform them quickly and so that humans can understand them clearly. A programmer is ideally an essayist who works with traditional aesthetic and literary forms, as well as mathematical concepts, to communicate the way that an algorithm works and to convince a reader that the results will be correct.
Starting point is 01:10:16 Embedded is an independently produced radio show that focuses on the many aspects of engineering. It is a production of Logical Elegance, an embedded software consulting company in California. If there are advertisements in the show, we did not put them there and do not receive money from them. At this time, our sponsors are Logical Elegance and listeners like you.
