The Pragmatic Engineer - From Swift to Mojo and high-performance AI Engineering with Chris Lattner

Starting point is 00:00:00 So I just started, again, nights and weekends, didn't ask permission, just started fiddling around, seeing what could be done that would be better. The first year and a half, it was literally just me working on it. Nights and weekends, and I had a day job, and I was managing a big team and a lot of stuff going on. Hold on. So at this point, yeah, you were already a second-level manager or something and had a team of 40 people. Wow. Running a lot of very interesting and very fun stuff. You're good at juggling stuff. This is a passion project.

Starting point is 00:00:20 I'm also a fairly good programmer, so that helps. How would you go from this blank canvas to like, okay, here's what I think the language will be? I start from a lot of blank canvases across my career. So LLVM, blank canvas. MLIR, later, blank canvas. Mojo, Swift, many of the systems I build are blank canvas projects. Do you think there is any logic in having a new language that is designed to make it easy for LLMs to code with? So I often get asked, given that AI is writing all the code, why are you building a programming language? Which is another way of asking the same question, maybe a little bit more aggro.

Starting point is 00:00:50 And so, to me, I don't think that optimizing for the LLM is the right thing to do at all. Because the important thing is... Chris Lattner created some of the most influential programming languages and compiler technologies of the past 20 years. He's the creator of LLVM, used by languages like Swift, Rust, and C++, created the Swift programming language, worked on TensorFlow, and now works on the Mojo programming language. In this conversation, we cover the origin story of LLVM and how Chris managed to convince Apple

Starting point is 00:01:16 to move all major Apple dev tools over to support this new technology, how Chris created Swift at Apple, including how he worked on this new language in secrecy for a year and a half, and why Mojo is a language Chris expects to help build efficient AI programs easier and faster, how Chris uses AI tools and what productivity improvements he sees as a very experienced programmer, and much more. If you'd like to understand how a truly standout software engineer like Chris thinks and gets things done, and how he's designing a language that could be a very important part of AI engineering, then this episode is for you.

Starting point is 00:01:49 This podcast episode is presented by Statsig, the unified platform for flags, analytics, experiments, and more. Check out the show notes to learn more about them and our other season sponsors. So Chris, welcome to the podcast. Well, thank you for having me. It is so nice to meet you in person because I feel like the work that you did has had an impact personally on me. At Uber, we migrated to Swift. And of course, a lot of software we use runs on things that you've built. So can we rewind time back all the way to the early 2000s? And for some of us who have not really been around then, can you describe what the industry was like in terms of compilers, languages, and what was the status quo back then? Yeah, so, I mean, I was involved in the kind of early days of Linux. And so I started maybe not in the really early days, but I started working with Linux in the mid-90s.

Starting point is 00:02:39 And this is when Java was coming on the scene and a lot of things were changing in the industry. As that played out, GCC was the compiler that standardized a lot of software and a lot of people were using it to build Linux-based software. And this is how I got to know it. GCC is a great thing. Before GCC, a lot of people don't know this, every hardware vendor was making their own compiler, and it was a gigantic mess for even just C code,

Starting point is 00:03:05 because none of these compilers were compatible with each other. And the reason they did that just so I understand is, like, you have C code. It needed to translate to assembly for a specific hardware, and then the hardware vendor, you know, they did the mapping and figuring out like what custom instructions or something like that. Was that a reason? Yeah, so basically back in, particularly like the late 80s and then the early 90s,

Starting point is 00:03:24 there was a lot more diversity in terms of instruction sets, and chip makers were innovating, and you had HP building things, and you had Intel making a lot of different systems back then, and there was all kinds of different stuff going on. And so you had RISC computers that were being invented. And so the challenge at the time was that everybody had to build their own compiler. And so because everybody had to build their own compiler,

Starting point is 00:03:46 they all wanted C, and some C++ was emerging, and C was a standard, and so there was a spec that said this is what the code is supposed to look like, but as we know for many standards and many specifications, they're never complete. And so what ended up happening is each of those compilers was a gigantic mess because they had different bugs, they had different misfeatures, they lacked certain capabilities. And so software of the day was built with systems like autoconf, which was this really weird macro processing thingy that tried to work around some of the limitations. And it was a gigantic nightmare.

Starting point is 00:04:18 So GCC came on the scene, kind of in the 90s, really, and cleaned up that mess. It became the standardized thing that people could plug into, and a lot of chipmakers embraced it, and it was free software. And so that led to the rise of Linux. Linux adopted it, and that really kind of brought the world forward. Now, GCC was a good thing. It really did help standardize the world,

Starting point is 00:04:39 and I think a lot of free software and open source owes a debt of gratitude to GCC. But also, it was a really old thing. And so I made fun of it at the time. It was over 20 years old, and its architecture was very, very, very old school in many different ways. It was also not built to be a modular design. It had, you know, it was designed to do one thing: take C code in, and then put out assembly code for a given chip.

Starting point is 00:05:04 And so there's a lot of things that you couldn't do, like JIT compilation. And you couldn't, at the time, even optimize across files within... And JIT is just-in-time compilation, right? Just-in-time compilation, exactly. And so there's a lot of things that people wanted to do that you couldn't do with GCC. And so in around 2000, that's when I was in university and I was studying compilers. And I said, oh, okay. Well, wouldn't it be interesting to build something new in the space? And so I started working on LLVM. LLVM is a, it started out as a code generation system so you could target multiple different architectures. But really, for me, it was a learning process.

Starting point is 00:05:36 It was about saying, okay, compilers are cool. Don't let anybody tell you otherwise. Compilers are cool. Even today, right? Even today. But I didn't really understand anything about it. And so I wanted to learn by building. And so across my university project, I built this thing up more and more and more.

Starting point is 00:05:52 We then open sourced it. It got a little bit of a community. Later, I went to Apple, which really helped foster and fund a lot of the development. And that early starting point was coming from, you know, GCC and open source technologies are really good. But there's a lot of things we can't do. And so that's where I kind of fell down this rabbit hole. And like when you started open sourcing it, what did you open source? What could it do?

Starting point is 00:06:15 Yeah. And then, you know, what was the reaction to this early version of LLVM? Yeah. So LLVM back in the day, and this is super funny because many people look back on successful systems, and they assume that everything was obvious. When actually every step along the way is challenging, you have to earn any success you get. Very rarely do you, like, luck into it. And so when I first started working on LLVM, again, it was a research project out of university.

Starting point is 00:06:41 There's a lot of research projects at a university that don't go anywhere. I'd argue most of them probably won't go anywhere. Exactly. And so when, and the default assumption is that LLVM also would not go anywhere. That's a safe assumption for a university project. And so we used it internally. And so my advisor, Vikram Adve, encouraged us to continue building it. And then we taught it to a couple of classes.

Starting point is 00:07:04 And so we had a few people using it internally. And we got some use cases with it. But at that point, it was really just optimization and code generation. And so it plugged into GCC as the parser to actually parse C code, for example. And so it was just about code generation. And it was useful for compiler people to learn things, but it wasn't very useful for general application developers, really. When we open sourced it, then we got, I don't know, two or three people that were interested in it.

Starting point is 00:07:30 And it was mostly other compiler nerds that, you know, are delightful. But there was no major community. There was no major reason for people to contribute. And what I did was I said, okay, well, I'll actually just treat the world like the rest of the research group and just have open development, encourage people if they come by and want to help and do something awesome. If they have questions, I'll answer them, and I just treated people with respect. And what happened over time is slowly the community grew. You got individual people that were interested in different things.

Starting point is 00:07:59 Some people were interested in esoteric programming languages, and they were interested in compiler technology for that reason. Other people were interested in performance, and different people had different interests. And so very, very, very, very slowly it kind of grew. And what was the big difference between GCC and LLVM? Like, of course, being modular is one thing, but you also mentioned capabilities that you wanted that GCC either just couldn't do or that were really difficult to do. What were those capabilities?

Starting point is 00:08:26 So originally it was just in time compilation, so runtime code generation. That was the thing with GCC. GCC couldn't do it. They could do only compile time. They could only do a kind of batch up front, traditional Unix style code generation. At the time, GCC could not optimize across files in your project. And so if you had a function in one C file, it could not inline it into another C file. So things like that.
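
[Editor's illustration, not part of the conversation.] The cross-file limitation described above can be shown with a toy model. This is a minimal sketch in Python under stated assumptions, not how GCC or LLVM actually work: the "files", the token-list function bodies, and the names `inline_calls`, `optimize_per_file`, and `optimize_whole_program` are all hypothetical, chosen only to show why an optimizer that sees one file at a time cannot inline a function defined in another file.

```python
# Toy model (hypothetical, for illustration only): a "file" maps function
# names to bodies, and a body is a list of operation strings. An operation
# of the form "call:<name>" is a call to another function.

def inline_calls(body, visible_defs, depth=4):
    """Replace calls to functions in `visible_defs` with the callee's body.

    `depth` bounds the expansion so recursive functions cannot loop forever.
    """
    result = []
    for op in body:
        callee = op[5:] if op.startswith("call:") else None
        if callee in visible_defs and depth > 0:
            result.extend(inline_calls(visible_defs[callee], visible_defs, depth - 1))
        else:
            result.append(op)  # not a call, or the callee is not visible here
    return result


def optimize_per_file(units):
    """Batch-style optimization: each file only sees its own definitions."""
    return {
        filename: {name: inline_calls(body, defs) for name, body in defs.items()}
        for filename, defs in units.items()
    }


def optimize_whole_program(units):
    """Cross-file optimization: merge every definition before inlining."""
    all_defs = {}
    for defs in units.values():
        all_defs.update(defs)
    return {name: inline_calls(body, all_defs) for name, body in all_defs.items()}


units = {
    "math.c": {"square": ["mul x x"]},
    "main.c": {"main": ["load x", "call:square", "ret"]},
}

per_file = optimize_per_file(units)
whole = optimize_whole_program(units)

print(per_file["main.c"]["main"])  # the call to square survives
print(whole["main"])               # the call is replaced by square's body
```

In the per-file case `main` still contains `call:square`, because `square` lives in a different file; once all definitions are visible together, the call is replaced by the callee's body.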

Starting point is 00:08:49 LLVM is also written in C++, which was highly controversial at the time because GCC was all C. And Richard Stallman was very opposed to C++. And so actually there's a parallel universe because in 2005, when I had joined Apple, we decided, okay, well, let's see if maybe LLVM and GCC should merge. And I actually proposed to the GCC community, hey, we've got this interesting technology, this seems very complementary, it could lift the architecture of GCC.

Starting point is 00:09:16 Maybe we should do this. And it didn't go anywhere, primarily because it was written in C++, but it was also not invented here. There was a bunch of other, you know, very serious, very experienced people. And they're like, okay, kid, why would we listen to you? And so it did not go anywhere. And thankfully so, it meant that LLVM had to grow up on its own. And then did LLVM really start to take off when you got to move to Apple to now work on

Starting point is 00:09:42 it full time? I assume there must have been at least a small team, like, investing in this. Is that how it went or something different? So the story of getting to Apple, so when I was graduating, I was looking for jobs in compilers and jobs that ideally would let me work on LLVM and continue the work because it was still an advanced research project. It was five years in. The community had grown quite a bit. I was doing regular releases every six months, and so I was really putting a lot of energy into advocating for the work and showing momentum and velocity. I caught the eye of a VP at Apple.

Starting point is 00:10:12 And there were a couple of people that were working on adding a PowerPC backend to LLVM because that's what Apple was doing. And so I got to know them and they said, hey, come. Like, we are frustrated with GCC. That is the foundation of all of our compiler technology. Because GCC compiled Objective-C, for example, right? That's right. Exactly. And so, and it was, again, an older architecture.

Starting point is 00:10:33 It was very difficult to work with. They weren't getting the performance they wanted. It was very difficult to add features. The community was also kind of annoyed with Apple for a variety of reasons. And so basically management said, come, like work on LLVM. I said, yes, sign me up. And so when I started, they did give me one other engineer to work with, but really, they didn't give me a whole lot of guidance. And so I was like, okay.

Starting point is 00:10:53 Do you pretty much? Cool. I will, I will go implement PowerPC support and integrate with the Mac OS of back in the day and things like this. And then there came a time when they said, okay, cool, you're working on this. And my manager said, it's great that you're working on this cool technology. But at some point, we need to make sure that it's actually relevant to Apple. Was it not relevant? Well, it wasn't used in any products yet.

Starting point is 00:11:15 Oh, I see, you added support, but it wasn't used just yet. Just yet, yeah, exactly. And it wasn't as good as GCC in a number of different ways. And so I basically got the vibe that it's like, okay, well, after a year, if Apple's not using it in some product, then we'll ask you to start doing something else. Because we're not just going to pay you to work on an open source project. That's how it works: actually have impact on our company.

Starting point is 00:11:35 And so when I got that memo, suddenly, and my manager helped me a lot, he was amazing, I went around shopping for, okay, well, what is the near-term impact that we could have on the business? And so from there, the first use case was actually for graphics. And so OpenGL was doing just-in-time compilation to do some graphics stuff. And so we were able to do something very small that actually had value. And suddenly, aha, this is not just a science project. This is actually interesting. Now, it's still missing tons of features, but that enabled the development of the next set of features,

Starting point is 00:12:05 the next set of features, the next set of features. And it was now in a product that Apple actually used and shipped. And it was generating business value, if we want to put it like that. A small amount, but at least non-zero. Exactly. And so what I did over the course of many years at Apple is then say, okay, cool, we'll keep investing in the technology, keep building an open technology and an open team, so a community. But make sure to deliver more and more and more and more value to Apple. And as that happens, suddenly, what ended up happening is I ended up replacing all of the developer tools, compiler, code generation, and debugger technologies within Apple. And that... One bit at a time. Exactly. And so it took about five years to get to the point where we built a new C++ parser, Clang, and an Objective-C front end and all this kind of stuff. We got to about 2010.

Starting point is 00:12:49 That's when Apple was releasing the first 64-bit iPhone. And suddenly, this was made possible by LLVM, by all the technology we had built and all this kind of stuff. And it was an epic moment in the industry because everybody was convinced that 64-bit phones were stupid. That was not a thing. And 64 bits doesn't make sense. Why are you going to have more than 4 gigabytes of RAM in a phone, right?

Starting point is 00:13:12 Hilarious. Wow, back in the day. But that's how people thought, right? That's how they thought. And they're like, it's inefficient, it can never work. And so actually, the first 64-bit iPhone, it was the iPhone 5S, shipped in production before Arm got their chips back to test. And so we had built the entire software stack. We had enabled the entire operating system, the entire toolchain, all this kind of stuff internally.

Starting point is 00:13:32 Then we were collaborating with Arm, but they had no idea how far ahead we were. And then suddenly we're shipping it in production and the whole world's heads explode because of how good the performance was and all the capabilities we enabled. And then it was quite fun. So I'm just having trouble putting two things together, and you can help me out with what I'm missing. On one hand, you know, you made it seem like, okay, like, you know, initially when you got to Apple, for a year, it was an experimental project.

Starting point is 00:13:54 Then you just went slowly with one team after the other. But now, next thing we know, fast forward four or five years, it's actually powering like the core of Apple. How did it go from like, you know, the small projects to actually getting into like the core of the business? Did it help that you started with developer tools and more developers were convinced? Like, how did you get through to, you know, like probably the highest level kind of architects

Starting point is 00:14:19 and decision makers who in the end, I assume they had to take a risk on technology that was like somewhat new compared to like at least GCC that's been around for like 25 years at this point? Well, so it's a combination of having top-down support. So people that wanted me to be successful. And that's because they're frustrated with GCC. And so they want a new technology. So that was very helpful. A combination of bottom-up success.

Starting point is 00:14:39 And it wasn't one thing. It was like every six months, we'd have a new thing that worked well. And so I became known as somebody who would get things done. And so I got more scope and responsibility. And so I went from being an engineer to a manager to a second-line manager to eventually a senior director over the course of a number of years. And so it wasn't one thing. It was just a lot of hard work. And it was a lot of fun because we were able to, you know, a great thing about Apple back in that time was that you could have a lot of impact because the team was growing. But also, if you could get stuff done,

Starting point is 00:15:12 you had an amazing platform on which to work because the iPhone came into existence and a whole bunch of other technologies were made possible. And Objective-C was very Apple specific, but we could move it, and that was huge. We added a whole bunch of features to Objective-C. And so as we built into this, like there's a lot of opportunity, but it was hard-won, right? In any given moment, it wasn't guaranteed. But as my team at Apple was making progress, we were doing all this in the open. Then suddenly other people started seeing value. And like, Cray built a supercomputer and used LLVM for it, and Google eventually adopted it in 2010

Starting point is 00:15:47 and started using it within Google for some small things. And then in a parallel project within Google, another hero within Google, like, built a team and had a lot of impact at Google, therefore justifying more people to work on LLVM within Google. And different tooling, different projects and different things all happened.

Starting point is 00:16:03 Eventually, you know, Intel or Arm or these companies ended up canceling their internal compilers and switching to LLVM. And so then suddenly the vast majority of their engineering teams are all working on the same thing. And then what you get is you get value that compounds, and today LLVM has a community of thousands of people that come together. And so it's amazing to see that. It's incredible. It feels a little bit like when you're starting to roll snowballs and you're

Starting point is 00:16:26 starting to roll this small thing. And then, you know, you do it for long enough, and most people don't do it. But then you could have a snowman and you're like, oh, where did you get all that snow from? It's incredible. Well, so, I mean, it's funny because people always imagine success. And so you start from a starting point and you say, ah, I want to be the president of the United States or something. And some people get very motivated by that and it's very powerful. What I really love is doing all the work, taking each of the steps, doing the thankless thing, figuring out that really obscure thing that, you know, if you get the architecture right, it makes the whole thing compound and scale and compose the right way and it allows

Starting point is 00:17:03 you to solve problems that nobody else can solve. But what that means is most of the time I'm working on things people don't understand. And so I'm fine with that. I've normalized to this. This is what I expect at this point. I don't expect people to understand me. But what happens is that these projects over time, they grow and they get to the point where suddenly it clicks and suddenly people start to understand. And suddenly you can explain it because it maps into their value system in a way that they can measure, basically. Chris just mentioned how things click a lot more for people if it maps into their value system in a way that they can measure. In my experience, measuring things is helpful all around, not just when trying to convince people, but also in figuring

Starting point is 00:17:38 out if a feature you built works as you'd expect. Here's a challenge most teams face. They ship features, but cannot easily measure if they're working. Did that new checkout flow improve conversion? Is the feature helping retention or hurting it? Without measurement, you're just hoping things will work. That's where Statsig comes in. Statsig is a presenting partner for the season, and they've challenged the status quo of how product teams work, much like how Chris challenged how compilers and programming languages are supposed to work. Most teams accept fragmented tools. Feature flags in one system, analytics in another, experiments somewhere else. You're stitching together point

Starting point is 00:18:12 solutions, running ETL jobs to sync user segments, hoping timestamps align across different systems. That's the status quo, but it doesn't have to be. Statsig built a unified platform that gives you everything in one place. Feature flags, experimentation, analytics, and session replay, all using the same user assignments and event tracking. So you ship a feature to 10% of users, and the other 90% automatically become your control group. You can immediately see if conversion changed, drill down to where users drop off in your funnel, then watch session recordings to understand what went wrong, all with the same data. This is how you measure impact at the pace you ship.

Starting point is 00:18:46 Companies like Graphite, Notion, and Brex rely on this approach to make data-driven decisions without waiting for data teams or manually correlating information across tools. Statsig has a generous free tier to get started, and Pro pricing for teams starts at $150 per month. To learn more and get a 30-day enterprise trial, go to statsig.com/pragmatic. With this, let's get back to the conversation with Chris. One thing I feel is probably pretty applicable for anyone is the fact that you were inside a company that was pretty big and successful at that time, Apple, and you still managed to get so many things done, just as you said, by solving hard technical problems and continuously figuring out how to help the business. And there was this interesting part where Clang got open sourced at Apple. Back then LLVM was already open source, so that one I kind of understand.

Starting point is 00:19:39 Yeah, that was grandfathered in, yeah. But I want to ask you how you managed to convince a company that back then was pretty closed, and even today is pretty closed outside of a few things, to be okay open sourcing a new project. And I assume your reputation or hard work or getting things done might have helped there. But what else did you kind of use for leadership to like green light this? Because I'm sure top down they need to say like yes, right? Yeah. Well, so Clang was actually the easy one because we didn't really ask people

Starting point is 00:20:09 for permission. We just kind of did it. And so that was the easy one. The hard one was Swift. I think we'll get there, but that was the actual hard one. And with that, let's get to Swift. The story of Swift, as you shared many times, is how you started to work on it in secret. Can you tell me what led you to start experimenting with the new language, what you saw the problems being with Objective-C, especially because by now you'd solved the compiler problem. Like, you know, the compilation was probably as good as it could have gone. And how did you pull off this kind of, what does secret mean, right? Yeah. Well, so the backstory on Swift is that, so I joined Apple in 2005 and I built and basically replumbed their entire C++

Starting point is 00:20:54 and C and Objective-C toolchain. And by 2010 at WWDC, which is their summer conference, we had launched a full stack replacement and it was working. And building a full stack, fully integrated C++ compiler is no small feat. C++ even then was a very complicated language and it was a very big technical challenge, but it was also very demotivating, at least to me,

Starting point is 00:21:19 because C++ is just a beast of a language. And so you couldn't come out of that and not wonder, could we do something better? And as you said, we'd built a lot of this fundamental technology and could build pretty much anything we wanted at that point. And so I just started, again, nights and weekends, didn't ask permission, just started fiddling around, seeing what could be done that would be better.

Starting point is 00:21:39 And to me, what I did was actually say, okay, let's go look at a lot of existing languages. So let's go look at the new ones. Let's look at Java or let's look at C++ and things like that. But let's also look at functional languages like OCaml and Haskell and things like this. Let's also look at esoteric languages, like the really weird other things.

Starting point is 00:21:58 Let's look at TypeScript and Dart and other things that were kind of on the scene and happening, and Go and languages like that. And so it was really about saying, like, well, what do I like and how do I see success? And what I defined as success for Swift was making it easy to use and scalable. And so I saw a lot of languages. JavaScript is probably the best example of this where, you know, you start with something really simple. You know, we can hack with JavaScript.

Starting point is 00:22:25 And then it gets adoption. And then you get developers. And then developers bring their tools to other domains. And so, you know, JavaScript was designed for onclick handlers, but now people are writing web servers in it. Yep. Well, and so... We see it all too much. Exactly. And so what my observation was, is that success meant that you would be naturally

Starting point is 00:22:45 brought into other domains. And so think beyond just the first win, but think towards more generality and scale. And what I realized is if you take something like Objective-C, which I was deeply familiar with, for example, you know, there's this dichotomy between objects and C. So Objective-C had this Smalltalk-inspired object model, and then it had C for performance. It was actually a really beautiful combination because you can get all the way down to the metal with C, but you had high-level APIs.

Starting point is 00:23:14 And what I realized is that they didn't have to be two languages. So you could take the good parts and pull it together into one system that could actually scale, and it would be way easier to teach, way more memory-safe, way more modern, good type inference, fix some of the syntax problems because Objective-C was a little bit of a mash-up

Starting point is 00:23:32 between two different worlds. And the funny thing about that is that you fast forward all the way to today, that's exactly what Python is. Python has a beautiful language for objects built on top of C and C++ and Rust, and it's got this two-world problem going on within it. And so it's very interesting to me, just, again, how history rhymes with itself, and there's so many different things going on. But the initial idea with Swift was really about saying, how do we build a scalable language and how do we pull together the best ideas, shamelessly, you know, borrowing ideas from different places. And by

Starting point is 00:24:03 scalable, you meant that it can be used for like your specific task, may that be, let's say, iOS, but then it naturally lends itself to like other domains. Yeah, and so specifically scaling all the way down to embedded systems development, all the way up to something that kind of approaches scripting, right? So super easy to use for very high-level programming. Can you tell me how you design a language, in the sense of like, as developers, we use a lot of languages, right? Like, again, I think all of us have tried and built with many different ones, especially with AI tools, with which it's so easy to try out, you know, for myself,

Starting point is 00:24:37 like TypeScript, Rust, Go, you know, try out Haskell, this and that, Prolog. Like, and I'm used to using them, right? And I kind of discover it, figure out what is there. But when you have a blank canvas or when you were thinking, like, how do you go about it? Do you kind of, like, I don't know, metaprogram, like write down what you'd love to see? You also, of course, build compilers. You start to immediately think of how that will work. Like, can you give us a little bit of a view? How would you go from this blank canvas to like,

Starting point is 00:25:05 okay, here's what I think the language will be? Well, so that's a good question for me in general, because I start from a lot of blank canvases across my career. Right, so LLVM, blank canvas. Yep. Right, MLIR, later, blank canvas. Mojo, Swift. Like many of the systems I build are blank canvas projects. And so for me, it's a couple of things.

Starting point is 00:25:21 First of all, doing that kind of analysis, the reflection, deciding what success ultimately looks like. Then do the survey. Go look at what's out there. What's good? What's bad? One of the things I really strongly believe is that you can take a system that you don't like, but buried within a system, even if you don't like it,

Starting point is 00:25:37 there are often good ideas. And so you can take the ideas that are good and ignore all the rest. So, for example, Java, I didn't like garbage collection and finalizers. And like, there's all the stuff that came with Java, but being able to be JIT compiled was very powerful. Right. And so there's a lot of things that you can look at and you can say, and Java is a huge step forward for the technology space for a variety of reasons.

Starting point is 00:25:59 But you can look at that and you can say, okay, well, I like that one aspect of it, all the rest of these things are separable and maybe I don't like how they work, and do that survey. And then when I start building a new system, I expect to design and redesign and iterate and learn and grow. And what I try to do is I try to go through

Starting point is 00:26:16 proof points and validate specific assumptions and say, okay, I'm not going to solve all the world's problems. I'll keep this part and this part and this part super simple. It's like the two circles that make your owl. Just leave them the circles. And then invest in proving this piece. And if I can get this

Starting point is 00:26:32 done and I can understand that it works and I can gain validation and confidence in that, I can know the interfaces going in and coming out of it. Well, now I can go out into the next system and build that out and build this thing out so that we can understand it and learn as we go. But every step along the way, being unafraid to go massively change and flip over the table and try again. And so you started to experiment with Swift, like on the weekends and on the side, in 2010-ish. How did it take until 2014 for a first release?

Starting point is 00:27:02 Like what really goes into building out a language? Because again, I think we can all, I can kind of empathize, like, this is cool. Like you have these ideas. Of course, you have a lot of experience. You try out things. But then, like, as a software engineer, I shouldn't ask this. But I'm going to ask: what took so long? Oh, well, so it turns out building a programming language is not easy.

Starting point is 00:27:21 So it takes about four years, I guess. Can you give us a bit of like empathy of like, you know, like what makes it so hard? Yeah. Well, so let me just frame up how it went, just so you know, because the first year and a half, it was literally just me working on it. Oh. Okay. So nights and weekends, and I had a day job, and I was managing a big team and a lot of stuff

Starting point is 00:27:40 going on. Hold on. So at this point, yeah, you were already like pretty second level manager or something and had a team of 40 people. Oh, wow. Running, yeah, a lot of very interesting and very fun stuff. Wow, okay. You're good at juggling stuff.

Starting point is 00:27:52 This is a passion project. Yeah. I'm also a fairly good programmer, so that helps. But after a year and a half it got to the point where I told management about it. I said, hey, I'm working on this thing. What do you think? They're like, uh, yeah, uh, no, why would we want a new language? Objective-C is what made the iPhone successful.

Starting point is 00:28:09 Right, right? And so like, no, what? And I'm like, well, how about I just have like one or two of the people in my team, like, work on this part time. They're like, uh, I guess you can tell them about it, but we can't commit resources really. You know, and so, and so, but, you know, again, I was respected and doing good things. And so I was given a little bit more rope than some other people

Starting point is 00:28:26 would. And it got to the point of saying, like, okay, let's build up some demos. And then it was about three years in, and most of the feedback I was given was, okay, well, if you don't like Objective-C, go make Objective-C better. You own Objective-C also. It's a natural reaction. I mean, as much as, as an engineer, I don't want to hear it, if I put a little bit of a business hat on, it's what you would say. Makes total sense. That's what the entire community is using. It's very low risk. It makes your existing investment even bigger. And so what I did was I said, okay, well, across that four-year journey it was not just building the thing full out with like a team of people that knew what they were doing. That's the discovery, the research, and also the socialization process with executive leadership.

Starting point is 00:29:03 And so what I realized is I said, okay, well, I want to have memory safety. Objective-C does not have memory safety. That was one of the biggest things. Retain and release, and it was completely manual. It was just kind of a nightmare. And so we invented ARC, so we have automatic reference counting. That made Objective-C way more teachable and safer. It also pulled it much closer to Swift.
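The contrast between manual retain/release and automatic reference counting can be sketched in C++. This is only an analogy under stated assumptions: `ManualObject`, `retain`, `release`, `manualDemo`, and `automaticDemo` are hypothetical names invented for illustration, and `std::shared_ptr` manages its count through object lifetimes at run time, whereas ARC has the compiler insert the retain and release calls for you.

```cpp
#include <cassert>
#include <memory>

// Manual style (hypothetical names): the programmer pairs every retain
// with a release by hand, and a missed call leaks or double-frees.
struct ManualObject {
    int refCount = 1;  // the creator holds the first reference
};

inline void retain(ManualObject* obj) { ++obj->refCount; }

// Returns true when the last reference was dropped and the object was freed.
inline bool release(ManualObject* obj) {
    if (--obj->refCount == 0) {
        delete obj;
        return true;
    }
    return false;
}

// Walks through a correctly balanced manual sequence.
int manualDemo() {
    ManualObject* obj = new ManualObject();  // count: 1
    retain(obj);                             // count: 2
    release(obj);                            // count: 1
    bool freed = release(obj);               // count: 0, object deleted
    return freed ? 0 : -1;
}

// Automatic style: the count follows the lifetime of each owner.
long automaticDemo() {
    std::shared_ptr<int> owner = std::make_shared<int>(42);
    long inside = 0;
    {
        std::shared_ptr<int> alias = owner;  // count goes up automatically
        inside = owner.use_count();          // two owners here
    }                                        // alias destroyed, count goes down
    return inside * 10 + owner.use_count();  // encodes "2 inside, 1 after"
}
```

Calling `manualDemo()` exercises the hand-balanced path, while `automaticDemo()` shows the count rising and falling with scope and no explicit calls.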

Starting point is 00:29:24 We then did modules. We then did literals in Objective-C. We added all these features, and what that was doing was pulling the lived experience of doing Objective-C programming closer to what Swift would be, so that ultimately when Swift came out, it would be less of an abrupt leap, and it was still a pretty big leap, but

Starting point is 00:29:39 we were doing that. But meanwhile, you know, management's like, okay, cool, why are you doing this? And they're having their own issues. And we got to about three years in and said, okay, well, we think we should do this in a serious way. At that point, I had a team of 250 people-ish, something like that. I was running Xcode and the whole developer tools team.

Starting point is 00:29:56 And still on the weekend, you're still doing this. Well, yeah, you can look at my GitHub history. Like, you can go back to 2014 and see what I was doing, yes. But at that time, right, they said there was this big executive review and a whole bunch of internal discussion. And I said, basically, look, I want this to be the focus of the department to bring us into market. Yeah.

Starting point is 00:30:15 And so at that point, we didn't really have it talking to iOS. Right. It was still missing a lot of the object features and things like this. We had no apps that were built with it. And so that last year was a massive amount of work in terms of finishing the language, but also integrating with the debugger, into Xcode, into code formatting, and into tons of different features. Building more integration with the iOS SDK, that was a big deal, and we had not done that. And so a lot of that made it possible to finally launch it four years later. And I would say when we launched it, so a mistake we made

Starting point is 00:30:50 is we called it 1.0. I was about to come to this because that's where I had my first hands-on experience, with 1.2. But yes, 1.0 came out and, you know, the feedback, I can just talk from, you know, like what I remember from there. There's two things. People were super excited that Apple was releasing a new language. And iPhone was huge. 2014 was the time where like you knew that you could build a big business on iPhone, Uber, I think Snap. All of these companies were just going up. It was the status symbol already. And not everyone had an iPhone just yet. However, like when you tried it out,

Starting point is 00:31:25 it was just not good. Xcode didn't have refactoring support. Debugging was a nightmare. We should have launched it as 0.5. So how did it go? Why did you decide that at the time? Like, you know, just I'm interested in your thought process at the time on how you went about it and, you know, like what were some

Starting point is 00:31:42 learnings now that you look back? Yeah. So, um, so it had to be 1.0 to launch it because Apple doesn't launch 0.5s, basically. So if we wanted to launch, it had to be called 1.0. So that was one aspect of it. But the other aspect is we wanted to get it out there. And at the point that we launched it,

Starting point is 00:31:58 only about 250 people at Apple knew about it at all. Most of those people are in my team, and then some executives in marketing and other people like this. And so we knew we couldn't build a good programming language in a vacuum with an NDA that you had to sign to get access to it, right? You have to actually get it out there. You have to get usage experience.

Starting point is 00:32:15 And so the learning for me is don't call something 1.0 until it's actually ready and it's validated and you have a lot of usage experience and things like this. We bring a lot of these lessons to Mojo, by the way. Chris Lattner and the team launched Swift back in 2014, and it has since become the de facto way of building native iOS apps, which brings us nicely to our season sponsor, Linear. Linear's iOS app is also built on Swift. It's a fully native app.

Starting point is 00:32:40 It's not built with React Native or using a web wrapper. And there's good reason for this. Linear's entire philosophy is about speed and performance. The desktop app is famously fast, but that same obsession about speed carries through to mobile. Linear have just redesigned their mobile app with something pretty interesting. Their own take on Apple's Liquid Glass design language. But instead of just using Apple's API under the hood, they rebuilt the UI from scratch.

Starting point is 00:33:05 The Linear team wanted more control over the navigation and customization than Apple's system would allow. Classic engineering mindset. If the existing solution doesn't give you what you need, build your own. Here's my favorite detail about the app though. Linear has this feature called Pulse. It's basically a feed that gives you an update on all your development work.

Starting point is 00:33:21 Project updates, what's on track, what's blocked, etc. On mobile, you can actually listen to it as an audio briefing. The Linear team told me how they get raving feedback from engineering leaders who listen to their Pulse update on their commute. It's like a personal podcast about the team's work. One VP shared how he gets through his entire stand-up prep while driving to the office using Pulse. This is why I love partnering with engineering teams like Linear's.

Starting point is 00:33:43 They're not just porting desktop features to mobile. They're actually thinking about how people use their phones differently. If you want to see what they're building, check out linear.app/pragmatic. You can use the mobile app to manage issue tracking on the go, and that Liquid Glass implementation is genuinely beautiful engineering work. And now, let's get back to Swift 1.0 and how the language was pretty painful to work with on launch back in 2014. But also, it's about making sure that we communicate clearly with the community.

Starting point is 00:34:09 And so one of the things I think we did well is we said, look, internally to Apple, we said, look, we need to launch this, we know it won't be perfect. And so we're going to tell the community that you can adopt it, you can build apps and submit them to the store, but we will break your source code. And this is because we know that we have to change the language when we get usage experience. And so we told people, like, we will help you, but expect code breakage. And I'm glad we did that. I remember that was really clear.

Starting point is 00:34:34 People were upset about it, but they knew. Like when we had a discussion of should we move over to Swift or not, and at Uber, the rewrite happened in 2016, where Swift was at 1.2, and the platform team made the decision after evaluation that they will take a risk, and it was a big risk, and there's now stories about it. I have a blog post. There are some other people with their experiences. But they did it because they knew that Apple is committed. I think the open source part also helped build a lot of trust. And I almost feel that that was one of the last times that I saw the Apple community really unite, like iOS developers specifically just went like, you know,

Starting point is 00:35:14 we'll be in the pain together, we know it's going to be painful, but like all mobile engineers I know knew this, they told their management and all that. They said, this is the right thing to do. And of course, I think it really helped that Apple had like this trajectory. And it was like, okay, well, they didn't want to be left behind. They didn't want to risk not having access maybe in the future one day to APIs. Not at that point. One thing I was surprised by when Swift was launched, I don't know if this was right on launch day, but Xcode Playgrounds and the REPL, the read-eval-print loop, was also part of launch day. Yeah.

Starting point is 00:35:47 And for those that don't know Xcode, it was like, you could just like try inside Xcode. You could just try Swift. And with the REPL, you could also just really easily play. Why did you decide that that needed to be there on launch? Well, so part of that came from the year-long process of like getting it to launch. And so part of it was, okay, if we're going to do a new language, what are we going to get out of it? What is the developer experience?

Starting point is 00:36:13 Everybody wanted to get something, right? And so, for example, the documentation team, who I love, said, this is our chance to write a book, which was awesome, because Swift had a book on launch, and it was fantastic. The Xcode team said, okay, well, and management and marketing people said, okay, well, can we get interactive programming? This is something we've never had, and so we want that, and so that's where Playgrounds came from. And so the REPL came from me saying, okay, well, modern languages should have a REPL. This is a thing that builds into the debugger, and there's a bunch of different capabilities we could use, and it would make a lot of sense to make it much more modern feeling, and if we're going to do this, let's do it right. And so there's a lot of that,

Starting point is 00:36:50 okay, if you're going to do it, let's get the advantages from it. Now, the flip side of it is we were massively overextended. A lot of the vision was right, but particularly at 1.0 and even 1.2, many of these things were still unbaked, and it took a long time, and I think that we underestimated how hard some of these things were, and waiting for 2.0 or 3.0, you know, maybe it would have been better for some of this. But anyways, it's easy to be negative and judgmental in hindsight. At the time, we were all working really hard and trying to make something amazing. And we were learning as we went.

Starting point is 00:37:19 And I was really thankful for the community. But one of the other things I'll say is that while you may have been on the, oh, wow, awesome thing and let's adopt and let's get ahead of the curve, there were just as many people that were the, what are you talking about? Objective-C is great. If it ain't broke, don't fix it. And they're like shaking their fist and saying, like, I will never use a new thing, blah, blah, blah, blah, blah, and there's all these bad things, which they were right about with Swift. And so it took quite a long time for the community center of gravity to move. Well, this was a fun fact.

Starting point is 00:37:48 In 2016, I remember middle of 2016, or maybe a little bit earlier, Uber was one of the big mobile-first companies, built on iOS, grew up on iOS, lived and breathed iOS and of course Android. And the mobile platform team ran a survey sometime in early 2016, two years after Swift was out, this is Swift 1.2, saying, we are planning to rewrite the app. The rewrite was a thing. Just because they needed to redo every single screen, every single interaction, which kind of meant a rewrite, all the business logic. So it was going to be a rewrite no matter what. And they ran a survey, would you prefer, you know, as an iOS engineer, Objective-C or Swift? And the results were pretty much 50-50.

Starting point is 00:38:29 Yep. So, and these were, again, very experienced engineers that they knew. And the two camps were violently disagreeing with one another. And I don't, I still don't know why Uber went with Swift. Someone was a tiebreaker in the end. But back then, this is how foot on foot it was. And of course, there was grumbling. There were problems. So the Objective-C people said, like, told you so.

Starting point is 00:38:49 We could have just done. There was at some point thinking of like, maybe you should have go. But as you say, it was not trivial. But I guess this is what you don't like see in hindsight. Well, so, but I've learned a lot. And again, you reflect forward because this impacts exactly what I'm working on now. Same thing. Yep, with Mojo.

Starting point is 00:39:05 Experts don't like change. Right. What I got. This is counterintuitive. Well, I mean, if you think about it, like, if you're an expert Objective-C programmer, I don't know if you were, but if you're an expert Objective-C programmer, you've been doing it for five or ten years. You've learned all the bag of tricks, all the weird cases, you know, all the arcane things. You are the person that people turn to for help. Swift comes out.

Starting point is 00:39:27 Now, your expertise is invalidated. You're on day one, just like everybody else. You are not the expert. You don't really want a new thing. you want to be the king of the hill of the old thing. And you don't want the new thing to be successful. And so now some people are intellectually flexible and will switch over, and there are early adopters and things like this

Starting point is 00:39:46 or people that like new things. And so it's not true of everybody. But it is actually sensible human behavior to protect your prior investment. And for a lot of people, you have to have a really good reason to switch over. And so in the case of Swift, you know, SwiftUI was like a reason that moved some of the long-tail people over.

Starting point is 00:40:05 Years later. Years later, right? So there's different inflection points. And so what you look at is it's almost technology diffusion, right? And so they're early adopters, and they'll get on board quickly just because it's a shiny new thing. Yeah. And then there are some people you have to drag objectives to the other, they're called dead hands. Right?

Starting point is 00:40:21 And so, like, everybody else gets mapped somewhere along this curve, and it kind of looks like an S curve. But this is interesting because this will, you know, we're going to get to like some of the, like, AI and LLMs, but I feel this playing out. That's right. That's exactly the same thing. And it's part of human behavior. It's not about Objective-C. Now, let me tell you the thing that I find that I'm most proud of in this journey.

Starting point is 00:40:43 So when I started in 2010, and it was just a toy project, like playing around in my spare time, not expecting it to go anywhere, I was kind of implicitly coming from the assumption that we could do something that was better than C++ and Objective-C and that was easier to use and things like this,

Starting point is 00:40:58 but didn't really know how it would go. After we launched, I would have people that stopped me in the streets and said, thank you, Chris. Because of Swift, and because of not just me, but the whole team working on this, I was able to become an app developer. Before Swift, I tried building apps with Objective-C, but I could never figure out the pointers and the square brackets and it would crash. And I couldn't, and like, I'm not smart enough to build an app. But now with Swift, it became easy enough that even I could do it.

Starting point is 00:41:24 And now I am an app developer instead of a web developer, and I've become the expert in my company. And now I have learned and grown in my career. And so now, thanks to this new technology, like I progressed. And if not for this enabling technology, it never would have happened. And that's the exact same thing happening with GPUs right now.

Starting point is 00:41:42 We have the exact same thing with CUDA and C++ and all these tools that are gatekeeping so many developers away from this emerging kind of compute and kind of technology. And it's the same exact thing. And so this is what really makes me passionate about working on this kind of technology

Starting point is 00:41:57 because I believe in the power of programmers. I believe in the human potential of people that want to create things. And that's fundamentally why I love software, is that you can create anything that you can imagine. And to me, that's what's so powerful about working in this space. I really feel it and see it. Just looking back on Swift, because you created it. It's your baby. It started as a project.

Starting point is 00:42:18 You were a guardian of it or a shepherd or however you want to call it for a while. And of course, now you've stepped away. It has its own life. Looking back at it, how do you feel on how the language has evolved? what are parts that you're really kind of happy on how it lived up? What are parts where you're thinking that maybe if you would have stayed longer, maybe you would have tried to have it in different ways if there's any. So it's very hard for me to say if I'd stayed at Apple and if I had kept my fists on

Starting point is 00:42:45 and controlled it if it would have been better or worse. But I can talk to the mistakes that I made, right? Because there are learnings that I could bring forward, and that way it's my fault. And it's not me saying other people weren't able to do it, right? I like that. But early Swift, it was, you know, I'll just say it bluntly, I didn't know what I was doing, right? So I'd never done it before.

Starting point is 00:43:06 I'd built a C++ compiler, which had a spec. Well, it was the first language you ever built. It was the first language I'd ever built, right? And so a lot of the ideas going into it were new. A lot of the type checking algorithms were exciting, but I didn't really know what I was doing. I'm also not a math guy, and so if I actually understood math, then maybe it would have been obvious. But things in Swift like "expression too complex to type check

Starting point is 00:43:28 in a reasonable amount of time", that's my fault. And there's specific features that cause that to happen, and it's very difficult to fix. And so that's one mistake. Another mistake is that we were very ambitious in terms of adding features that we wanted to see in the language, and we added them faster than the

Starting point is 00:43:44 architecture could keep up. And so particularly if you talk about Swift 1, but even today, what ended up happening is the internals of the compiler and the design didn't really line up the right way. and so you'd get a lot of things that work kind of 80% or 90% and they didn't quite fit together and then as you start adding more and more and more stuff,

Starting point is 00:44:01 it doesn't really line up right and it just gets more and more complicated. C++ has the same thing going, by the way, and Swift's not unique this way. But as a consequence of that, you get an emergent complexity that then developers can feel. And Swift today, people make fun of too many keywords

Starting point is 00:44:15 and things like this. Well, it's because the origin story was really not, in my opinion, didn't put enough energy into like really refactoring, really redesigning, really making sure the core was great. Instead, it was, okay, let's get to app development as quickly as possible.

Starting point is 00:44:31 And so as a consequence of that, what happened is, and I and many other good people put more stuff onto and more stuff onto and more stuff onto and the language got more complicated. And so, you know, partially that's maybe in any individual decision, but also it's because some of those original things didn't line up the right way. And so the complexity kind of compounded.

Starting point is 00:44:50 Wow. I'm just like hearing we have a lot of tech debt inside a language and it builds up. Yeah. Well, I mean, this is just one way of putting it, of course, you put it differently,

Starting point is 00:45:01 but it just feels this all repeats itself. Like it's whatever project you do. Absolutely. Well, let me talk about like a simple example in C++. In C++, int is hard-coded into the language. Yep. Float is hard-coded into the language.

Starting point is 00:45:15 Complex numbers are not. They're templates. Why is that? Why don't you just get rid of these concepts, have one fewer thing, and just make int and float part of the library. And then everything's more self-similar and everything will work the same way, et cetera.

Starting point is 00:45:29 And so Swift fixed that problem, right? So int and float are now part of the library, and all the stuff is much simpler in that way, right? But C++, you can't go back and fix it because it comes from C. And so they added templates after C, and so all the C stuff had to be hard-coded in. They couldn't fix that.
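The asymmetry described here can be checked directly in C++. The sketch below is illustrative only: `Int` and `sum3` are hypothetical names, and the `Int` struct is a toy stand-in for the idea of an integer type defined in a library, not a copy of how Swift's standard library implements it.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// int and float are fundamental types, built into the language itself.
static_assert(std::is_fundamental<int>::value, "int is built in");
static_assert(std::is_fundamental<float>::value, "float is built in");

// std::complex is an ordinary class template from the standard library.
static_assert(!std::is_fundamental<std::complex<double> >::value,
              "complex is not built in");
static_assert(std::is_class<std::complex<double> >::value,
              "complex is a library class");

// Hypothetical library-defined integer: a plain struct wrapping the
// machine integer, with its operators written as regular functions.
struct Int {
    int value;
    friend constexpr Int operator+(Int a, Int b) {
        return Int{a.value + b.value};
    }
    friend constexpr bool operator==(Int a, Int b) {
        return a.value == b.value;
    }
};
static_assert(std::is_class<Int>::value, "Int is a library-style type");

// Generic code only needs operator+, so it treats the built-in type,
// the library template, and the wrapper struct the same way.
template <typename T>
constexpr T sum3(T a, T b, T c) {
    return a + b + c;
}
```

The same `sum3` template accepts `int`, `Int`, and `std::complex<double>`, which is the self-similarity being argued for: once the numeric types are all library types, they go through the same rules as everything else.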

Starting point is 00:45:46 And that's the origin story of why C++ works the way it does. When I was talking with Steve McConnell on the podcast, he told me that when he worked at Microsoft for a while, the best developer he met there did this thing where he wrote everything three times. First, he wrote a first version in like a week and a half, then a second version in three days and a third version in a day. And when I think back on the history of Python that I just heard, there's a documentary where we can listen to it, it was also the second or third iteration of a language. And what I'm also hearing now is, you know, like in Swift, you saw some mistakes in C++, but you couldn't forecast some of the things. But now we're going to get to Mojo as well. And that's always my goal, right? The goal is, so you're always going to make mistakes in life. So there's things you can do. You can decide to make new mistakes. So learn from the past and make new mistakes, not old mistakes. And then you can make the decision to fix mistakes wherever possible. And then

Starting point is 00:46:44 you try not to paint yourself into a corner. And so, again, a thing that I think Swift did well is it said, okay, well, the initial release was 1.0. That was predestined. But we're going to break the language. And so Swift 1, Swift 2, ultimately to Swift 3, broke the language to try and make it better. And then only at Swift 3 did we say, okay, now it's stable, right?

Starting point is 00:47:03 And I think that was really good because can you imagine if it was still Swift 1? We would be left with all these mistakes? All the mistakes, right? And so it was painful, but I think that was a good set of decisions we made and that led to a much better result. Yeah, there's a lot of tradeoffs.

Starting point is 00:47:16 So as we get to what you're doing today, after Apple, you worked at a few interesting places, Tesla, Google Brain, SiFive. Can you talk about what you've learned at each of these places and kind of how it's, because I think it'll be interesting

Starting point is 00:47:31 to look back on how it just all is leading to what you're doing now. Yeah, totally. So the way I frame it up is that, you know, 1.0 of my career was at Apple and building developer tools and understanding CPUs, and I built OpenCL,

Starting point is 00:47:43 which is a GPU thing also, and compilers for the GPU and like all this kind of stuff. And so learning a lot about that hardware-software boundary and developers and developer tools and Xcode and like that whole thing. 2016, I fell in love with AI. And the reason was that about 2016, the Photos app had the ability to tell you, is this a cat or is this a dog? And it was a very, like it was Inception v1 or some really ancient AI model was in there. I'm like, I own developer tools.

Starting point is 00:48:10 I know a lot about programming. I have no idea how to write an algorithm to detect a cat in a picture. How the heck does that work? So I fell down this rabbit hole in 2016, and this is what leads me to 2.0 of my journey, which is very AI-focused, like, go figure all this stuff out. And so Tesla was doing applied AI with self-driving cars. That's where I learned Caffe and TensorFlow, and I realized TensorFlow was a good thing, but it could be a way better thing. And so that led me to joining Google and helping scale TPUs.

Starting point is 00:48:37 There I ended up owning the bottom part of TensorFlow, all the CPU, GPU, TPU, plus a bunch of other weird internal ASICs within TensorFlow, but notably I scaled the software platform for TPUs. And across years, learned both, hey, it's actually really hard to make AI software without CUDA. TPUs don't have CUDA. CUDA, which is NVIDIA's way to program kernels, right? Yep. So basically the entire AI ecosystem back then, but also today, ends up being very heavily biased by NVIDIA and by their API and software and language toolkit called CUDA.

Starting point is 00:49:11 Which is like 20 years old or so? It's about 20 years old. And so it's awesome, and it's very mature, and there's a lot of good things about it, but it's also 20 years old. And so there's time and space for new things. But when you're building an ASIC, so TPUs are a very large-scale data center training and inference accelerator, very fancy, very frontier, particularly back in 2017. You don't have software. And so you have to create everything from scratch, and then you have to get TensorFlow and PyTorch to talk to it. And nobody really understood how that worked.

Starting point is 00:49:39 And so across the years, I learned so much. And I'm so thankful for my experience at Google because I learned about the algorithms, I learned about AI, the frontier applications. Like the "Attention Is All You Need" paper was invented in that timeframe on TPUs. And so we said, wow. Yes. And so a tremendous amount of things

Starting point is 00:49:58 going on, large-scale embedding systems for ads and recommenders and stuff like this was going on. It was a brilliant time. And so across that, kind of an early version of what I'm doing now said, okay, well, we need to reinvest in the software technology and the platform for technologies. And so I built components of what is now, I think, pretty widespread AI software. So there's this MLIR framework, new runtime technologies, a whole bunch of different things.

Starting point is 00:50:22 So each of these things ended up being kind of 2.0 of the first leg of my journey. And so MLIR is a compiler framework. It's kind of like LLVM 2.0. And so it's a rough analogy, and we probably won't go into MLIR in full detail. But it's for ML, right? Well, it's for domain-specific chips. Domain-specific chips, yep. Yep.

Starting point is 00:50:42 And so, but I could not have built that without the experience of building LLVM, right? And so having gone through that experience and being willing to do a 2.0, a clean slate, you know, looking into the abyss, you know, you're standing on a cliff and it's just blackness around you and you could do anything. But you have to decide where to start, right? That really helped, having gone through that and having made a bunch of mistakes with LLVM and being able to then fix them. And so we were able to build something that has now been widely adopted by roughly every AI company or every AI chip company that exists, which is very exciting. Wow. And through that journey at SiFive, for example, then said,

Starting point is 00:51:16 and SiFive built CPUs, right? So, one of the best in the world, as I understand. So SiFive builds RISC-V hardware. RISC-V hardware. Yeah. And so I decided, okay, I've always been on just the software side of the hardware-software boundary. How about we go play on the other side?

Starting point is 00:51:33 That sounds fun. And so I learned a tremendous amount about physical design and chip design and the business model of hardware. and IPs and like all the different technology for verification and things like this and tremendous amount of fun. And one of the things we decided to do is let's go build an AI chip or an AI IP. And so you get to that and then suddenly you get to where's the software? And so across like each of these legs in my journey, I'm like, okay, where is the software?

Starting point is 00:52:00 Where's the software? And in the case of AI, there's basically nothing there. There's nothing that you could take off the shelf that provides an end-to-end AI solution that you can then just go implement your chip. And so, I mean, the origin story of Modular really came down to my co-founder and I who were buddies. We got to know each other previously at Google, but really saying, like, you know, we need something like LLVM, but for AI.

Starting point is 00:52:22 For AI. Like, we need the thing that people can implement support for their chip into it and then get an AI software stack because, you know, if you're building CPUs 20 years ago, you didn't want to have to build a whole C++ compiler and C compiler and code generator and kernel and web browser. you just want to do a little bit of work and then get software. And that doesn't exist for AI right now. And then for those of us who are software engineers,

Starting point is 00:52:47 but we don't build AI software, we just, you know, we use the models, we might invoke the API or we might invoke the trained one. Can you explain to us if, you know, we moved into the world of actually building AI applications, running them directly on GPUs, may that be NVIDIA or now there's AMD or Google TPUs? What software would we use today? Like, what is the status quo or what was at least the status quo when you started Modular? And then what is your vision, which I sense there's a bunch of similarities with like GCC and LLVM here. Yep, absolutely. So, um, so before GCC,

Starting point is 00:53:19 every chipmaker had to build its own software stack. That's literally the shape of AI software today. Because there is no thing to plug into, besides Modular, uh, everybody has to build their own vertical software stack. And so when I was at Google, we built a thing called XLA. That's the thingy that we built, kind of like CUDA, but for Google TPUs. For Google TPUs. Google TPUs. If you look at Apple, they have Metal and MLX and their whole stack. If you go look at AMD, they have ROCm, their whole stack. Everybody has to have their own verticalized stack, and they share very little code.

Starting point is 00:53:52 Oh, so this is why I was just looking at, I talked with some folks at Anthropic and looked at their jobs site, and I saw that they're hiring both CUDA engineers, but also Google TPU kernel engineers. And Trainium, which is the AWS chip. And Trainium. And so they're writing the same models three times over and over again, depending on the hardware they use. And then I guess when it's a new version, okay, I get it. So this is where we are today,

Starting point is 00:54:14 or where we were at least. Well, that, and so, like you mentioned Anthropic, they have an amazing engineering blog post from a few weeks ago. Remind me and I can dig up the link. Yeah, we'll put it in the show notes. It talks about the problem of re-implementing the same models three times over.

Starting point is 00:54:27 Turns out they don't work right. You have bugs. You have quality problems that then users can feel in your product. You have outages because implementing things three times in parallel with three different teams, of course they don't line up.
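The "write the model once, plug in the hardware" idea discussed here can be sketched in a few lines of Python. This is only an illustration of the architecture, not Modular's design: the `Backend` interface, the two toy backends, and `tiny_model` are all hypothetical names invented for this sketch, and lists of floats stand in for real tensors.

```python
from typing import Protocol

Vector = list[float]


class Backend(Protocol):
    """Hypothetical minimal interface a chip vendor would implement once."""

    name: str

    def add(self, a: Vector, b: Vector) -> Vector: ...
    def scale(self, a: Vector, k: float) -> Vector: ...
    def relu(self, a: Vector) -> Vector: ...


class LoopBackend:
    """Toy backend written with list comprehensions."""

    name = "loop"

    def add(self, a, b):
        return [x + y for x, y in zip(a, b)]

    def scale(self, a, k):
        return [x * k for x in a]

    def relu(self, a):
        return [max(x, 0.0) for x in a]


class MapBackend:
    """A second toy backend with a different internal implementation."""

    name = "map"

    def add(self, a, b):
        return list(map(lambda x, y: x + y, a, b))

    def scale(self, a, k):
        return list(map(lambda x: x * k, a))

    def relu(self, a):
        return list(map(lambda x: x if x > 0.0 else 0.0, a))


def tiny_model(backend: Backend, x: Vector, bias: Vector) -> Vector:
    """The model is written once, against the interface, not per chip."""
    return backend.relu(backend.add(backend.scale(x, 2.0), bias))


def run_everywhere(backends, x, bias):
    """Run the single model definition on every backend and collect results."""
    return {b.name: tiny_model(b, x, bias) for b in backends}
```

Because `tiny_model` exists only once, the backends cannot drift apart at the model level; any disagreement between the entries returned by `run_everywhere` points at a backend bug rather than at one of three hand-written model copies.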

Starting point is 00:54:39 So where Modular came from was saying, okay, well, I've lived this experience with Google, with TPUs, and a number of other chips, by the way, but TPUs are the public one, have seen this at SiFive building into the hardware ecosystem. At that point, this was about four years ago at this point. The AI hardware ecosystem wasn't mature, but suddenly I had built up this wealth of knowledge

Starting point is 00:55:02 of how to do this stuff. And when I looked around at all these projects, I looked at PyTorch, I looked at, and there's ONNX Runtime and like there's a million of these things out there. I don't think anybody's on the right path. Nobody was willing to do the work because what I saw is I saw the problems I faced at Google. I loved my experience at Google. I learned a tremendous amount and the people are amazingly brilliant.

Starting point is 00:55:22 But the problem that I had and that we had as a team was that every year there'd be a new chip. And every year you're scrambling to get the next chip to work. At the same time, all the product teams are using the current chip. And so all your best people are getting assigned to put out the burning fire for search or ads or whatever the work, YouTube, whatever the workload is. And so you could never actually get the mindset or the bandwidth to be able to do something fundamentally new,

Starting point is 00:55:46 particularly if it took more than a six or 12-month performance review cycle. Oh, yeah. Google was infamous for that. And so what I saw is I saw a lot of these very short-term projects and a lot of improvements and a lot of micro-optimizations and a lot of great work, a lot of massive innovation in models and the architectures. But the progress, the research progress, was happening because Google had infinitely smart engineers

Starting point is 00:56:09 that had built every level of the stack and could hack the daylights out of it just to make simple things work. It wasn't scalable, it wasn't beautiful, it wasn't built to compose, and it certainly wasn't built to bring up new hardware quickly. And so, as you know, I've been working compilers and dev tools and languages

Starting point is 00:56:25 and stuff for quite a long time. And so I said, okay, well, I think it's finally time to solve this problem. But to solve it, you can't solve it in a weekend with, like, three people. You need years of time and a deep, very high-tech team to be able to invest in this technology because CUDA, as you mentioned, is 20 years old, it's had thousands of people work on it. Huge investment.

Starting point is 00:56:47 Any one of these chips and these algorithms is extremely complicated. The AI research landscape moves really fast. You need a huge amount of effort. And the only way to tackle this was with a VC-funded startup. Because that's the only way we could pull people out of Apple and Nvidia and Meta and Google and all these companies to be able to go build this. And you said something interesting in a previous podcast or interview. You said between kernel engineers, compiler engineers and AI engineers, they just kind of think

Starting point is 00:57:17 differently and there's not much overlap. Can you tell me a little bit about, because you have been all three. You, of course, know so many of them. But like, what is the difference? In the end, you know, they all need to work together at a place like this. But you said that somehow they don't, like, understand each other. Yeah, so I'll tell you the aha moment that led to our technology approach for Modular.

Starting point is 00:57:37 Again, if you go back four years ago, you're talking 2021. Funny how that works. Okay, so late pandemic, a lot of amazing stuff had been done. And a lot of people have built, first of all, TensorFlow and PyTorch. I consider them to be some of the very early production frameworks. TensorFlow and PyTorch both come from 2015, by the way. So they're nearly 10 years old, which is ancient in AI time. Right.

Starting point is 00:58:03 And so they were built under this idea of we will take CUDA kernels and we'll take Intel MKL kernels and then we'll build Python APIs on top of these. And so one of the challenges with that generation system is that it was so easy to write kernels that people made thousands and thousands of kernels

Starting point is 00:58:19 but then they couldn't be ported and you had to rewrite thousands and thousands of kernels to be able to bring up a new chip. When we worked on TPUs, we said, aha, you know what's cool? Compilers. And so instead of writing thousands and thousands of kernels, what we can do is we can use compilers to generate the code and automatically

Starting point is 00:58:35 synthesize kernels on the fly. This was a huge thing, and we said, hey, look, we can get performance, we can get things like auto fusion, where you merge two kernels together. And compilers are a lot of fun, by the way. I encourage people to learn them. And so that was great fun. Now, the problem with this is

Starting point is 00:58:51 where you can't scale kernels by writing thousands of kernels every time you bring up a new chip, turns out AI moves really fast. Turns out compiler engineers are lovely. I love many of them. But there's very few of them. There's very few of them. And so you now, what you ended up with is you ended up with a bunch of these technologies

Starting point is 00:59:10 where it was cool because you could support, you know, some compiler for some chip, but suddenly you couldn't do flash attention or some new algorithm comes on the scene. It's a massive breakthrough in research, but the compiler people couldn't be in the loop to implement the new sparsity or the new float4 format or the new this or that, and so having compiler engineers in the loop was a huge problem. And it held back, for example, Google TPUs, a lot of what happened, particularly when Gen AI came on the scene, really invalidated that technology approach. So I believed, and so going into this, and also part of the learning from working on TPUs, is that we needed programmability. We need full power over the hardware. We didn't need

Starting point is 00:59:50 the watered-down, simple solution that made it so that you got most of the performance. What I believe in is getting all of the performance. And so... Otherwise engineers would just like write kernels as they do. Exactly. Exactly. And there's so many technologies you see this across software where it's like, okay, it works up until a point, and then you have to switch to a different system or a different stack, and I don't like those things. Go back to Swift. I wanted to be able to scale from embedded all the way up to application level. Like, actually unify this. Don't make it so that you have like fragmentation in different systems. And so what we bet on was we bet on a two-level stack. We bet on Mojo.

Starting point is 01:00:25 What programming language? A brand new programming language. And as we all know, making a new programming language is doomed and like never works. And you cannot get adopted, like all those things. Well, you have a good track record, though. Yeah, it's actually just a hard problem, but you have to be smart about it, and we can talk about that. And then on top of that, saying, like, let's take this programming language and design it so it can solve these problems and scale across hardware, etc.

Starting point is 01:00:49 But then take these compiler algorithms and put them into the language so that you as a Mojo programmer get the power of compilers without having to be a compiler engineer. And so a lot of what makes Mojo really special is it takes power out of the compiler and gives it to you as a developer because it turns out a lot of people can write code. They can't necessarily write compilers. And so what this does is breaks open the talent problem. And so now we can have a lot more people building these very highly adaptive, highly scalable, highly composable systems in ways that they couldn't do before. So can you tell me about both Mojo and also how this language can actually break the compiler problem.

Starting point is 01:01:29 That part, you know, the new language, I think we can all understand, but how you can get through to compilers and get the performance out of it, how does the language do that, or how does the stuff around it do it? Well, so again, this is my life's work, my journey, and so I've made a lot of mistakes along the way. And so a lot of it, as we talked about before, when you don't know what you're doing, go learn from other people and go decide what they've done right and wrong and what you like. For me, a lot of it was learn from what I had done wrong and make sure to do it better the second time. And so one of the things that classical compiler engineers and myself back in 2005 or something, which is now 20 years ago somehow,

Starting point is 01:02:08 was that compiler engineers want to show how awesome their compiler can be. Let me show you. I can implement this optimization. I can detect this pattern in this case. I can make this code go two times faster, or five times faster, 10 times faster. The challenge with that approach, and this is, actually, this is really important for benchmarks,

Starting point is 01:02:27 because if you think about, if you're a compiler engineer, you generally can't change the source code. Like, you know, if you're optimizing for the SPEC benchmark suite or something, the source code's fixed. And so all you can do is implement crazy stuff within the compiler. Within the compiler. And it's pattern matching and transformations and maybe change the memory layout of something, using some heuristic. But there's a problem for that as a user. And so if you're a programmer and you're using one of these compilers and say you're working with a loop vectorizer, for example. Loop vectorization can make your code go four times faster by using SIMD to take advantage of the hardware, right?

Starting point is 01:03:00 But it requires a lot of pattern matching on your code. And it has analysis of how you use pointers and all kinds of other stuff that all has to line up for the optimization to work. So now you can have this wonderful moment where you pick up one of these magic compilers and then you get amazing performance. You're like, wow, this is cool. It goes four times faster. I love this thing. This is great. But then you go and fix a bug.
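Why a small source change can break such an optimization can be illustrated with a plain-Python sketch. It is only an analogy (Python has no auto-vectorizer, and the function names are made up for this example): processing a loop in chunks, where every lane reads its inputs before any lane writes, is only valid when iterations are independent, which is the kind of property a vectorizer must prove by analysing the code.

```python
def scale_add_scalar(a, b):
    """Each out[i] depends only on a[i] and b[i], so iterations are independent."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = 2.0 * a[i] + b[i]
    return out


def scale_add_chunked(a, b, width=4):
    """Same computation, `width` elements per step, like a vectorized loop."""
    out = [0.0] * len(a)
    for start in range(0, len(a), width):
        stop = min(start + width, len(a))
        # All lanes read their inputs first, then all results are written.
        out[start:stop] = [2.0 * a[i] + b[i] for i in range(start, stop)]
    return out


def running_sum_scalar(a):
    """out[i] depends on out[i - 1]: a loop-carried dependence."""
    out = list(a)
    for i in range(1, len(out)):
        out[i] = out[i - 1] + out[i]
    return out


def running_sum_chunked_naive(a, width=4):
    """Deliberately wrong: applies the chunked pattern despite the dependence."""
    out = list(a)
    for start in range(1, len(out), width):
        stop = min(start + width, len(out))
        # Lanes read stale neighbours because no lane has written yet.
        out[start:stop] = [out[i - 1] + out[i] for i in range(start, stop)]
    return out
```

The first pair of functions always agrees, so the chunked form is a safe rewrite. The second pair does not, so a compiler has to fall back to the slow loop as soon as an edit introduces a dependence like this, and the programmer only sees the slowdown.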

Starting point is 01:03:23 If you fix a bug and you break the hack in the compiler, well, suddenly you get a four-times slowdown. And you're not sure why, and you cannot really debug it unless you're a compiler engineer. Exactly. So now you file a bug, and again, there's not enough compiler engineers in the world, and so they may or may not look at it,

Starting point is 01:03:40 or they may tell you you're holding it wrong, it's too bad, like it has to be this way, right? But what that experience is, is that's caused by compilers that are trying to be sufficiently smart. And the sufficiently smart compiler never works. What it ends up being is it ends up being an abstraction that's leaky. Leaky abstractions exist in lots of software and lots of different domains.

Starting point is 01:04:00 But when they exist in a language, it means that you have an unpredictable programming model. So fast forward to today and AI. Today and AI, basically people want peak performance because GPUs are really expensive. As you said, if you build a system that doesn't deliver performance, they'll switch to a different system because they have to get that performance, right? particularly for the really large-scale frontier labs and things like this. They just squeeze everything out of it. They need to. It's required. The cost is just astronomical otherwise. Well, you can't have a system that is like unpredictable or flaky or weird.

Starting point is 01:04:32 You want something that has control. And so what Mojo does, unlike Swift and many LLVM languages, Rust, all these guys build on top of the same kind of technologies, we go for a much simpler compiler. It's way more modern. It's way fancier than what like Swift or Rust, languages like that, are built on for a variety of reasons, but it's way more predictable. And so it puts control in your hands.

Starting point is 01:04:56 And so instead of, for example, vectorization, instead of like hoping to magically do it for you and maybe it works out, maybe it doesn't, it gives you library features where you can say, hey, vectorize this for me. And I saw on Mojo, of course, there's a tutorial and getting started. I saw there's these special character slash keywords where if you want to, I mean, you can just like write code that looks like Python, as a subset of it. And you're like, whatever.

Starting point is 01:05:18 But then you can kind of take control, so like when you're coding, you decide you want a certain compiler feature that it supports. Can you give examples of like what I can do with it? Yeah. So, for example, take SIMD, so vectors. I don't think this is even a really standardized feature today in most programming languages, but it's essential to get performance out of a CPU. Right. Most of the operations and floating-point operations in a CPU are in SIMD because they allow parallelism.

Starting point is 01:05:48 So Mojo makes that very native. Mojo also makes things like bfloat16 and compressed floating point formats and stuff like that, very native. And so it exposes that to you as a programmer. But then you have this problem of you want productivity. Like you don't want to have to write assembly code effectively or very low-level code to be able to do basic things. And so you want very powerful abstractions that you can build in the library. And so what Mojo does is it has very powerful metaprogramming capabilities. And so, and by the way, it didn't invent these.

Starting point is 01:06:19 It roughly stole them from Zig and made them better. So thank you to the Zig people. They have amazing ideas. But the idea is to say, let's pull this compile time metaprogramming system up, make it much more powerful, embedded in the language. And so now you can build compile time transformations of your code. And so you can say, hey, I want a vectorized function. Well, what does it do?

Starting point is 01:06:39 It takes a higher order function, which is, you know, all the operations you want to vectorize, and then it stamps it out across your vector lanes, and it handles that transformation for you, giving you a very productive way of expressing these very high-performance algorithms that give you extreme predictability. And so this design point is very different. It's very important for accelerators,

Starting point is 01:07:01 because with accelerators, you can't have the memory allocation that magically happens. Like, you need control, but these features are a very different design point than what most languages are coming from. The other advantage, as you say, is it makes the language way simpler. And so if you take a language like C++, C++ has metaprogramming, has templates, and constexpr and like this whole weird system that is evolving.
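The library-level "vectorize this for me" idea described above, a function that takes a higher-order function and stamps it out across vector lanes, can be sketched in plain Python. This is an assumption-laden illustration, not Mojo's actual `vectorize` API: the signature, the way the leftover tail is handled, and the `saxpy` example are invented for this sketch, and Python lists stand in for SIMD registers.

```python
from typing import Callable


def vectorize(body: Callable[[int, int], None], size: int, width: int) -> None:
    """Call body(offset, lanes) over [0, size) in chunks of `width`.

    Full-width chunks come first; any leftover elements are handled
    by one final, narrower call (an assumption of this sketch).
    """
    full = size - size % width
    for offset in range(0, full, width):
        body(offset, width)
    if full < size:
        body(full, size - full)


def saxpy(alpha, x, y, width=4):
    """Compute alpha * x + y, recording how the work was split into chunks."""
    out = [0.0] * len(x)
    calls = []

    def body(offset, lanes):
        # One call stands in for one SIMD operation over `lanes` elements.
        calls.append((offset, lanes))
        out[offset:offset + lanes] = [
            alpha * x[i] + y[i] for i in range(offset, offset + lanes)
        ]

    vectorize(body, len(x), width)
    return out, calls
```

The point of the design is that the chunking is explicit and lives in a library: the programmer asks for it by calling `vectorize`, so whether the loop is vectorized does not depend on a compiler pass recognizing a pattern.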

Starting point is 01:07:25 Not the simplest tools. I mean, you learn it. You learn it. And I've written way too much C++. And so like, you know, even I can handle it. But what C++ does is it has the program and then the metaprogram. And they're effectively different languages. And so this is really complicated.

Starting point is 01:07:41 And writing something using templates, it's almost impossible to debug. You can't even pass a string to a template, really. Like, there are hacks, but, like, it's just really difficult to do things. You certainly can't, like, make a binary tree or something and pass it into a template. That's not a thing. And so in Mojo and Zig, what they do is they unify the program and the metaprogram. And so you can just write code and then run that at compile time. And so now if you want to debug it, cool.

Starting point is 01:08:06 Just use the same code at runtime and step through it in a debugger. It makes it super simple. It means you have one thing to learn, and it works in both places. It means you can have real types, general types. In Mojo, it's very fancy because you can use like a list or a dictionary kind of data structure that does heap allocations. And you can build it all at compile time and then bake the output of that into your program. And so it will synthesize the mallocs for you for a bunch of things that you compile time evaluated.
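A rough Python analogy for "the same code at compile time and at runtime" is shown below. Python has no compile-time evaluation, so module load time stands in for it here; the function names and the lookup-table example are assumptions made for this sketch and do not show Mojo syntax.

```python
def popcount_table(bits: int) -> list[int]:
    """Ordinary function: it can be called, tested, and stepped through at runtime."""
    table = [0] * (1 << bits)
    for value in range(1, len(table)):
        # The bit count of `value` is the count of value >> 1 plus its lowest bit.
        table[value] = table[value >> 1] + (value & 1)
    return table


# Stand-in for compile-time evaluation: the same function runs once, up front,
# and its heap-allocated result is frozen into a constant the program uses.
POPCOUNT_8 = tuple(popcount_table(8))


def popcount32(x: int) -> int:
    """Count the set bits of a 32-bit value using the precomputed table."""
    return sum(POPCOUNT_8[(x >> shift) & 0xFF] for shift in (0, 8, 16, 24))
```

There is one function to learn and debug, `popcount_table`, and it serves both roles: it can be called directly while investigating a bug, and it produces the data that is baked in ahead of time.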

Starting point is 01:08:34 And it all composes the right way. And so what this does is it gives you tremendous amount of power when you care about performance, when you care about building these higher level algorithms, which end up being like mini compilers. People generally don't think about it that way, but they give you the ability to specialize based on dimensions and numbers and magic constants in your algorithms and allows you to build these very high-power abstractions

Starting point is 01:08:58 very naturally. And if I want to get started with Mojo outside of just the basics, so like using some of these more advanced features, is the best way to actually like rent a, may that be a GPU or like something, which these days is pretty easy, and go and try to implement an algorithm, a program that actually needs high performance and start to play and understand these additional, kind of the ways that I can control a compiler?

Starting point is 01:09:27 Or what would your suggestion be for someone who is like, okay, Liam believes that this is the future and this is like cool to experiment with on their job. They don't currently have a problem like this. How would you go about it on the weekends, right? Yeah. Yeah, so I can talk about what makes Mojo cool. Yeah, let's do that. Well, but see, that ends up in the rabbit hole of really weird technical things that only Chris understands.

Starting point is 01:09:52 Wait, let's not talk about linear types. Let's not talk. But instead, let me tell you about why you might care. Okay, so Mojo is a member of the Python family. So it's super familiar, easy to learn. And by the way, with the AI coding, it's the easiest time ever to learn a programming language, just like you mentioned. One of the simplest, easiest things to do with Mojo is to extend an existing Python module. So have you ever been coding in Python, building, and building, building, it gets big and big and big and big and suddenly gets slow?

Starting point is 01:10:19 Well, historically, your option is to go rewrite a big chunk of it in Rust or C++, and then you have to use bindings, and it's really kind of a pain in the butt to be able to do this kind of work. Mojo is the most beautiful way to extend Python you can imagine. No bindings, just like Swift and Objective-C, they natively spoke to each other. No, no, because Rust is a popular choice to extend Python, but with a binding. Yeah, but you need bindings, right? And so it's very mechanical, and you can get it wrong. And so Mojo does that without the bindings, but also integrated with pip packages and the build systems all integrated.
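For readers who have not written bindings, the "mechanical, and you can get it wrong" point can be illustrated with Python's standard `ctypes` module calling the C library's `strlen`. This is a minimal sketch assuming a POSIX system (Linux or macOS); it shows hand-written bindings in general, not how Rust or Mojo integrate with Python, and `c_strlen` is a name chosen for this example.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. On POSIX, CDLL(None) falls back
# to the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# The binding itself: the C signature `size_t strlen(const char *)` has to be
# restated by hand. Nothing checks that these declarations match the real
# function, which is where mistakes creep in.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Return the length of a NUL-terminated byte string via C's strlen."""
    return libc.strlen(text)
```

Every exported function needs this kind of restated signature, and a wrong `argtypes` or `restype` is not caught until it misbehaves at runtime.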

Starting point is 01:10:52 And it's just super beautiful and awesome. And you can stay on CPU. You don't need anything. It's all free. Like, go nuts and go do this. We have very simple tutorials for this. Because I guess one thing with Mojo, that I don't think we, would you mention, but I know that it's also not just for GPUs or TPUs, but also CPU.

Starting point is 01:11:07 is the CPUs. And today's CPUs are extremely fancy and complicated machines, and getting performance out of them is also really interesting. And so what I've seen people do, and come back to the empowering stories that I love, we had a person come to one of our community meetings, and they rocked up, and it's like, oh, okay, tell us what you're working on. And I said, well, I don't know anything about this fancy, you know, AI stuff. Right. I am merely a geneticist, and I'm like, like, merely, right? It's like, I work with DNA sequencing and the stuff,

Starting point is 01:11:42 and I'm a Python programmer, and so I saw that Mojo makes Python go fast, awesome. And so I said, okay, well, maybe I would try using it for some of my DNA sequencing stuff. And so I pulled over some code, and wow, it was like 100 times faster because I pulled it over to Mojo, and I can learn how to use threads,

Starting point is 01:11:58 and then I could vectorize, and it was very cool because I could do this. And then you can start to use the CPU's resources if you're running on a CPU. Which, as you said, have a lot of capabilities, and you can play around with it if you learn how to control. Exactly. And so Mojo makes it really easy to do that, when you are on a CPU.

Starting point is 01:12:14 And then the thing I loved about it was he said, and then I read that Mojo was really cool for GPUs. And I'd never done that before. I don't know anything about GPUs, but I decided to give it a shot. And in an afternoon, I got my thing running on a GPU. Now it's a million times faster. And now I'm a GPU programmer, right?

Starting point is 01:12:30 And so that, to me, is super empowering. And this is where when you look at technologies like CUDA and things like this, again, they're amazing systems. And there's a lot of good energy and good work put into it. But it's 20 years old. It's C++. It wasn't designed for modern architectures. GPUs today have tensor cores. And CUDA can't even acknowledge that that exists, really.

Starting point is 01:12:51 And so this is a huge, huge opportunity for us to get more people into the space and for people to learn about this for the first time. I mean, you know, just as an engineer who is like, I don't program GPUs. But of course, like everything I write runs on CPUs or cloud or wherever. This reminds me a little bit of the story you previously shared of the Swift person. Because, like, speaking for myself,

Starting point is 01:13:13 I can't really imagine myself like just saying, all right, I'm going to like write CUDA. I don't really have a use case, but I can absolutely imagine myself just like adding a bit of Mojo, optimizing it and then maybe just deploying it on a GPU to see how it is.

Starting point is 01:13:24 If I see stuff that can be parallelized, play around with it, is it faster? And then just like with this geneticist. Build and learn as you go. Yeah. And then start to understand these other hardware, which of course will increasingly probably be useful,

Starting point is 01:13:37 if we know that. And also they're now available. We can rent them on the clouds. You can soon, maybe not today, but I think, I mean today, you can rent them by the hour. If not, it's going to happen for sure. Yeah, you can totally do that. Well, and so we even go further. It turns out that lots of things have GPUs.

Starting point is 01:13:52 Today with Mojo, you can run on AMD, Nvidia, and Apple GPUs. And so if you do want to learn about GPUs, you can go to our website, search for GPU puzzles, and we'll teach you how to do this. And actually, it's an amazing time. I love it. Because what we're doing, and we're still early. We're still building into this. We'd love help.

Starting point is 01:14:08 Please come join our open source community and help us build in this. But what we really truly believe is that GPUs and AI accelerators and like all these chips, whether they're for AI or for DNA sequencing or chemistry or bioinformatics or oil and gas exploration. Like there's all these HPC, there's like all these applications that can use these accelerators. But all the software around them is really weird. Right. And if we can break that down, if we can make it easy to learn, if we can teach people, if we can make them consistent across the hardware vendors, because these hardware people don't always get along, by the way. But if we can make it more uniform, then we can get a bigger community. In that community, sure, some of the people, some of the old dogs from the existing community will convert over. And we love that. And, you know, Mojo's way nicer than C++. And so it's way better from user experience perspective and compile times. And there's so many things that are better about it than older technologies like CUDA. But also, what I believe is that there's an entire new generation of people that should be programming these chips. And if we can get more people into this ecosystem, they can upskill, they can get better jobs.

Starting point is 01:15:12 These are high-paying jobs to do this kind of work. And so it's the best time, it's a better time than ever to learn how to do this. And so it's, you know, partially my life's work here is to enable this and enable people to get into this ecosystem and break down the gatekeeping. And really, it really feels, you know, you have just always been passionate about compiler technology and going towards here. Can you tell me where Modular is as a team? You started about three years ago. Can you tell me how much progress you made, how large the team is right now,

Starting point is 01:15:43 and like where Mojo and the ecosystem is and where you're focusing next? Yeah, yeah. So Modular will be four years old in January. It's hard to believe. Turns out things do take four years sometimes. Even with full focus. Yeah, and so we are an unusual startup

Starting point is 01:16:00 and we went into this knowing this, but we are a very large team and very expensive because we believe, and our investors believe, that this problem that we're tackling is very important. And so at this point, we're about just over 140 people, something like that. And we've enabled seven different architectures from three different vendors. So that's Ampere, Hopper, Blackwell from AMD, or from Nvidia, the 300, 325, 355 from AMD, plus all their consumer stuff.

Starting point is 01:16:29 And now Apple GPUs in beta and coming online, which is very exciting. We're probably not done yet. We'll keep going, but can't share stuff before it's time to share, but it's still very early days. And so what we're doing is we're building into this from a technology perspective. Mojo, for example, I want 1.0 to be meaningful. And so I think that we'll have 1.0 early summer next year. So that'll be pretty cool and it will be way better than Swift 1.0. Yeah, but I was about to say this 1.0 will be. So the team's scoping that out. We'll come up with detailed dates soon, and we'll share that when it's the right time. And so as we're doing that, what we're doing is we're being really thoughtful and learning from the previous journey

Starting point is 01:17:08 and from other people on the team's journey is to make sure that we do this well. And so we want to provide stability for the ecosystem, which Swift 1 didn't do, by the way, and build into this. And each time we do pass through one of these milestones, you know, we can expect more growth and more use cases and more investment from different kinds of people. That's not what I've seen. So commercially, we have customers and we're scaling into that. We have a cloud platform, which is the way we monetize most of this stuff. Mojo is not a product that we're trying to make tons of money off, by the way. It's a thing we had to build to be able to scale across lots of hardware. And so we see a dual world where we can actually catalyze

Starting point is 01:17:44 and grow and teach people how to build and use all these accelerators, and then we can help enterprises build and manage AI into their ecosystem. And a lot of what these folks are looking for is they want something that's easy to use that actually works, it's reliable, that they can scale. Yeah, a lot of them want optionality on hardware. They want to be able to buy the best chip for their workload. They want to be able to get multi-source in different vendors. They don't want to have to rewrite it three times. Yeah. I mean, this all only makes sense when you're, especially that there's this dependency on like one or two vendors, usually one. Yep. And so, and what I see and what I predict in

Starting point is 01:18:16 2026 is there's going to be a lot of silicon from a lot of different vendors and we want to make sure that people can scale across that effectively. And so, you know, for people that are interested, we have lots of job postings on our website. And so we're growing very rapidly. And we have a very big mission. It will take, you know, it's not another six months and then we're done. We're just still at the beginning. And every year of our journey is just another epoch of new things to build and learn and grow and develop into this. And of course, you're building, it sounds like, the infrastructure under AI, or hopefully a lot of it could be. How is your team, the engineering team, specifically using AI, AI tools? May that be, you know, IDEs, agents.

Starting point is 01:18:54 You mentioned that Mojo is a familiar language to write with AI, but I'm interested in underneath the hood. Are you seeing the impact of these tools? Is it helping engineers work faster or down in the current? Because there's always this question, like if you're

Starting point is 01:19:10 writing compilers, will these things help at all or no? You need to handcraft it and I'd love to ask someone who's actually seen it. Yeah, well, so I can tell you, I can't tell you all the things, but I can share my experiences. So we definitely encourage our team to use AI coding tools, and so you use Claude Code and Cursor and all 57 other things. And so there's lots of different experiences.

Starting point is 01:19:30 I've seen tons of benefit. And so for me, I still code. And so I use Cursor, for example, as my daily driver right now. And so I feel like it's, you know, a good 10% productivity for particularly mechanical rewrites and stuff like this. But you're a really good programmer. So that's pretty good. Pretty meaningful. Yes. I represent very sophisticated programmer. Correct. Yes. And so like I, but it's really, I enjoy it because it frees me from a lot of the mechanical stuff. And so whether it makes me, in aggregate, more productive or not, it increases my enjoyment, which is good. It's really important.

Starting point is 01:20:01 It is, actually. Now, for other folks that are building prototypes or PMs, they're building, like, wireframe models of things, like it's transformative because you can literally 10x somebody and you can make it so they could build something that otherwise wouldn't get around to doing. And so that is transformative. For production coding, I think that it's kind of hit and miss.

Starting point is 01:20:21 And to me, honestly, I don't know if it's, like, actually a net win or not because I've seen many cases where people say, okay, just go let the agent and try a thing. And then it grinds and grinds and grinds and a gazillion tokens later, like, it doesn't work. If they would have just done it,

Starting point is 01:20:37 then it would have been done sooner, right? And so to me, like, again, there's lots of wins, but there's also some losses. And so I don't know how things will net out. I do know it's moving very rapidly, and I do want us to be using it. But also, one of the things I'd say is I don't want people to turn their brains off.

Starting point is 01:20:52 And one of the things I think is really important is that programmers in our team, but also in general, they use it as a human assist, not a human replacement. And I think it's very important that we review the code. We understand the architecture. And it's not kind of for production workloads and production applications, at least vibe coding terrifies me. Not just because of what does it mean for jobs, but what does it mean six months from now when you want to change the architecture of something? Yeah. How are you going to even understand how anything works? Right.

Starting point is 01:21:21 And so I do think that it's really important to keep humans in the loop and architects thinking about something and understanding things, not just for security and performance and all the other reasons as well. Well, Addy Osmani, I talked with Addy Osmani, who's been on the Chrome DevTools team for like 13 years now. I think a lot of the tooling he or his team built there. And he said he wrote this book called Beyond Vibe Coding. And he said that what he does is he enables like a verbose kind of output for the agent. And he actually reads through what the thing did. He makes sure that before he commits something or reviews that he just understands what's there. So that he always stays in touch.

Starting point is 01:22:00 So, you know, if this thing turned off, like he could, he could jump in. It's just that, for now, he just chose not to do so. Well, and I think that, again, the thing that I care about is that the architecture of production, I mean, prototypes are a different deal. But for production, you keep the architecture clean and right. It doesn't need to be perfect, but it needs to be curated

Starting point is 01:22:20 because I've seen AI coding tools go crazy with like duplicating things in different places. Well, we know that's a problem because then you want to go change something. Now you have bugs when you update two out of the three places, right? And so there's a lot of these things

Starting point is 01:22:32 where the tools are amazing, but they still need adult supervision. So when it comes to hiring, like you have like, you're building something really interesting and cool and honestly super ambitious. You already have like a pretty big team but you're growing.

Starting point is 01:22:45 what is your hiring bar? And I think you previously mentioned that you're building a company of elite nerds. I'm interested in. Not wrong. What are things that you look for in folks? And for software engineering, experienced engineers, especially like listening,

Starting point is 01:23:00 what tactics or strategies would you suggest so that they get to the level where they could have a shot at a place like Modular, or even if not necessarily Modular? Yeah. So we hire two different kinds of folks. One is the super specialized, you know, compiler nerd.

Starting point is 01:23:15 or something like this, or the GPU programming expert that knows how to do high-performance matrix multiplications, and they've got 10 years of experience, and so you can go look at that track record. We also hire people fresh out of school. Wow. This is refreshing. And so I really do appreciate people that are very early career because they haven't learned all the bad things yet.

Starting point is 01:23:36 And so for me, I think maybe the more interesting thing to talk about is not the super expert that's already the senior engineer. Yeah, we are. It's about folks that are much earlier in their career. And so what I love to see is I love to see people that are really hungry. They have not given up and just said AI will do everything for me, but they have intellectual curiosity. They're willing to work hard, of course, right?

Starting point is 01:23:55 But then they're fearless because everything, particularly in the AI space, is changing so rapidly. But a lot of people will just freeze up and they won't do anything. Versus if you say, okay, cool, let's go figure it out. I want to learn how to do this. I mean, this has been my life's journey. It has been saying that sounds terrifying. People tell me it's impossible and it's doomed to failure,

Starting point is 01:24:15 but how hard can it be? Let's just go figure it out and work through things and do that. And I personally really love when people have contributed to open source projects. I think that's the simplest way

Starting point is 01:24:24 to prove that not only can you write code, but you could actually work with a team, which is a huge part of software engineering in reality. And so I love internships and things like this that people have done before and kind of when talking with people,

Starting point is 01:24:37 you can tell whether they're excited about what they do or whether they're just kind of performatively kind of going through the motions. Now, one of the really challenging parts is in interviews, people get very nervous. And so one of the things I really believe in is giving people kind of the native tools they're

Starting point is 01:24:50 used to using. And so, like, let people code in the way that they would be normally coding. And so today, for example, we want people to be using AI coding for the mechanical pieces, right? So saying, okay, well, you have to write on a whiteboard and you can't use your native tools would be very strange. And so we kind of try to make sure to meet people in something that actually approaches a real world situation. One question I have been meaning to ask. Because you built several languages, this comes up and I ask folks like what are things they'd love to hear from you. And this was repeated quite a bit. Do you think there is any logic in having a new language that is designed to make it easy for LLMs to code with and like attributes, for example, better pattern matching or other things that might be the strength of LLMs. And if so, like what would these

Starting point is 01:25:42 patterns be. This is really weird. I would have not thought we would talk about things like this, but now we should. Absolutely. I'm happy to talk about it. And so I often get asked, so given the AI is writing all of the code, why are you building a programming language? Right, which is another way of asking the same question, maybe a little bit more agro. And so to me, I don't think that optimizing for the LLM is the right thing to do at all. Because go back to what we were just talking about, the important thing is reading the code. Always has been, by the way. Right. Code is read more often than it's written. AI has made it way easier to crank out the code than it ever has been, and I don't think

Starting point is 01:26:18 we're going back to where we were. So writing the code is actually not the key thing to me. It's about reading it. And so what I care about with Mojo and with these new emerging systems we're building is really a combination of two things. One is expressivity. And so can you express the full power of the hardware? Yes or no.

Starting point is 01:26:37 Like if the answer is no, then you will never be able to get it. And so JavaScript will never be a good way to write a CUDA kernel or a GPU kernel. It just will not because it can't express the things you need to express, right? No slight against it. It's a good system, right? But if that's the goal, then you have to be able to do that. Now, assembly code can express the full power of the hardware. And that's not what I'm advocating for, right?

Starting point is 01:27:00 The other thing that goes with it is you need readability. Right. And so you need this combination, this intersection between expressivity. So can you express the important thing, in our case, performance? And then can you understand the code? Can you build abstractions? Can you build in scalable systems? And that intersection, I think, is the key thing. And so this is where you take Python syntax, for example. Python's super easy to read. It's actually very standard. A lot of people know it. Let's embrace that. Let's go all in on that. That's a good thing. And so Mojo embraces Python and the syntax of Python at least and says, okay, well, now let's fix the problems with Python, which is basically the entire implementation.

Starting point is 01:27:38 let's replace the Python interpreter and all that kind of stuff. Let's keep this good part and make it so we can express that full power of the hardware. And with that, I think the LLMs will continue to get better and better at dealing with any weirdness. And what I've seen is that the LLMs are an amazing way to learn coding a new language. Yeah, this is interesting because I feel we're starting to see, like, including in like software, how to build good software. Oftentimes, like what is good and efficient for teams is also LLMs tend to do better with them. Yeah, I also look at these

Starting point is 01:28:09 funny questions. It's like, okay, well, a lot of people in the agentic coding loops want to have error messages that feed back into the agents so that the agents know what to do. And so, should we make better error messages for the agents? Make it better for humans. And my question is like, what is better for an agent that's not also better

Starting point is 01:28:26 for a human? Let's just have good error messages, right? The other key thing, and I think the most important thing for LLM-based coding and AI tools is have a massive amount of open source. And so for us, if you go to the Modular repo, we have, I don't know, 700,000 lines of Mojo code that's open source. And when you open source, you open source the whole history as well, by the way.

Starting point is 01:28:43 Exactly. Exactly. And so that's super powerful because now you can say, hey, go index this. And so now it knows a gigantic swath of everything you can possibly do. And so that, I think, is also really important. Yeah. Now, to close with, you have mentioned that compilers are cool multiple times here. I'm slightly biased, I admit. Absolutely. But I'm getting convinced that compilers are cool. For software engineers who have not built or worked with compilers, what are some kind of like pointers you would give them of like, hey, I'd like to get into understanding compilers, maybe building one, should I try to come up with the language?

Starting point is 01:29:19 Like, what are some things that you point people to? Yeah, so I'll tell you why I fell in love with compilers. So if I go back to university, I was taking the standard computer science program of basic programming, then data structures, and then like an operating system class, GUI class and all this kind of stuff. The thing I love about compilers is that

Starting point is 01:29:40 you built Project 1, and so in this case, it was what's called the Lexer, the thing that tokenizes the source code. And you had to use some data structures. So I'm like, oh, okay, cool. I'm using the things I learned, not just how to write code, but what a tree is and things like this.
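
To make the lexer stage mentioned above concrete, here is a minimal sketch of a tokenizer for a toy expression language. It is not taken from the episode or from any course material: the token kinds, the `Token` type, and the `tokenize` function are hypothetical names chosen for illustration, and a real compiler front end would track more (line numbers, keywords, string literals).

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "NUMBER", "IDENT", "OP"
    text: str  # the exact source characters that were matched
    pos: int   # character offset of the token in the source string


# Hypothetical token set for a toy language. Alternatives are tried
# left to right, so the order of this list matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from left to right, dropping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group(), pos)
        pos = match.end()


tokens = list(tokenize("x = 3.5 * (y + 2)"))
print([t.text for t in tokens])
# prints ['x', '=', '3.5', '*', '(', 'y', '+', '2', ')']
```

A later stage such as a parser would consume this token stream rather than raw characters, which is the layering the class projects are described as building up.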

Starting point is 01:29:56 And then you get to the second part and you build on top of it. And so now you build a parser, the thing that decides the syntax of the language, using the tokenizer. And then you build the type checker and you build on the first two and then you build the next thing and you build on those things. And so what I

Starting point is 01:30:11 loved about compilers as a class was that I got to build and learn and iterate and then if I made a mistake I had to go back and fix it as you're building higher and higher and higher. And so I think that compilers, particularly in the university setting, reflects more of like real software development in a way

Starting point is 01:30:27 that a lot of other classes I experienced did not because a lot of the other classes ended up being build a thing, turn it in, throw it away, build another thing and throw it away. And so that's why I fell in love with it. Today, if you want to learn compilers, it's easier than ever. You can go to LLVM.

Starting point is 01:30:43 There's a great tutorial called the Kaleidoscope tutorial that I wrote. That's wonderful. 15 years ago or something, it's been a long time ago. The Rust community has a lot of compiler, lovely compiler nerds in it. And so a lot of compiler-based technologies are built in Rust these days, which is also super cool. And so there's books and lessons and things like this.

Starting point is 01:30:59 I don't literally think everybody needs to become a compiler engineer, but I think the compilers don't get the credit they deserve, and it turns out there's a lot of really good jobs in compilers. And so if it happens to be that it's an interesting place and interesting side of technologies, then I really do encourage people to do it because we need more folks in this field. Wonderful, Chris, this has been really, really interesting. Learned a lot. Thank you. Yeah, well, thank you for having me. I hope it was useful.

Starting point is 01:31:23 It was. It's rare to talk with a software engineer who is so passionate about programming as Chris is, and I hope you enjoyed this conversation as much as I did. I especially liked how Chris took us to the backstage of how things really happen at Apple. How it was Chris's own stubbornness and track record of getting things done that allowed him to push things through like open sourcing the Swift programming language and creating the new language to start with. For more deep dives on AI engineering and stories and developer tools at other companies, check out deep dives in The Pragmatic Engineer that I wrote

Starting point is 01:31:51 linked in the show notes below. If you enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also leave a rating on the show. Thanks and see you in the next one.
