CoRecursive: Coding Stories - Tech Talk: Modern Systems Programming And Scala Native With Richard Whaling

Episode Date: February 22, 2019

Richard Whaling has an interesting perspective on software development. If you write software for the JVM, or if you are interested in low-level systems programming, or even doing data-heavy or network-heavy IO programming, then you will find this interview interesting. We discuss how to build faster software in a modern fashion by using glibc and techniques from systems programming. This means using raw pointers and manual memory management, but from a modern language. Richard also shares some perspectives on better utilizing the underlying operating system, and how we can build better software by depending on services rather than libraries.

Links:
- Beej's Guide to C
- Beej's Guide to Unix Interprocess Communication
- Beej's Guide to Network Programming
- Gary Bernhardt's Destroy All Software screencasts (web server from scratch, malloc from scratch, shell from scratch)
- Stevens & Rago systems programming books: Advanced Programming in the UNIX Environment; UNIX Network Programming - Sockets; UNIX Network Programming - Interprocess Communication

Transcript
Starting point is 00:00:00 Welcome to CoRecursive, where we bring you discussions with thought leaders in the world of software development. I am Adam, your host. So it's not as much that I'm a fan of it. It's more that I feel that it's real and it's there. And the more we try to abstract over it, our abstractions will get messy and the underlying details will leak, because they inevitably do. And then we'll end up working with a sort of unpredictable and not really performant abstraction that's probably worse in every respect than working with the underlying, somewhat poorly designed operating system interface in the first place. That was Richard Whaling. He has an interesting
Starting point is 00:00:55 perspective on software development. If you write software for the JVM, or if you're interested in low-level systems programming, then you'll find this interview super interesting. I mean, I think lots of people will find it interesting who don't fit into those categories. I saw Richard present at a conference a couple years ago. He was talking about building his own web server and it was a super interesting talk. So I'm excited.
Starting point is 00:01:22 I'm finally getting the chance to have a nice long chat with him. If you like the podcast, I recommend you subscribe to it and stay tuned to the end of the episode for an update about the Slack channel. All right, Richard Whaling, thank you for joining me on the podcast. Yeah, thank you so much for having me, Adam. So you are the author of Modern Systems Programming with Scala Native. That's right. I've read half the book. And this is actually good for me in this case, because it's not that I got lazy and didn't finish it. Actually, I think only half of the book has been released. That's right. It's in the Pragmatic Bookshelf's beta early
Starting point is 00:02:01 access program. So there's five chapters available online now for anyone who wants to get in on it early. And then I'm going to be uploading additional chapters, probably about one a month or thereabouts until it's done, hopefully sometime this summer. Awesome. So the very first interview I did was actually with Denys Shabalin, the original creator of Scala Native. And in it, we talked a lot about kind of the implementation details. And I think it's great if people want to go check it out, but I don't think it would be required for this episode.
Starting point is 00:02:33 So I wanted to ask you, as an end user, what is Scala Native? Sure. You know, from my point of view, and from someone who's not an expert on implementation details either, for me, Scala Native is an alternative implementation of the Scala language. You can think of it sort of as being analogous to Scala.js,
Starting point is 00:02:53 but whereas Scala.js compiles Scala to JavaScript and regular Scala compiles Scala to JVM bytecode, Scala Native compiles Scala code to native machine binary executable instructions, the same way that a C compiler does or that a Rust compiler does. And it actually uses the same compiler backend, the LLVM compiler framework, that Rust does, for example. Who should be interested in Scala Native? Oh, yeah, that's a really good question. I think there's sort of two different angles. First of all, I think really anyone working in Scala should take some notice of Scala Native.
Starting point is 00:03:36 I think, especially in the last year or two, I feel like there's a movement away from Scala as a single-platform language that's really closely tied to the Java ecosystem. There's more of an emphasis, first of all, on pure Scala libraries; something like sttp is a great example. And then along with that, a move towards more explicitly embracing Scala as a multi-platform language specifically. So Scala.js is a few years ahead of Scala Native in terms of sort of uptake and broad library support. But I think we're moving towards a direction of really having three different platforms that can run Scala code. You can run it on the JVM,
Starting point is 00:04:27 you can run it on the web, or you can run it as native machine code. And I think there's a place for all of them in the ecosystem. That makes sense. Was that, I think you said there was two classes of people. Yeah, you're right, thank you.
Starting point is 00:04:41 And then the other side is really people coming at it from this like native or systems programming sort of angle. Where, I mean, obviously Rust is a language that's made a huge amount of impact in that area, both people switching from writing C++ to writing Rust, and also bringing people in to writing code at a much lower level than they'd ever thought possible before. Mostly because Rust provides modern ergonomics and language features, and has a really inviting and supportive community in a way that C and C++ may have lacked. I think similarly, Scala Native provides access to really low-level techniques in a way that's more friendly and ergonomic than anything people have probably used before, while also having a pretty substantially
Starting point is 00:05:32 different spin than Rust, for what it's worth. And do you find yourself in a specific camp? What brings you to Scala Native? Yeah, that's a good question. I mean, for background, I started writing Scala maybe about 10 years ago, maybe nine. And it was from a background of writing C and writing Python, and doing Python-C interop, especially for like XML databases, information retrieval, stuff like that. And I originally started with Scala because I thought its XML support looked really good. Yeah. It's been a...
Starting point is 00:06:10 And I'm not working on XML anymore, fortunately. So that's not nearly as much of a problem. But for me, having the ability to go down to a very low level and write C when necessary to solve a problem is sort of how I think about how to solve problems. And sort of losing that capability when I switched to mostly working on the JVM was really hard for me. So when I started seeing Scala Native show up at conferences and just saw the elegance of Scala Native's C interop, it was just like nothing I'd ever seen before,
Starting point is 00:06:47 like compared to working with Python or even to languages like Lua. It's just the cleanest, most straightforward C interop I think I've ever seen. So I just sort of picked it up and started making like tiny PRs, mostly to add C standard library function bindings and sort of took it from there. So when using Scala Native, I lose the Java libraries. Is that right? You lose at the very least the JDK implementations
Starting point is 00:07:21 of the Java libraries. It's tricky for Scala.js and Scala native because the Scala language itself does rely on Java libraries at a lot of fundamental levels for things like arrays, regular expressions, strings. Even when Scala provides its own Scala string and Scala array classes, under the hood, those are depending on Java string and array classes a lot of the time. So the approach that both Scala.js and Scala Native have taken is that they provided alternative non-JVM implementations of these classes wherever possible. And the idea is that the sort of the surface area of Java libraries that you need to cover
Starting point is 00:08:08 to get Scala the language running is not that large. It's a couple hundred classes, I think. And those are all provided as part of the standard Scala Native distribution. The trouble is the line between those few hundred core Java classes and the tens of thousands of classes in the full JDK libraries, right? And in some ways, living near that line is more painful than just having the core JDK. So what you generally find in Scala Native is that to solve a new problem, a lot of times you'll look for a C library that solves the problem well before you look for what you'd necessarily find in the Scala ecosystem. That's changing more and more just because we're getting better pure Scala implementations of so many core parts of the domain. We're just getting close to having a really good pure Scala implementation of
Starting point is 00:09:29 HOCON, the Typesafe configuration library that's really widely used. Whereas before, a lot of that library was implemented in Java, which would make it unusable in Scala.js and Scala Native. So moving towards having pure 100% Scala implementations of important libraries is really important just for sort of allowing cross-platform development to work easily. So it seems like it's a great time for it to emerge then, because as you were saying,
Starting point is 00:10:02 for a long time, Scala leaned on Java libraries. Maybe that's less common now. It's still quite common, but I guess there's a lot of pure Scala implementations in use, and that makes transposing things to native that much easier. Yeah. You know, I think it really does. It means you can pick up a JSON library like Argonaut off the shelf and it'll just sort of work, you know, versus having to find a C library or something like that. It's also just a really good time because I feel like there's a cycling of the common libraries used in the Scala ecosystem. And I think there's a lot of things that are getting redesigned, or people are bringing in new libraries for things like config, for HTTP, a lot of areas like that.
Starting point is 00:10:47 So it's a really interesting time to be building and designing new libraries and making an impact there, I guess. So you mentioned interop with C. How does that work? In Scala Native, it's really simple. It's literally just a one-line function def, just like in a regular Scala program. You define a singleton object, you annotate it as an @extern object. And then let's say you want to write a binding for a function like quicksort. The C qsort takes, off the top of my head, four arguments. It takes a void pointer to the array of data to sort. It takes the number of elements in the array, which is just an integer. It takes the size of each element of the array in bytes. And then the fourth argument is a function pointer
Starting point is 00:11:45 to the actual comparator that qsort uses to actually implement the pairwise comparisons. And it's this sort of magical C library function that is both incredibly low-level and quite generic. And when you compare it to sorting routines in a high-level language like Java or Python, it's sort of ludicrously performant, usually one or two orders of magnitude faster compared to sorting in a higher-level language with richer data structures. The way to use it in Scala Native is you write a Scala def and you give it the same name as the function you want to bind, and the same arguments. You have to translate the types a little bit. So a void star, a void pointer in C, becomes a Ptr of Byte in Scala. In Scala Native, Ptr is actually just a generic data type, which turns out to be incredibly elegant. And it can actually make these function declarations and type definitions and stuff like that quite a bit easier to read than C syntax, which can frankly feel like line noise
Starting point is 00:12:57 sometimes with function pointers and things like that. So it's actually really straightforward. And if there's a function from the C standard library that's missing that you need to use, one, you can provide it in your own code easily. Again, it's just a one-line function definition. But also, contributing those bindings to Scala Native itself is a really easy, quick, small PR. And that was how I got involved in the first place, just sort of knocking these things out because there was a bunch of string tokenization functions that I wanted to use for some reason.
Starting point is 00:13:35 That's interesting. It's hard to picture, I think, what the code looks like. For instance, if you're using this qsort, it's standard Scala, right? You're going to import this namespace that's like the C standard library. And then you have a generic pointer type for the type of data, and you have to do a lot of casting. I think that's the big difference, right?
Starting point is 00:14:01 Yeah. And that's the thing that probably feels the most different about working with Scala Native, just because casting pointers and structs and arrays in Scala Native has similarly unsafe semantics to how C works. C is a typed language, but in many ways it's a weakly typed language, in that not only will the compiler allow you to freely cast between nominally unrelated data types, but a lot of really important APIs require you to do so, in a way that feels deeply unhygienic to people like me, who have been working in strong, safe type systems like Scala's for 10 years, right? So it can be a little
Starting point is 00:14:46 awkward to learn the patterns there if you never learned them from C. And maybe it's even harder to trust yourself to do all of these sort of free, unsafe casts that C programmers do all the time. But once you get the hang of it, it's actually incredibly powerful. And the absence of sort of runtime overhead that you get from just doing this sort of pure compile time casting and maintaining certain type disciplines oneself is kind of crazy, honestly. I think in my Scala Days talks last year, I had some pretty good benchmarks on this. And I don't 100% remember all of the numbers off the top of my head. But the thing that was really impressive is that the difference in performance between, say, sorting a large Java array versus Quicksort and Scala Native on a large array of structs, you couldn't even say that Scala Native was X percent faster because it was a super linear improvement in performance as the size of the array gets bigger, which was both surprising and really cool. And the reason that happens is
Starting point is 00:16:00 because if you have a large array in Java or in JVM Scala, right, you have your array, but you're also storing a ton of objects on the heap. And the larger the heap gets, the more time and the more resources your system is going to spend on garbage collection. Whereas with Scala native or with C, an array of structs is manually managed memory. It's never going to get garbage collected. You can have regular garbage collected on heap objects also. But if you keep all of your bulk data, like gigabytes of data off the main heap, it's basically free for the purposes of like runtime GC overhead. And that's more than like, oh, this is 20, 30% faster. That's where you find the difference between a program that can run and a program that can't. It makes things possible that
Starting point is 00:16:52 just aren't possible with vanilla Scala and large on-heap data structures in a lot of ways. Yeah, it's kind of crazy. So what you're actually saying is, okay, sorting things, quicksort, let's say it's supposed to be n log n complexity, I think. Yeah, that might be right. But you're saying that the complexity class is actually completely different in a managed environment. Like you think that you're not actually getting that performance because of all this overhead that you're not counting. So when you go native, you're actually in a whole different class of performance. Is that? Yeah. Yeah. I think that's a really good way to put it. I think native quicksort is going to perform like the algorithmic lower bound of quicksort. I think quicksort might nominally be n squared, but just a very well
Starting point is 00:17:41 optimized n squared, but I'm fuzzy. But you're right. So the difference is that with the JVM, you have this performance penalty of garbage collection on top of everything. And the burden the garbage collector places on large data-intensive operations is much larger than actually running computations on them in a lot of cases. And we take these large, heavy, legacy, virtual-machine-managed environments for granted so much in every modern language. I think we lose sight of how much performance we're giving up to these runtimes in some ways. I don't know.
Starting point is 00:18:23 Yeah. In your book, you use Scala Native to manipulate Google Ngram data. Yeah. Why was that a good choice? Why was Scala Native a good choice there? Why was Scala Native a good choice? Well, first of all, for all of the kinds of reasons I've put out there already, when you're really trying to process bulk data. And I picked Google Ngrams, right, because just the letter A file for this data set is like two gigabytes. And you can get way, way more data than this, if you want to. Once you approach the size of around two plus
Starting point is 00:19:01 gigabytes of data, that's where a JVM heap is really going to start to have trouble. And it turned out to be a really great way to exemplify the virtues of Scala Native for doing this kind of bulk off-heap data processing on a real data set and with a simple but somewhat practical real world use case of taking two gigabytes of data, aggregating it and sorting it, which I think is a pretty common and understandable task with this kind of file. Yeah, like the first example you do, you're just reading it line by line, I think, right? And just finding the... That's exactly it. Just finding the max line value. So to me, it feels like the JVM should do this very well.
Starting point is 00:19:51 Right. And reading lines one at a time is something the JVM does well and is pretty well optimized for. But even then, when it's doing that, it's going to be allocating data for a string and then freeing that object, with a file like this, tens of millions of times. And even for these really small object allocations, there's not a huge amount of data being retained in this case. It's just a lot of data goes in and a lot of data goes out. Even in that use case, the garbage collector is really imposing a lot of overhead. Whereas what you can do with Scala
Starting point is 00:20:37 Native is, instead of allocating data over and over to read lines in and then discarding them, you can just allocate a static buffer of like a kilobyte one time, and then you can read every line into the buffer, and you don't have to free anything. You just keep reading data into the same buffer and process all the data in place. And then there are no allocations at all, no deallocations. So even where you don't have the overhead of the large heap, just high GC throughput, it's still possible to beat the JVM. And I did try running this on my machine, and yeah, it's definitely faster. And then you had a second example, I think you were hinting at it, where you kind of aggregate this data. So that was interesting because the result was not quite what I expected.
Starting point is 00:21:28 Yeah. I'm curious what you were expecting from doing like an aggregation and a sort. Well, so the example I think was just grouping all the data because it's split by years. The thing that was surprising to me was that the non-native version couldn't do it. Like it was just too much. Yeah. So I was expecting the JVM to do a little better there, to be honest. And I mean, my code isn't always perfect either. So it's certainly possible a high performance JVM specialist could write a Scala or a Java program that can do this. I certainly know people who do that for a living. But I was a little surprised that the basic JVM array classes and string classes really couldn't handle what to me was a somewhat intensive, but not outlandish ask, I guess. Processing a two gigabyte file and sort of aggregating it into maybe a couple hundred megabytes of on-heap storage seems to me something we should expect a reasonable language implementation to
Starting point is 00:22:33 be able to do. So to me, this is a really important way just to illustrate that the overhead and just the cost of the JVM or other heavy runtimes really does affect what programs you can run and what programs you can't. I think, like, I remember Denys talking about this before, like, if you look at Spark or something, they end up just doing manual memory management using some tricks. Yeah, I mean, Spark is such a fascinating use case. You know, I use Spark a ton and have used it for years. I was doing Spark consulting and struggling with Spark jobs and getting them to run a lot of times. I think in some ways, it is really fascinating how much Spark still relies on on-heap storage, which has pros and
Starting point is 00:23:21 cons. But one of the cons of it in practice is that when a Spark job approaches the maximum amount of heap memory available, either on a single node or even worse across all the nodes, the whole system really starts to fall apart. The classic thing that'll happen in really any data-intensive JVM program is that once your heap gets really maxed out, your garbage collector will start imposing long stop-the-world pauses. And while that's happening, you'll miss some kind of heartbeat or some other kind of distributed systems timeout that your cluster is using for coordination. And then once you start missing heartbeats, the cluster starts breaking apart and thrashing. And then it's just all downhill from there.
Starting point is 00:24:12 Right? So you can't even push the system to the limits of its capacity without breaking all of the networking and distributed systems components. Whereas I think using off-heap memory, whether you're on the JVM or not, has a lot of benefits for data-intensive distributed systems. But also, it's really, really hard to do that right. The JVM really gets in your way over and over when you try to do things like this. And for me, what's really magical about Scala Native is that it's relatively straightforward. I can write a short, sort of blog-post-length article or chapter and show people how to do it. It's not like, oh, you have to study this in grad school for six months,
Starting point is 00:25:03 sort of advanced technique. I feel like it just makes it so much more approachable. And I'm hoping it makes it possible for us as a community to build libraries that have these sort of more robust and elegant behaviors when handling intense amounts of data, honestly. It's interesting. Scala is a very unopinionated language, I guess some people would say, except in one particular domain, which is that it has to be garbage collected. So that's the one constraint that's been thrown away now. Well, it's interesting because it's garbage collected and it isn't. The thing that I think is subtle and that comes with practice with Scala
Starting point is 00:25:41 Native is being able to sort of keep these two universes in your head. You have one universe of regular Scala objects that are garbage collected. You have all of the immutable data structures and for loops and all of the niceties of real Scala, right? But then you also have the option of using these manually managed pointers and off-heap data structures and structs and arrays, which the garbage collector doesn't touch. And I think if it were all manual memory, Scala Native would be just as hard to program as C, frankly. It would be a step backwards. For me, really, the magic is having garbage collection,
Starting point is 00:26:28 all the idioms of regular Scala, but then being able to manage manually a handful of large custom data structures for the critical paths of a program. And sort of finding the balance between these two domains, I think is really where the art is. And I think where I'm still evolving, certainly. It's an interesting point that I guess we probably didn't make clear. So Scala Native can be used just like Go or something, right? Go is native and garbage collected. And that's about it, right? But also with Scala Native, you can bring in the standard C library, and then you can start doing manual memory management, you know, programming like it's 1995 or whatever. Yeah, that's a really good distinction to make. Go is a really good
Starting point is 00:27:16 example of a native, relatively high level language that has a garbage collector and has good ergonomics. And I think Scala Native is competitive at a lot of the same things Go does. And in some ways, Go might be a better direct comparison for Scala Native than Rust in some ways. The other thing that I compare it to a lot, which might be a little more obscure, would be Standard ML or OCaml, which are somewhat more academic functional programming languages, but are also strict languages like Scala, have great immutable data structures like Scala, have awesome garbage collectors like Scala Native, and so on. But in practice, I don't think there's a lot of folks out there who've experienced doing systems programming in modern, high-level, functional, capable language like this.
Starting point is 00:28:13 Scala Native is the first place a lot of people have heard of this sort of style. That's what my book is really about: trying to show people this whole different way of writing native code that's much closer to the machine, but with all of the quality of life and niceties of just writing regular Scala too. Yeah. So I could imagine that your sort of tight inner-loop, performance-critical parts might be something you'd want to manually manage? Yeah, exactly. And then you would use regular Scala collections and strings and all those niceties for configuration, for networking. I mean, it's always going to be a balance. And when you're at the point where you're writing a program which has specialized performance needs, every program like that is going to be unique in some ways. So it's always the experience and judgment you have of, well, which is the tight inner loop I actually need to optimize?
Starting point is 00:29:19 In some cases, that's obvious. If you're sorting four gigabytes of data, it's the sorting, you know? But in real-world programs, it's not always that clear-cut, which is definitely a challenge for this kind of programming, I'd say. So your book is called Modern Systems Programming, and you spend some time on the C standard library. Why should we be interested in it? Yeah, you know, I called it modern systems programming sort of to contrast it with the books I learned C systems programming from. The one that's closest to my heart is the Stevens and Rago, Advanced Programming in the Unix Environment, which is the size of a brick, right? And like a large brick, not a little brick.
Starting point is 00:30:05 But it's a great book, one of the best technical books I own. It's totally encyclopedic, like everything you can do in C with a Unix OS kernel is in there, basically. But it doesn't even cover networking. If you want to do anything involving networking, there are not one but two additional brick-sized books, also by Stevens, on networking in the Unix environment in C, which, again, are encyclopedic and amazing. But it's just so alien from the way we write code now, where everything is on the network. You make REST calls at the drop of a hat, right? Everything is a server. It's not reasonable to say, well, if you want to write close to the
Starting point is 00:30:46 metal, you need to read these three brick-sized books. So the approach that my book takes is that we introduce the fundamental C system calls you need to work with data, to allocate memory, and to do basic TCP networking really early. We fly through some small, lightweight programs that exercise them, and we really bootstrap to the point where you can write, from scratch and without any support libraries, a simple HTTP client or a really simple but rugged HTTP server within the first 100 pages of the book or so. And I guess that's what I mean by the modern approach: I'm not treating networking as an obscure, advanced topic. I'm really just putting it out there in front because it's absolutely critical
Starting point is 00:31:37 to every program we write nowadays. Yeah. And then the second half of the book actually doubles down on that. And that's all the stuff that I'm going to be releasing over the next couple of months. But essentially, what we're going to do in the second half of the book is bring in a C library called libuv, which is the event loop and networking library that Node.js is based on, but it's a C library, not a JavaScript library. And we sort of build up a fully asynchronous, highly usable, well-designed Scala library around asynchronous IO, with C-level performance, basically. Yeah, it's a different perspective. So I'm a Scala developer. At my work, there are people who are C programmers. I don't feel like we always speak the same language. And then it's interesting, in your book you're like, let's build
Starting point is 00:32:33 a web server in Scala. So first, let's look at how you, you know, listen on a socket using the standard C library. That is not the approach I would normally take. Yeah, I sometimes have some ambivalence about that kind of presentation. Like, I think a lot of books would say, oh, just use this library, you know, HTTPServer.serve, port 80, or whatever, right? And that's certainly the right way to build a web server at work. You don't want to write a web server framework from scratch for your job, and you don't want to have to maintain someone else's from-scratch web server ever. I think we've all been in situations where we've had to support sort of gratuitously DIY code that someone has done. And I think we all know the downsides of that. For me, the reason to sort of embrace that
Starting point is 00:33:34 gratuitous DIY spirit in the book is, first of all, it's a way to get the reader really intimately comfortable with exactly what the operating system does and is responsible for, both to show how much there is, but also in some ways how little there is. And then to demystify the libraries and frameworks we use for this every day, right? Because it is somewhat insane to write an HTTP server from scratch. But then you realize, oh, I can write an HTTP server that can handle a couple hundred or a couple thousand requests a second in less than 200 lines of code. And if you realize that's within reach for any Scala developer, really,
Starting point is 00:34:19 I feel like it opens up so many things. It makes it a lot more believable that we could write new pure Scala libraries to replace things from the JVM, like Netty, that we're unlikely to ever be able to port. So maybe that's the sort of oblique strategy of the DIY approach: it really just opens up the possibility of building this new ecosystem, and I hope a better and simpler ecosystem than the sort of JVM environment that Scala bootstrapped itself upon. Yeah, I love the approach. I picture you playing around with Scala Native and you're like, okay, you want to use whatever, Netty, and you can't. And so you turn around and you have your three giant brick books and you're like, well, I know how to open a socket. And you're just
Starting point is 00:35:15 coding away. So is this an attitude that's lacking in the high-level language world? You know, I'm hesitant to even generalize about the high-level language world? You know, I'm hesitant to even generalize about the high-level language world in general. What I would say is that I think there's probably a crisis across software development about dependencies and libraries. And a lot of the really serious examples of this come from Node.js and the NPM ecosystem, right? Where if one library gets taken down or compromised, like LeftPad or EventStream, hundreds or thousands of upstream libraries and any number of large, serious projects can get harmed. And I think it comes from this notion that it's easier and faster to grab a library off the shelf for every possible need we might have.
Starting point is 00:36:14 And maybe this is me editorializing or getting a little bit cranky. In fact, I'm sure this is me editorializing or getting a bit cranky. Run with it. run with it. Yeah, no, I'm not sure that many of the things we use libraries for are that hard. And maybe we should consider the possibility of having programs that are a little more self-contained and that don't necessarily rely on, you know, a hundred or 200 dependencies to tick every possible box.
Starting point is 00:36:50 I'm a big fan of writing software that is simpler and more rugged or more performant that can be relied on. And sometimes it means taking a different approach to that. It can also mean thinking more about infrastructure and how your code is going to be deployed and figuring out how to rely on infrastructure to solve a problem. And just thinking about this whole life cycle of your code and the environment it lives in and what it does and what it doesn't need to do. So what do you mean by that? So a really good example of this would be something that I just put out on GitHub and on Twitter was a Scala native runtime for AWS Lambda, actually. I don't know, your readers might not be so familiar with it, but a couple months ago,
Starting point is 00:37:41 AWS announced custom AWS Lambda runtimes. Previously, there were only three or four languages that you could run a Lambda function in. And it was, I think, JavaScript, Python, C Sharp, and Java, right? Yeah. The idea is you would upload your code or a jar or some other archive with a runnable artifact. And then AWS managed a runtime that something like that, you actually just get a bare Linux VM with sort of a magical local HTTP endpoint that's provided by Firecracker, basically by the new AWS virtual machine monitor, right? And any executable program in any language you can write if it wants to serve lambda functions
Starting point is 00:38:48 it just has to hit this little local http endpoint wants to get a request in and then it hits it again with a response and that's it like there's no encryption there's no request signing because it's all local none of this is even going over the network between my code and the sort of virtual machine boundary that's standing between my code and the rest of AWS and the rest of the internet. So it's this really fascinating model where it allows you to write really simple code, you know, because I already had like a, an HTTP client in Scala native from the book, in fact. So I took that and I added about 20 lines of additional Scala code to just hit the two endpoints basically in a loop. And that was it. That's all it takes to like provide an AWS Lambda runtime now.
Starting point is 00:39:45 Because there's this beautiful synergy between a really streamlined interface and really sophisticated and elegant infrastructure and API design. So some of it is that our dependencies can be services rather than actual libraries is that yeah yeah that's a really great way to put it that instead of having dependencies on 15 libraries with 15 different interfaces if your dependency is a service then you might only need to implement one protocol to to work with all of your different dependencies an interesting interesting thing is containers. So the JVM sort of is an early type of containerization in a way, right? It's a little VM that runs the same everywhere. And now, like I regularly am shipping to production, you know, JVM processes that run inside of containers. Yeah. Yeah. And it's really interesting how that there's a bit of an impedance mismatch there, and that so much of the complexity of Java as an ecosystem is built around things like runtime class loading, right?
Starting point is 00:40:53 With the ability, presumably, to swap versions of an application in and out within a larger sort of mothership-grade Java-like super server that's hosting lots of such applications. And now with containers, we've accepted and I think pretty happily adopted sort of shipping these immutable artifacts for all of our code. But we sort of still are paying the memory overhead and the runtime complexity of this virtual machine that's designed with, honestly, much more sophisticated capacities that we just aren't using. Yeah, you're a fan of the operating system.
Starting point is 00:41:32 I don't know. I mean, a fan? I guess I have an interest in the operating system. You know, I can complain about the design of any particular operating system for days, right? Linux can be a pain to work with. Linux is also amazing, but there are parts of it that can be a pain. I mean, from writing this book and trying to explain even basic Unix socket APIs, I could go on about how poorly designed some of the socket APIs are, right? And like how little sense some of it makes if you try to write down how it works. And that was something I really struggled with. So it's not as much that I'm a fan of it. It's more that I feel that it's real and it's there. And the more we try to abstract over it, we'll both, one, our abstractions will get messy.
Starting point is 00:42:27 The underlying details will leak because they inevitably do. And then it will end up working with a sort of unpredictable and not really performant abstraction that's probably worse in every respect than working with the underlying, somewhat poorly designed operating system interface in the first place. Yeah, like I feel like probably rightfully so. You may more than others solve an architecture problem
Starting point is 00:42:57 by leaning on the resources that the operating system provides. So recently I was like downloading, I was using W get to like scrape some website and then it was slow. So I wanted to run like multiple in parallel. I like looked at the documentation and it's like, just keep starting it up with the same arguments. Right. Nice. So, so I saw that as an example, right? They were coordinating just by before it downloaded a page, I think it creates the file. So then if there's multiple instances, they just move on. So this type of design maybe is less
Starting point is 00:43:30 common where you are utilizing what the operating system provides. Yeah, I think that's a really great example, actually, just because like if you did that with a shell script, you could run that pretty ridiculously fast, just because shell scripts are great for spawning dozens of processes in parallel, whereas every modern, high-level, pleasant-to-use programming language doesn't expose multi-process programming elegantly or at all, in fact. We all are used to threads, right, where you have multiple contexts of execution, potentially multiple CPUs going at once within the same sort of memory address space. But actually having multiple separate processes running within a single logical
Starting point is 00:44:20 program in some sense is actually a little more alien. And it's something I do get into precisely because it opens up things like spawning 15 WGIT or curl instances or something like that. Yeah. But where do we learn about this? Besides your book, do we just have to go to your three tomes of Unix system programming? I mean, those are great. Actually are a lot of really good resources online for stuff like this. Beej's guide to C covers a huge amount of this stuff. Unix sockets, memory management, fork and multi-process programming.
Starting point is 00:44:59 The actual Linux man pages for this stuff are also really, really good. And I've spent a lot of quality time with them while trying to write a book about this. There's a few somewhat older programming languages that have good support for a lot of these multiprocess programming techniques. And just for like sort of Unix style programming in general, I guess they're not even that old. But Python and Ruby both have a slightly closer affinity to C into the operating system than like Java or Godo in some sense.
Starting point is 00:45:34 So like Gary Bernhardt has a lot of really great podcasts where he does pretty similar kind of low level systems programming to what I do in Ruby, right? And just write stuff from scratch using basic C system calls. Because Ruby also has a pretty good, obviously, untyped C FFI. And then Python actually can be pretty good for aggressive multiprocess and off-heap data structures. You can do some pretty cool stuff with their multiprocess and numeric computing libraries, where you can have multiple processes with isolated state, mounting a shared memory map with a giant array in it or stuff like that. That's probably where I picked up a lot of these techniques originally, for the more aggressive process and off-heap stuff, actually. Very cool. Yeah, I mean, the JVM, like, I guess we've been bagging on it, but it's quite performant generally. So it's not often where I have to reach.
Starting point is 00:46:31 Like I can see why Ruby, maybe you're more likely to have to do some FFI. Yeah, no, I mean, and that's a really good thing to call out. And I love the JVM. The JVM is great. The quality of the JVM's just-in-time compiler is, I mean, it's a really phenomenal piece of human engineering. It's so good for so many things,
Starting point is 00:46:50 and I'm not hating on it. I think I just, it's more like I have qualms about its applicability to the situation we find ourselves in, you know? Maybe it doesn't fit that well with containers. Maybe it doesn't fit that well with really large on heap data structures. Maybe it's not perfect for latency sensitive distributed systems. You know, those are sort of the angles that I find myself sort of pushing against the limits of the JVM, if that makes sense. Yeah, no, it totally makes sense. And yeah, I mean, I think containers for certain are,
Starting point is 00:47:25 are the way forward. So yeah, I don't know what that says long-term for the JVM. You see a lot of new languages are, are native. One thing, one thing I noticed was that the standard C stuff that you interrupt with, uh, like some of it by my maybe more modern taste, seems somewhat insane. I don't know. To use the technical term. No, it's really true. There's this particular technique called a type pun, where you have types that are genuinely unrelated to each other, have totally different structures, different sizes, right? And in some cases, you'll have types like the socketter type, which nominally exists and has a size of like 14 bytes, but you can't actually instantiate it, because there is no generic instance of a socket address. Instead, you do things like you allocate an IPv4 or an IPv6 address, which will be twice or four times as large. And then you just cast it down to this other much
Starting point is 00:48:28 smaller data type and pass it into a system call. And that's terrifying, right? Any rational person doing that would think, oh my God, the system is going to chop off the last eight bytes of this thing I just passed in when I cast it, or it's going to overflow or something else, right? Because certainly anytime you're working with C, you're going to be scared of overflowing pointers. And when you have these basic foundational APIs that require you to do unsafe casts that are effectively overflowing buffers going into the kernel, it's sort of horrifying that this is how everything works. It's all of this. But a lot of that has historical reasons. The POSIX socket API, I think, mostly got written and mostly implemented around 1981, 1982, which is a really long time
Starting point is 00:49:21 ago. It's close to 30 years now, right? And it actually doesn't quite predate like modern ANSI C, the sort of standardized modern C we all know and love. But it sort of came into being around the same time. And a lot of the implementations didn't have access even to a full compliant standard C implementation. And what that meant was that some of the nicer features C has, like C has union types, which are a more clean and sort of type safe way to represent sort of related types in the same, that'll sort of slot into the same memory space. It's not quite as elegant as sealed trait and Scala or like subtyping in a OOP language. But it fulfills the similar role in a modern C program.
Starting point is 00:50:14 So this is something the C language has, and it just wasn't there or wasn't stable enough in time that a lot of these legacy APIs got rid of, which is both a little terrifying, but also maybe it's encouraging. We have all these new systems programming languages coming up. People are writing new virtual machine monitors and new operating systems and things like Rust and Go even. And I think if people can write these things, and certainly in a garbage collected language like Go, the notion of using Scala native for a sort of real, deep and like low level implementations of some of these primitive facilities, it starts to get really exciting. It makes you wonder what could we do if we tried to implement a fundamentally sane network protocol, for example, which might not be something we have yet.
Starting point is 00:51:10 Yeah. So it does seem like an opportunity. I can see what you're saying. Like just Scala native could have a wrapper around whatever string copy and it could provide a little bit of sanity. Yeah. And that if you might be able to find a sweet spot where you provide more sanity than C does, but you don't have the overhead insanity that the Java string API introduces. I think what I'm finding from writing this book is there is a sweet spot, in fact,
Starting point is 00:51:42 that is much closer to the metal, but also much saner than, you know, what someone, some really smart person designed in a hurry 30 or 40 years ago. Yeah. And we have more expressive languages, better compilers now, like a lot more validation can be done. Yeah, it's very true. I think that that's probably most of my questions. Let me take a look here. Is there anything that we didn't cover that you'd like to talk about not that i can think of off the
Starting point is 00:52:09 top of my head i really appreciate you uh giving me the chance to talk about all this stuff yeah it's interesting yeah like i'm not a systems programmer and i like the approach of the book because it's easier for me to take in than reading something and see i suppose like it's just it's easier for me to take in than reading something in C, I suppose. Like it's just, I'm used to Scala, so. Yeah, like this book, it's not a reference in the way that one of the brick-sized tomes is, but I hope it does open this up to more people than some of these older, larger, scarier books do. So that's really good to hear.
Starting point is 00:52:42 Awesome. Thank you for your time, Richard. It's been fun. Yeah, Adam, to hear. Awesome. Thank you for your time, Richard. It's been fun. Yeah, Adam, this has been awesome. Thank you so much. That was the show. Thank you for listening to the Co-Recursive Podcast. I'm Adam Gordon-Bell, your host. If you liked the show, tell a friend about it to help spread the word. Or you can join our Slack channel or spray paint our website on the back of a bus. Or maybe not that one. This month on the co-recursive Slack channel, there's been some interesting talks.
Starting point is 00:53:12 User GraphBloodwurst, great name by the way, has been working on an interesting project, Scala Z Schema. And he has volunteered to explain context-free algebras to our group. I have no idea what those are, but hopefully I will soon. I looked at some of his PRs and it's a bit above my head, so I always have more to learn, but I promised I would give him a shout out in the podcast. Also, user Joe, nice short name, he's been asking questions about learning functional programming. He's been learning in F sharp and now he's dabbling a little in Scala and trying to get the feel for what is the best language to kind of learn some functional programming.
Starting point is 00:53:49 So these are the types of discussions happening in the co-recursive Slack channel. Check it out. Until next time.
