Advent of Computing - Episode 54 - C Level, Part II
Episode Date: April 18, 2021

Even after nearly 50 years, C remains a force in the programming world. Anytime you browse the web, or even log into a computer, C is somewhere in the background. This episode I wrap up my series on C by looking at its early development and spread. We will get into the first and second C compilers ever written, and take a look at how a banned book led to generations of avid C programmers. Like the show? Then why not head over and support me on Patreon. Perks include early access to future episodes, and bonus content: https://www.patreon.com/adventofcomputing
Transcript
I got some breaking news for you.
You have, in fact, been exposed to C, and you might not have even realized it.
The simple fact is that C, the programming language, is deeply embedded in our day-to-day lives.
Most major web servers, the actual software that handles requests on the internet, are written in C.
Going deeper, operating systems rely on a whole lot of C code.
Linux is the obvious one. The Unix-inspired system that runs the modern world is written entirely
in C. That means everything from embedded devices in cars and stereos on up to some of the largest supercomputers, well, they all rely on C to operate.
macOS shares a similar distinction. Modern versions are Unix-based, so under the hood,
we run into C yet again. Windows isn't Unix, but its core components are, at least reportedly,
written in C also. So ask yourself, have you used the internet today?
Have you logged into a computer running Linux, macOS, or Windows? Smartphones count here too.
iOS uses C in its kernel, and Android is just a variant of Linux itself. If you answered yes
to any of those questions, then you too have benefited from C.
Now, what makes this wild, at least to me, is that C isn't some new hotshot language.
It's approaching 50 years old.
In that time, we've seen the entire computing world change.
There are types of computers, types of processors, even ways we use computers that
just didn't exist when C was created. So how has C been able to stick around in such a changing
world? And how has it remained ubiquitous for so long?
Welcome back to Advent of Computing.
I'm your host, Sean Haas, and this is episode 54, C-Level, part 2.
Today, we're going to wrap up our discussion of the C programming language, at least for the time being.
And, you know, this time we're actually going to talk about C. Last episode was really heavy on the background leading up to the
language itself. If you haven't listened already, I'd highly recommend going back and checking it
out. As always with the series, my usual disclaimer here is that the last episode isn't strictly
needed to enjoy this part, but it will add a lot of context and deeper understanding to today's discussion.
So, with that aside, what exactly is today's discussion?
Well, last time we made it all the way up to B.
That's the language that comes right before C.
When we left off, the crew at Bell Labs were in this weird transitionary period.
B was up and running. It had some issues, but it showed a lot of promise. Thompson and Ritchie
were still trying to convince management to throw some money their way for a newer and more powerful
computer. So that's roughly where we're going to start, the tail end of B and the early development
of C. Now, it's probably time I let you in on why this is a two-parter exactly. Most of this episode
is going to deal with the early spread and adoption of C. It's something that I explained
in about a sentence last episode, but that's not nearly enough. I want to examine why C became
such an important language, while other really similar languages just didn't. I think this is
a great opportunity to look at what makes a programming language popular. We have examples
of languages close to C in the overall family tree, specifically BCPL and B, that just never rise to the same level as C itself.
But more than just being an interesting case study, the rise of C is an interesting story.
It relies on a lot of moving parts, everything from how academia works to early Unix distribution,
and even down into more technical details. We have a lot of ground to cover, so
let's get into how C became the force of nature that it is today. We're starting off back in the
depths of Bell Labs. Ken Thompson, Dennis Ritchie, and an ever-growing cast of programmers were
toiling away on a side project that would, at least very soon, be called Unix. Their new operating system was coming along
nicely. New tools were being developed every day. Programming the system was starting to feel
more and more fun as its complexity increased. And, most importantly, a working B compiler
derived from the earlier BCPL language was helping ease their assembly language woes. But despite outward
appearances, there were some issues going on in the lab. The operative problem was what all
programmers deal with at some point, that's hardware. The Unix team was working with a PDP-7
minicomputer, which packed a very diminutive 8 kilobytes of RAM.
It was a pretty cramped environment to say the least.
It's easy to just say that RAM limits put a hamper on development, but for Thompson,
that theoretical code complexity ceiling had actually been reached.
He had tried getting a working BCPL compiler on the PDP-7, but it turns out that the program
would have just been too big to fit in 8 kilobytes.
That led to the development of B, a massively pared down version of BCPL.
The short term goal for Thompson and Ritchie was pretty clear.
They had to get off the PDP-7 and onto better hardware, otherwise they could never
really flex their muscle. Over late 1969 and early 1970, the duo started pestering Bell Labs
management for a new computer. Initial attempts of, hey, can we get a really new, really big
mainframe didn't really go over so well, as one could imagine. But after a few months
of trying and honing their request a little, the team actually made some progress. One of the
programmers working on Unix, Joe Ossanna, had been working on some software for text editing and
typesetting. It's the kind of application that would have been in demand around Bell or around any office.
So, possibly a useful inroad to funding.
The crew decided to put their weight behind that one aspect of Unix,
the exciting and ever-intriguing field of text processing.
This changed tactic actually worked.
Ritchie remembered their successful request like this, quote,
It differed in two important ways. First, the amount of money, about $65,000, was an order of magnitude less than what we had previously asked. Second, the charter sought was not merely
to write some unspecified operating system, but instead to create a system specifically designed for editing and formatting text,
what might today be called a, quote,
word processing system, end quote.
A more direct approach targeted at something that Bell Labs could actually use paid off.
By the summer of 1970, Unix had become a fully-fledged official project,
with funding and a cushy new PDP-11 computer. But new hardware didn't solve every problem
overnight, and moving on to this new system made the cracks in B all the more clear.
In 1970, Unix, at least the core components of the system, were all written in assembly language.
The eventual plan was to rewrite everything in some high-level language,
but the team had yet to find the right language to commit to.
So when the time came to make the move from the PDP-7 to the larger PDP-11,
everyone just buckled down and started the process of rewriting Unix from
scratch. The first port of Unix didn't make use of high-level programming languages. It didn't even
use B. It was a near-complete rewrite, which must have been grueling to say the least.
And I think this is a case where a total rewrite was inevitable, especially a rewrite in assembly language. Thompson and Ritchie had wanted to replace assembly language with some high-level
language, but the PDP-7 just didn't have the resources to do that. So the first port to
escape from 8K was gonna hurt a little bit. There wasn't an easy way around that. Once Unix was up on a new computer,
one with more space, then more options would start to open up. A bigger and better Unix started to
take shape on the PDP-11, and B followed very soon after. Dennis Ritchie handled the port of B,
and this is a great example of how earlier work paid off. As I went over last time, B was built specifically
for portability. It was self-hosting, meaning that most of its compiler was written in B itself.
The only section that was written in assembly language was the library of calls used by the
final output program. All Ritchie had to do was make a few modifications, rewrite the library of runtime code, and call it a day.
B worked fine on the PDP-11, and given the new space, the language started to see more use.
One early application of B was dc, or desk calculator. It's a relatively simple command
line utility for processing mathematical expressions. But this ended up being essentially the most
high-level use of the language within Unix. The simple fact was that B had some serious issues.
Some had to do with its compiler design, while others were more deeply seated in the language
itself. So let's start off with the implementation issues and then move more towards why the
language was never a favorite.
The B compilers, both for the PDP-7 and PDP-11, were implemented using a technique called
threaded code.
I discussed this at some length in last episode.
In short, each line of a compiled B program was a call out to a library of functions. That library was the only thing that needed to be rewritten when porting the compiler, and like I mentioned, this worked
out really well for portability. It made Ritchie's life a lot easier, to be sure. However, it came
at a cost. Threaded code is intrinsically slower than other options.
Each line calls a function.
That means that each line introduces a little bit of processor overhead.
Just like with the phone company, there's no such thing as a free call.
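Just to give a rough feel for the idea, here's a toy sketch in C. This isn't how B's runtime actually looked, it's just the general shape: the "compiled program" is nothing but a list of calls into a little library, and every step pays for a function call.

```c
#include <stdio.h>

static int stack[16];
static int sp = 0;

/* The "library" of runtime routines the compiled program calls into. */
static void push_two(void)   { stack[sp++] = 2; }
static void push_three(void) { stack[sp++] = 3; }
static void add(void)        { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a + b; }
static void print_top(void)  { printf("%d\n", stack[sp - 1]); }

int main(void) {
    /* The "compiled program": nothing but a list of calls, one per operation. */
    void (*program[])(void) = { push_two, push_three, add, print_top };

    for (size_t i = 0; i < sizeof program / sizeof program[0]; i++)
        program[i]();   /* every single step carries the overhead of a function call */
    return 0;
}
```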
That means running a program implemented in threaded code will always be slower than a more traditionally written program. This could have been overcome by rethinking how B's compiler functions, but that assumes that B was actually
worth salvaging. The other issue brings us to the fun and arcane realm of computer memory.
B didn't have data types. All variables were just the generic bit pattern. That specific
quirk came from BCPL, an earlier language and Thompson's inspiration for B. The idea here was
to make the language as machine-independent as possible. Computers in the 60s all treated memory
a little bit differently, so making a language
that treated memory in some generic fashion, well, that let you gloss over a lot of those
differences.
It's a sound approach, but once again, every choice you make comes with some kind of tradeoff.
BCPL was able to get away with the whole no data type thing because it was explicitly meant as a systems
programming language. That is, it was only designed to be used for programs such as operating systems
and compilers. You don't really need floating point math for those types of applications.
You can get away with purely integer math with relatively small numbers. But when you need to deal with a decimal place,
well, then you're on your own. The weight falls to the programmer to implement code that can,
you know, somehow treat a generic chunk of memory as a floating point number. The same goes for
textual data. You have to write a lot of boilerplate for dealing with different types of data.
That cuts into valuable energy that a programmer could be spending, you know, implementing actually useful software.
Getting into the more fine details, the whole no variable type approach started to fall
out of usefulness as computers started to change.
At least, this specific approach to not having variable types.
And this is where we get really deep into the mountains of memory madness, so prepare.
The gist of the issue came down to this. B and BCPL use an entire word of memory as the smallest
size of a variable. This was a design choice made around
computers in the 60s. The machines BCPL was developed for, and the PDP-7 that B was developed
for, addressed memory in terms of words. A word here can be of varying sizes, but they usually
encode more than just one byte of data. Now, the size of a byte is
also a historically variable thing, but let's try to gloss over that for right now. This word
versus byte thing starts to matter when we look at one specific application, text. Now, generally,
a computer will encode a single character as a single byte, so it's most
convenient to work with text in terms of byte-addressed data. BCPL, and by extension B,
didn't do that. The flashy new PDP-11 could address memory by byte, so it could natively handle text a little bit better than B or the PDP-7 for
that matter. This mismatch meant that any time you had to deal with text in B or in BCPL, you had to
deal with packing and unpacking characters in odd-sized memory cells. It's not backbreaking,
but it's another little annoyance: more code you end up having to write to make up for a shortcoming in the language.
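If you want a feel for that boilerplate, here's a small sketch, written in C rather than B purely for convenience, of the kind of shuffling text handling required: two characters packed into one 16-bit word, pulled back out by hand with shifts and masks.

```c
#include <stdio.h>

int main(void) {
    unsigned short word = ('h' << 8) | 'i';   /* pack 'h' and 'i' into one 16-bit word */

    char first  = (word >> 8) & 0xFF;         /* unpack the high byte */
    char second = word & 0xFF;                /* unpack the low byte  */

    printf("%c%c\n", first, second);          /* prints: hi */
    return 0;
}
```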
To summarize all that really quickly: B was simplified, but it was too simplified. Its combinations of features, plus how its compiler
was implemented, made the language really easy to port. It also made implementing a compiler a breeze.
But at a certain point, those benefits don't make up for the fact that B wasn't all that useful.
The overarching goal with the language was to eventually replace assembly language,
basically to become THE language for Unix. But B wasn't quite the right language. It was close, but it came up short.
The Unix team still needed a system programming language, so Ritchie set to work. As he later described it, quote,
In 1971, I began to extend the B programming language by adding a character type and also rewrote its compiler
to generate PDP-11 machine instructions instead of threaded code. Thus, the transition from B to C
was contemporaneous with the creation of a compiler capable of producing programs fast
and small enough to compete with assembly language. I called this slightly extended language
NB for New B, end quote. Of course, New B didn't stick as a name. Pretty soon, it was just called
C. As is the usual MO with programmers, and as Ritchie himself acknowledged, the new name was picked partly as a joke.
Now, this initial change was small. Ritchie added character and integer data types in addition to existing generic types,
and he wrote a more reasonable compiler implementation. Interestingly, this is a step
back in the overall family tree. Remember that B was a simplified BCPL, BCPL was a simplified
version of CPL, and CPL was a more hardware-cognizant version of ALGOL. BCPL was the point where
data types were dropped in favor of generic data storage cells. So in that sense, C was a return
to an earlier form. In other words, adding back variable types was a big deal. It's something that we need to look at for a little bit. And, well, let me just say
one more time how important memory is to the overall equation here. Each type of variable
that Ritchie added to C had a different memory footprint, and was handled by the compiler in
a different way. Characters, or just char in C lingo, are a single 8-bit byte wide,
or I guess a single byte, but by this point, the byte was pretty standardized as 8 bits.
Moving on up, integers, or int, are one word wide, so twice the size of a char. Going further up the
pyramid, we eventually get into larger sizes for longs,
doubles, floats, and other numeric types. This matters because, unlike in BCPL or B,
a C programmer can very easily specify how big a chunk of memory they want. That being said,
Ritchie made a very smart decision early on to retain some of the usefulness of B's anti-typing system.
To C, a character and an integer are still just chunks of generic data.
You can try to print an integer, and you can try to perform mathematical operations on a char.
Oftentimes, it's even advantageous to do so.
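Just to make that concrete, here's a minimal sketch, in modern C and assuming an ASCII machine, of a char acting like a plain number. The values are made up for illustration.

```c
#include <stdio.h>

int main(void) {
    char lower = 'c';

    /* In ASCII, lowercase letters sit exactly 32 above their uppercase
       counterparts, so a little integer math converts between cases. */
    char upper = lower - ('a' - 'A');

    printf("%c becomes %c\n", lower, upper);   /* prints: c becomes C */
    printf("'c' as a number: %d\n", lower);    /* prints: 99          */
    return 0;
}
```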
On the surface, that might seem silly.
Why would you want to run arithmetic on textual data? But it does serve a purpose. Here, Ritchie
is taking one of the better things about B and BCPL, their generalized approach to variables,
and extending it into a more useful and flexible form. Arrays are another place where C made some changes.
Now, this gets us deeper into the fine detail,
so I'm going to try not to dwell on it for too long.
Basically, C treats arrays as pointers to a list of values.
That means that accessing the nth element of an array
is the same as referencing a pointer to that array plus n addresses.
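Here's a quick sketch of that equivalence in modern C, with pointer syntax showing up a beat before I explain pointers properly: indexing an array and doing arithmetic on a pointer to it are two spellings of the same thing.

```c
#include <stdio.h>

int main(void) {
    int values[4] = { 10, 20, 30, 40 };

    /* values[2] is defined to mean *(values + 2). */
    printf("%d\n", values[2]);       /* prints 30 */
    printf("%d\n", *(values + 2));   /* also prints 30 */
    return 0;
}
```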
So, what are pointers, you ask?
Well, that gets really complicated and hard to explain without a diagram.
For now, let's just think of them as a reference to some address in memory, essentially a variable that points at some other data. Pointers were
implemented in B and BCPL, but C switched things up so that they were better suited for byte
addressing. It made things like array access or even just accessing data in memory a little bit
faster, and those gains started to show over time. The final piece of this memory puzzle that I want to talk about is the structure, or struct.
This is, at least in my opinion, the most useful thing that Ritchie put into C.
A struct is, like the name may suggest, a way for you to define your own custom data structure.
Think of it as a purely custom variable type composed of
other variables. Now, maybe an example will make that a little more clear. Let's say I need some
structure to hold data about widgets. Maybe I'm writing a widget management program. Each widget
has a weight and a name. That name can be up to 8 characters long. So I can just define a struct
called widget that is composed of an integer called weight and an 8 element long character
array called name. I only have to define that struct once. Then I can use it later in my code.
Anytime I need to deal with widget data processing, I can just create a new widget variable and fill its weight and name.
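Here's roughly what that looks like on paper, a minimal modern-C sketch of the widget struct just described. The specific values are made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* The widget record described above: an integer weight and a name
   up to 8 characters long. */
struct widget {
    int  weight;   /* on the PDP-11 an int was 2 bytes; it's wider today */
    char name[8];
};

int main(void) {
    struct widget w;                           /* create a new widget variable */
    w.weight = 12;                             /* fill in its weight...        */
    strncpy(w.name, "gizmo", sizeof(w.name));  /* ...and its name              */

    printf("%s weighs %d\n", w.name, w.weight);
    return 0;
}
```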
From the outside, structs are a great way to organize data.
If C's variable types just aren't cutting it for you, then you can define a new one.
But there are more uses for structs than you may initially think.
Remember, we're still in memory-ville.
Everything about C is in memory-ville.
So to C, a struct is just another chunk of memory.
When you define a struct, you're explaining to C how that chunk of memory is broken up into smaller variables.
For my widget example, I'm specifically defining a 10-byte long section
of memory, 2 bytes for the integer, and 8 bytes for the character array name. The members of that
struct get stored in the order they're defined. All your bits and bytes are packed up in memory
in a predictable and predefined way. So here's the trick to all this struct stuff. If you get passed some random pile
of data, you can tell C to treat it as a struct. From there, you can start to actually make sense
of just any old data. So if I know that I've been keeping all my widgets at some location in memory,
I can just say, hey C, I have some widgets at address X,
can you help me get their names? And the language will happily oblige, at least if you remember the correct syntax and invocations.
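For the curious, here's a small, self-contained sketch of that trick. The stash array standing in for "address X" is made up for the example.

```c
#include <stdio.h>

struct widget {
    int  weight;
    char name[8];
};

int main(void) {
    /* Some widgets living at a known location in memory, handed around
       as a bare address. */
    struct widget stash[2] = { { 3, "gear" }, { 9, "cog" } };
    void *address_x = stash;

    /* "Hey C, I have some widgets at address X, can you help me get their
       names?" Cast the address, and C applies the layout you declared. */
    struct widget *widgets = (struct widget *)address_x;
    for (int i = 0; i < 2; i++)
        printf("%s weighs %d\n", widgets[i].name, widgets[i].weight);
    return 0;
}
```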
This all sticks with the ideology of B and BCPL, of generic data handling for portable code. Ritchie's just building a more complex and really a more usable environment on top of that. To bring this full circle in what I think is a delightful way,
Ritchie's changes weren't made totally out of the blue. He openly admits to drawing inspiration
from a place that we should, at least now, be familiar with. Quote,
The scheme of type composition adopted by C owes considerable debt to Algol 68,
although it did not, perhaps, emerge in a form that Algol's adherents would approve of. The central notion I captured from Algol was a type structure based on atomic types,
including structures, composed into arrays,
pointers, and functions. End quote. When people say that C is an Algol-like language, well,
it's more complicated than just a long lineage. There were multiple points of entry for Algol's
influence. However, we aren't off in Algol's own territory
of theory and good ideas constructed by committee. Ritchie wasn't just adding structs and following
data dogma because it sounded like a good idea. True, he drew inspiration from Algol 68,
but it was tempered by real-world practicality. To prove my point, we can look no further than
structs themselves. Early versions of C were working around 1972. That year, Ken Thompson
sat down and tried rewriting Unix in the new language. After all, the whole reason for C's
existence was to be a language for Unix.
But there was an immediate problem.
The early version of C didn't have structs, so managing data quickly became a nightmare.
Thompson dropped the attempt, and Ritchie went off to implement structures.
C was getting features based off real-world use cases with the backing of some theory.
For the first few years, it existed as a very living language, and that would continue.
By 1973, C reached a more usable and more familiar form.
The crew around Bell Labs started writing out libraries for the new language.
The self-hosting C compiler was ported to just about any machine that the team could find.
And, in general, the language sprinted towards maturity.
Most importantly, 1973 saw the beginning of another rewrite of Unix.
This time, the operating system was written entirely in C. Alright, it's time to stop and
take a quick breath. That essentially wraps up the more technical details. Once we hit 1973,
C is complete, for the most part. There are still some changes that will be made in the coming years
and decades, but 73 is where the language
emerges in a meaningful and recognizable form. From there, things take a bit of a twist. C is the
result of a long process of converting theory into practice and then adding back in some theory.
In a lot of ways, it's a very practical language that takes real-world
constraints into consideration. This happened outside formal academia. B and C were both
developed on the industry side of computing. But with C, something interesting starts to happen.
We already got a little bit of a taste with the whole Algol 68 connection. C brings the lineage full circle, connecting back up with at least a little bit of the world of theory and academics.
Now, just to be 100% clear here, I'm not knocking academics or trying to flatter folks in industry.
They're two sides of the same silicon coin. I guess what I'm getting at is that C starts to exist with one foot in both worlds.
From that unique position, it starts to spread, grow, and incrementally improve. It's hard to
pin down the exact first time that C leaves Bell Labs. If you want to get really granular, then I
invite you to get in touch with AT&T. See if you
can go through employee records and find anyone who left Bell in the middle of 1973. My best
guess for C's official emergence is in mid-October of 73, at the ACM's Symposium on Operating System
Principles. There, Thompson and Ritchie submitted a paper on the Unix operating system.
The paper wasn't about C, but it did give the language a mention.
The paper states Unix supports a, quote, compiler for a language resembling BCPL with types
and structures, and in parentheses, C.
Now, as far as I've been able to tell, that's the first time
C is mentioned in the outside world. Just a single letter in little parentheses. But this was the
jumping off point for a sensation. You see, once people heard about Unix, they wanted to learn more.
Thompson and Ritchie were talking some big talk.
And, thanks to some fun antitrust lawsuits, Bell Labs was in a weird position here.
They weren't actually allowed to sell software.
Bell was restricted to telecom only.
Unix wasn't telecom software.
And an earlier DOJ ruling meant that they had to license this kind of work to anyone on an at-cost basis.
So requests started trickling in for a copy of this new Unix thing.
Bell, very literally and legally, couldn't say no.
Here's where C sneaks in.
This early version of Unix was distributed as raw source code, and at first only to academic institutions. A researcher at some school would ask Bell for a
copy of Unix. This request would end up getting forwarded to Ken Thompson's desk. He'd throw a
copy of the full C source code for Unix onto some magnetic storage medium,
write a quick note about the system, sign it, love, Ken, and then ship it off.
In order to get Unix up and running, you had to first get a C compiler up and running.
Then you had to compile the operating system.
It was a bit of a process, but from this quirk of distribution, we get two big results.
First, Unix ends up on a lot of college computers.
And secondly, mountains of C code end up on a lot of college computers.
This is a place where C's design really pays off.
Its portability made it so basically anyone who wanted to use Unix could, at least as long as they had a reasonably powerful computer to work with.
In practice, most early installs were on PDP-11s, but the doors were open for more innovative projects.
Now, that's the traditional story. Don't get me wrong, this is one major path for C's spread. As Unix worked its way into the
broader world, C came along for the ride. However, there was another route that C took on its way
out of Bell Labs. One of the big questions that I had when I started this series was,
what was the first C compiler written outside Bell Labs? Or, maybe to put it a more dramatic
way, who wrote the second C compiler? Now, on the surface, that may seem like a strange question,
but I think this is central to understanding C's spread and influence. The whole point of C is that it's portable, powerful, and reasonably easy to
implement. So what does that look like in practice? How long did it take for people to follow in
Ritchie's footsteps? I really wanted to figure that out, and it actually took me down an unexpected
path. It took me to MIT and the incompatible timesharing system.
In my search for anything C-related published in the early to mid-70s, I ran across an interesting
thesis called A Portable Compiler for the Language C, written by one Alan Snyder. He was a grad
student in MIT's computer science department. Importantly, the thesis is dated May 1975.
So initially, I passed it over.
That's a full two years after C was announced to the world.
So I just assumed there must be something earlier.
Well, first impressions tend to be wrong.
I think Snyder's thesis represents the breadth of C really well.
The paper describes a theoretical framework for a highly portable C compiler. Snyder describes a
formalized abstract machine, just like we've seen in earlier work done on BCPL. It's a very
structured approach to making a portable programming language.
This isn't just talk. Snyder goes on to explain his implementation of this compiler for the
PDP-11, and how it was easily ported to the Honeywell 6000 computer.
According to Snyder, initial development of this totally new compiler took seven months. Even more impressively, the port
from the PDP-11 to Honeywell took a single month of work. That's the kind of gains that portable
code gives you. The context around Snyder's thesis, well, that makes it all the more important in our
growing timeline. And, well, this is where I really need to thank Lars Brinkhoff once again
for helping me out with this line of inquiry.
Snyder's C compiler was also ported to the PDP-10 computer
running the incompatible timesharing system.
Now, that's the last place I expected C to show up.
ITS is somewhat famous for being written entirely in assembly language and, outside of that,
making aggressive use of Lisp.
But that's not the only language that it ran.
Lars actually has the code for Snyder's C compiler up on GitHub, and after some discussion
and further email threads, we can start to pull an
even more interesting story out of this compiler. My current working hypothesis is that Snyder was
at Bell Labs sometime in 1973. This was either as an intern, a short-term hire, or maybe a researcher on loan from MIT. At that same time, Snyder was a grad
student at MIT, working in the computer science department. Taking some time off, or even just a
summer, to work at Bell wouldn't have been outside the realm of possibilities. Bell Labs is just a
four-hour drive from MIT's campus, so not even that far to go for a chance
to get in on some interesting research. This stint at Bell was how Snyder was exposed to C
for the first time, and where he started to work on his very own compiler. The connection comes
right from Snyder's thesis. In it, he wrote that his C compiler was implemented, quote, on the Bell Laboratories Computer Science Research Center's PDP-11/45 Unix system, end quote. That's the same general
division and the same type of computer that Dennis Ritchie worked on. We also know that the two had
personal contact because Ritchie mentioned Snyder in later writings.
And this is where the dating evidence comes into play. C would set a sort of precedent for logical
operators in programming languages. The language uses the double pipe, ||, as a logical OR, and the double ampersand, &&, as a logical AND.
The distinction matters because there are also bitwise OR and AND operators. Prior to C,
here I'm talking B and BCPL, the logical and bitwise operators were both
represented with a single character. The meaning was dependent on context. The compiler had to
figure out if you wanted to do a bitwise AND or a logical AND just by looking at how you were using
the operator. That's not very efficient. Initially, C followed the old-school approach.
A single ampersand could be a logical or bitwise operator, depending on where you used it.
Later on, the language switched to using separate logical operators. It became more explicit.
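As a quick illustration of why the distinction matters, here's a minimal sketch in modern C, where the two operators are spelled differently and do very different jobs. The values are made up for the example.

```c
#include <stdio.h>

int main(void) {
    int flags = 6;   /* binary 110 */
    int mask  = 3;   /* binary 011 */

    /* Bitwise AND works bit by bit: 110 & 011 = 010, so the result is 2. */
    printf("flags & mask  = %d\n", flags & mask);

    /* Logical AND only asks "are both nonzero?", so the result is 1 (true). */
    printf("flags && mask = %d\n", flags && mask);
    return 0;
}
```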
This was a small change, but it helped to make the language more clear,
and it was easier on the compiler. Now, here's the cool part. In an essay on the history of C,
Ritchie says that he made the change on the recommendation of one Alan Snyder. So, logically,
we should be able to figure out when Snyder and Ritchie met by figuring out when
that operator switch occurred. The first quote-unquote official C manual was published
internally at Bell on January 15th, 1974. That manual includes the doubled-up logical operators,
a footprint left behind by Snyder. This document
was spread around Bell as a memo. Its cover sheet lists everyone who would receive a copy.
Importantly, Snyder is not on that list. So by 74, he was no longer involved with Bell,
or at least his involvement was minimal. Putting this all together leads to
my best guess at what's going on here. Before the end of 1973, while C was still in its infancy,
Snyder started writing its second compiler. If that's not the case, then he was at least
planning something along those lines before 73 was over.
I think he had to be doing more than just writing some C code as a programmer.
For me, the logical operator change proves that.
It was a suggestion that affected both the language itself and its compiler implementation.
He would have to be pretty deep into the language to come up with that small specific tweak. Now, with that established, why does Snyder's involvement matter?
First off, this helped spread C to MIT really early in the language's life. By 1974,
there were C programmers at MIT. The language had broken out of industry and into university
research. It was starting to live in more than just one niche, and it was starting to live
outside Unix. Once again, this is where I have some thank yous to make. I got in touch with
Eliot Moss, one of Snyder's colleagues at MIT, and he helped flesh out the picture for me. C was being used to write utilities at MIT, programming tools, and even the C compiler itself. Even before Snyder's thesis
was complete, the language had a new home. This also really flexed C's portability in a huge way.
Within Bell, C had been ported to a few machines, and Snyder's thesis discusses
the process of porting his new compiler to another. But outside the paper, C was also spreading.
There is, of course, the aforementioned PDP-10 port. C running under ITS, an operating system
very, very different from Unix, is a huge point for its portability.
The open culture around MIT, and specifically ITS, helped C spread further and faster.
There were also ports to WAITS, that's Stanford's ITS-inspired operating system,
and TOPS-20, DEC's own in-house operating system. Snyder's portable C compiler made that possible,
not just on a technical level, but also on an availability level.
The final piece to Snyder's involvement is that it shows how C was growing.
Ideas from outside Bell are entering into the language and its implementation.
That's a sign of a really healthy language.
It means that the team working on C, Ritchie and his co-workers at Bell, were willing to take
suggestions, and crucially, that C was already establishing itself as a language worth improving.
The image we get of C going into 1974 is a language on the rise.
It has a community forming around it, a dedicated platform, and it's spreading to other platforms as well.
For me, what makes Snyder's compiler so important is that it wasn't working on just Unix.
The ancestral home of C has always been seen as Unix, and that's with good cause. But I think that makes
Snyder's work all the more exciting. C didn't just live inside Bell's operating system for decades.
Almost as soon as the language was developed, it escaped. And just to complete this image,
Snyder's thesis is showing more formalized theory seeping back into C. The language is
starting to become a well-formulated mix of theory and practice. It's a middle ground that everyone
can find something to love in. But for all the excitement of MIT's involvement, we need to circle
back to the more traditional timeline. C didn't just spread via Snyder's route, that was
just one contributing factor. Call it a single crack in the bell if you want. The early spread
of Unix formed another big crack that C leaked out of. The next route we need to look at is how C
became part of classrooms around the world. To round out our discussion, we need to turn to the more traditional
spread of C, and I mean that pretty literally. While Snyder's work shows that C wasn't just on
Unix, the simple fact is that C and Unix are deeply intertwined. One of the interesting results
is that hacker culture starts to really matter when we look at the early adoption of C.
Specifically, we're going to take a look at one route out of many.
To do so, we need to get into more CS classrooms, research labs, and illegal books.
So, how does a banned book fit into the spread of a programming language exactly?
Well, I'm glad you asked.
It all starts with John Lions, programmer and Unix aficionado. In 1972, Lions became a computer
science professor at the University of New South Wales in Australia. This put him in a part of the
education pipeline that I haven't really covered as much on the show. As a university
professor, he wasn't so much introducing people to computers or even teaching students to program
for the first time. His students, at least hopefully, already knew how to program and
definitely were able to operate computers. Instead, Lions was working on the very final step of any good education,
helping those with existing education polish their skills and understanding.
These kinds of upper division courses tend to focus on application of your earlier education,
and they can be pretty challenging. One of the mainstays in any upper division CS catalog, and a series that Lions taught at UNSW,
are classes in operating systems. Generally speaking, operating systems make for a great
topic for upper div courses. An OS is just about the most complicated program that you can write,
so understanding how an operating system works is a really challenging
and rewarding exercise. Plus, you know, operating systems are pretty crucial to how we use computers.
So a course on operating systems really gives a good convergence of challenging and practical.
The problem is, how do you approach this subject in a classroom environment?
There are a few schools of thought about this. Perhaps unsurprisingly, it comes down to the
whole theory vs. practice split that I keep bringing up. One tactic is to just teach theory,
to talk about operating systems in this abstract form. This style, of course, would cover what each part of an
operating system does, why it does it, but wouldn't necessarily get very deep into the actual code that
makes it all work. The other approach is for students to write their own toy operating system
as they learn the fundamentals in class. This is a much more hands-on option. It gives students the
chance to see for themselves
how process control or memory management is handled. Plus, it's a chance to gain some
practical experience programming a really complicated piece of code. However, there is
an issue with the scale of these kinds of hands-on classes. A one or two semester course isn't really enough time to write a complete
operating system. Students can go through the motions, but they don't hit a level of complexity
where their choices start to have massive impacts. I think just time sharing is a good example here.
You can write up the code for multitasking and process isolation pretty easily, but if you only really have time to
get a few processes running at once, you don't get a chance to tweak and iterate on your process.
It's all a sterilized exercise. But there's a third option, one that was just starting to
become viable when Lions joined the University of New South Wales. That is, the case study.
As Lions put it, quote,
In our opinion, it is highly beneficial for students to have the opportunity to study a
working operating system in all its aspects. Moreover, it is undoubtedly good for students
majoring in computer science to be confronted at least once in their careers with the task of reading and understanding a program of major dimensions, end quote. This exists as a sort of middle ground between pure theory and heavy practice.
Students would work through an example operating system,
familiarize themselves with its code,
and learn how abstract ideas were implemented in the real world.
This takes all the importance of abstracted study and throws in the
complicating factors of actual programming done at scale.
Lions started developing his new operating systems curriculum in 1976,
taking this third route.
This was possible thanks to the easy availability of the source code for Unix. At the time, Unix was the only real option, full stop.
Source code licensing was free for universities. As far as I'm aware, there weren't any other operating systems running under
this model in the 1970s. Even better, UNSW had a PDP-11 running Unix already. Students were
already exposed to the operating system, and the college already had the source code handy.
This allowed Lions to change up his courses in a way that wouldn't normally have been
possible. Over the course of May 1976, he wrote up a series of notes and annotated source code for
his operating systems class. These texts were later refined and published as Lions' Commentary
on UNIX 6th Edition. These volumes, sometimes just called the Lions Book or Lions' Commentary,
are an exhaustive explanation of the v6 Unix source code. The first volume is a more traditional
textbook. It goes over operating system concepts and explains how they're implemented in Unix at
a really fine level of detail. The second volume is essentially a printout of the
Unix source code, but there are some caveats to that. This volume contains just the most central
elements of Unix, not any utilities or really fancy hardware drivers. By careful curation,
Lions was able to reduce this down to around 9,000 lines of code. That's still a relatively
large amount of source code to go through, but it is manageable. Since this is Unix we're
talking about, then you should see where this is going.
Lions Commentary is also a case study in effective real-world use of C. Volume 1 included a short primer on C. It was assumed that going into the
class, students knew how to program at least something. Volume 2 was just a few hundred pages
of C source code. What's so interesting is that even without the Unix connection, C would have
probably been a great choice for this kind of class. In the commentary,
Lions wrote, quote, you will find that C is a very convenient language for accessing and
manipulating data structures and character strings, which is what a large part of operating systems
is about, end quote. In other words, C is a good fit for operating system development.
I mean, that shouldn't be a surprise. It better be, right?
It gives a programmer the control they need to work in bits and bytes.
But more importantly than that, C provides a high-level way to approach this low-level task.
It's a lot more clear than assembly language.
You can actually read a line of C that someone else wrote and tell what's going on without
having to study the entire program.
And just as an aside, I think it's interesting that C here is filling a role really similar
to that of BASIC.
Both are simplified and reduced languages.
Both, strangely enough, drew some inspiration from ALGOL.
The main difference is that BASIC was designed as a teaching language for novice computer users.
C, almost by accident, ended up being well-positioned and well-designed
as an educational language for more experienced practitioners.
Anyway, I just found that
observation interesting. Lions' revamped course ended up working really well, and in 1977,
the course material was officially published as the commentary. Officially, the book was only for
other people who had access to a Unix license. Read that as universities or a few businesses.
It did contain a full source listing for Unix, after all.
Despite that restriction, the book rapidly gained in popularity.
Part of the draw was the educational value.
It did teach how operating systems worked.
Other universities picked up on that value right away.
But just as important, Lions' commentary fully documented the Unix kernel. It explained how Unix functioned in
excruciating detail. At the time, Lions' book was the only text on that particular topic.
AT&T just plain hadn't published anything that came close. AT&T would even eventually end
up buying the rights and handling distribution of the commentary. It even made it into the hands
of Ken Thompson and Dennis Ritchie themselves. Ritchie would later recall that, quote,
we were very much impressed by the quality of the work and highly flattered by its very existence.
But things rapidly changed as the 70s drew to a close. In 1979, Unix 7th Edition was completed, and AT&T decided to change tactics.
With this new version, AT&T started to look at options for actually turning Unix into a product they could sell. Part of that was
securing their intellectual property, which meant no more selling Lions' commentary. Supposedly,
the company even threatened to sue schools that taught courses that used the Lions book, but
I can't find any hard sources to back that up. The gist of the situation is, by 1979, universities stopped teaching using the text,
and new copies stopped flowing. The Jargon File tells us what happens next. The Lions Book began
to spread via samizdat. Its official use was driven underground, so Unix hackers and enthusiastic
C programmers, well, they started running photocopies of the text.
These copies spread by hand amongst students and enthusiasts. Copies of copies were made,
and the process continued. This is a fantastic example of the hacker spirit. People just plain
liked Lions' commentary. It was a source of valuable information that couldn't be found anywhere else.
Hackers were quick to find a way around the legal hurdles in their path to share knowledge.
And in a strange way, this restrictive atmosphere bred a certain level of conviction.
Peter Reintjes, one of the students involved in a copy machine ring, put it this way,
quote,
Because we couldn't legally discuss the book in the university's operating systems class,
several of us would meet at night in an empty classroom to discuss the book.
It was the only time in my life when I felt I was an active member of an underground, end quote.
Really, it's a perfect storm situation, isn't it?
Suddenly, it was a subversive act to teach using Lions' Commentary.
So students spent hours poring through the text in secret.
And they spent hours poring through lines and lines of C.
It almost sounds like the genesis of some revolutionary vanguard.
But in this case, you end up with generations of diehard C programmers.
Lions' commentary would remain underground for nearly 20 years. It was finally republished in
1996. SCO, the then-current rights holders to Unix, agreed to let the book make a return.
Bringing the project full circle, both Dennis Ritchie and Ken Thompson wrote in the foreword for this new, legally published edition.
Alright, that does it for our dive into the C programming language.
And really, this is just the surface of a much larger conversation. I initially wanted
to get into commercial compilers, early use of C on microcomputers, and a few other threads, but
there's just too much to cover without turning advent of computing into advent of C. So I'm not
totally done with C. I'm just done for right now. With what we've covered so far, I think we
can turn our attention back to the guiding question of this series. Why did C become so ubiquitous
over other languages? If you want to be technical, then we can just look at what C has that BCPL
lacked. Data types. I know that's pretty reductionist, but I do like it as a bit of a crude answer.
BCPL and C are pretty similar languages up to a point. They're built to be portable,
easy to use, and to bring the power of high-level programming languages to a low-level set of
problems. So we could just look at the main difference, which honestly comes down
to data types. There's more to the equation than just that, but data types do stick out as an
important indicator. C wasn't made overnight. It evolved from a mix of earlier theory and a lot of
hard work and iteration. Data types and, even getting really granular,
the && operator, those are both results of this hard practice and evolution.
Even in its infancy, Ritchie was tweaking and adjusting the language based on feedback from
other programmers. Once C became recognizable around 1973, the language continued to shift and change as better practices and new ideas were incorporated.
Unix also helped make the language better.
In a circular kind of way, C was itself developed to make Unix better.
That gave the language a clear and exacting goal, a drive towards something bigger than just making a cool
language for the sake of making a cool programming language. And as Unix left Bell Labs, C came along
for the ride. The spread of Unix, at least early on, was also the spread of C. But that wasn't the
only way C got out of the lab. Snyder's early portable compiler is just one example of C
showing up outside of the context of Unix. The bottom line, pulling everything together,
is that C became ubiquitous quickly because it matured really quickly. Constant feedback,
a quick-moving user base, and a focused path helped forge C into a complete and really competent
programming language. It was built for a single purpose, to write a better Unix. That turned out
to be general enough that the language soon found other uses. Initial spread started in classrooms,
and as students struck out into the world, C came with them. But like I said,
this isn't the end of C's story. Just a look at a few early paths to its success.
Thanks for listening to Advent of Computing. I'll be back in two weeks time with another piece of
the story of computing's past. And hey, if you like the show, there are now a few ways you can
support it. If you know someone else who would be interested in the story of the computer, then why not take
a minute to share the show with them? You can also rate and review on Apple Podcasts. And if you want
to be a super fan, then you can support the show through Advent of Computing merch or signing up
as a patron on Patreon. Patrons get early access to episodes, polls for the direction of the show,
and bonus content. You can find links to everything on my website, adventofcomputing.com.
If you have any comments or suggestions for a future episode, then go ahead and shoot me a
tweet. I'm at adventofcomp on Twitter. And as always, have a great rest of your day.