Advent of Computing - Episode 158 - INTERCAL RIDES AGAIN - Restoring a Lost Compiler

Episode Date: June 1, 2025

In 1973 the world caught its first glimpse of INTERCAL. It's a wild and wacky language, somewhere between comedy and cutting satire. But the compiler was never circulated. There would be later implementations, but that original compiler remained lost to time. That is, until now. This episode covers how the original source code was found, and my attempt to get it up and running. Get the source code for INTERCAL72 here: https://github.com/rottytooth/INTERCAL72/ Read the original INTERCAL manual: https://3e8.org/pub/intercal.pdf

Transcript
Starting point is 00:00:00 I've been producing Advent of Computing for six years at this point, and I know, your applause is appreciated. I'm excited about that too. Now in that time, there's one question that I get asked over and over. I'll get this from everyone from long-time listeners to random people I meet while traveling. The question is, aren't you worried you'll run out of topics for the show? Now, the short answer is no. Computers are just so complex and their history is so complex and rich that there will always
Starting point is 00:00:35 be a new nook or cranny for me to dig into. I actually think it's impossible for anyone to know about the entire story of the computer. There's just too much and there's too much to tell. My long answer? Well that's kind of just been the arc of the entire podcast. I think we are, right now, at a crucial point in the study of computer history. There are all these huge events that are now old enough to be actually studied as history. We have enough space between us and those events that we can start to form context around
Starting point is 00:01:10 them. We can look at their long-term effects, we can see how they fit into the larger story. You can now actually write a history of Unix and it will be, in large part, complete. Whereas you couldn't really do that in 1990. A lot of that story was still happening. I'd argue you couldn't even write a good Unix history in the year 2000. The other factor here is that we're close enough to those events that we still have very high resolution. Documents still exist. Computers still exist.
Starting point is 00:01:45 We still have eyewitnesses to these events, and in many cases, the very perpetrators thereof. And best of all, new information can still be discovered. I don't think I'll ever run out of things to talk about on the podcast because there are always new things coming to light. Today will be a prime example. You may have noticed that this episode is a week late. I had to push it back to give myself a little extra time. The reason's simple.
Starting point is 00:02:15 I was handed some source code that hasn't been seen since the 1970s. Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 158, Intercal Rides Again. Today you're in for a special treat. Our story starts a few years ago, specifically with episode 78 of the podcast in which I covered esoteric programming languages. The root of that topic was a language called Intercal, which is often cited as the first esoteric programming language. This is a class of languages that are made as jokes or satire of programming. They almost act as this mix of art and commentary in the programming world. These esoteric languages tend to be
Starting point is 00:03:20 horrifying to behold. They're often intentionally confusing, nonsensical, or otherwise bad and gross, but kind of in this incisive way. There have actually been many very bad programming languages written throughout history. That tradition continues into the modern day. Intercal, however, is the first language that was intentionally designed to be gross. Like I said, it's part art, part commentary. As I was working on that episode, I hit a point where I had some specific questions about how Intercal actually worked. The original manual that describes the language is pretty jokey.
Starting point is 00:04:03 So I thought, hey, why not try to track down the source for Intercal's first compiler? That should answer actually all of my questions, but try as I might, I couldn't find that original source code. My next step took me down quite the path. The end point of that path has been an attempt to run the original Intercal compiler on a
Starting point is 00:04:26 modern computer. The project has been a team effort between myself and Daniel Temkin of Esoteric.Codes. And of course, none of this would have been possible without the help of Don Woods, the co-author of Intercal itself. This episode is going to be a little different than usual. I'm gonna be chronicling the project, the story of how we went about reviving Intercal 72. Along the way, I'll be looking at a lot
Starting point is 00:04:54 of the historical context that comes up, because, at least for me, when you look at old source code, you end up reading a lot about the history of the languages and the practices involved. We're also going to be talking about why and how you restore old software. These kinds of projects are difficult and time consuming, which I think makes it all the more important to explain why they should be undertaken. As for the final conclusion, well, you can stick around,
Starting point is 00:05:22 or you can go to the GitHub link in the show's description. That will take you directly to the raw source code and the mostly functional modern restoration of the Intercal 72 compiler. Now, before we get started, I do have to make a few announcements. The first is we're getting pretty close to half a million downloads. Once we hit that, I'm going to do some kind of special, and I think I'm going to do another Q&A. I'll be posting more about how to send in questions, but be on the lookout on my
Starting point is 00:05:57 Bluesky, my Discord, and my Patreon if you subscribe. And second, I'm going to VCF West again this year. That's being held at the Computer History Museum in Mountain View on August 1st and 2nd. I'm going to be giving a talk about some other software I've been working on. So if you want to come out, VCFs are a wonderful experience and I'd love to get to spend some time with listeners. I really have a fabulous time every time I get to meet y'all.
Starting point is 00:06:28 So, with that out of the way, let's get into the, well, the morass, the maze and rabbit hole that is Intercal. It's only fair that we do a refresher on the language. I don't want to throw everyone in the deep end immediately. Now this is going to be in no way comprehensive. If you want the deep lore, I have all of that in episode 78 lying in wait for you. So consider this a primer, just enough to get us ready. The Intercal Manual, itself distributed in 1973, describes the language's origins thusly, quote,
Starting point is 00:07:12 "The Intercal programming language was designed the morning of May 26, 1972, by Donald R. Woods and James M. Lyon at Princeton University. Exactly when in the morning will become apparent in the course of the manual. It was inspired by one ambition, to have a compiler language which has nothing at all in common with other major languages." There are two big things going on here. First, we have Intercal the language. It's confusing, bizarre, and very, very unique. That's one design goal that they really nail.
Starting point is 00:07:54 It's meant as a joke. Then second, we have the manual. It is also confusing, bizarre, and unique. It is again, meant as a joke. It's a funny document describing a very funny language. I mean, the manual doesn't have an appendix. That got removed. It has tonsils instead. The result here, and something that I want to bear in mind,
Starting point is 00:08:21 is that the manual may not be 100% true to the implemented language. The jokes in Intercal started off, according to Don, by renaming punctuation symbols. The number sign in Intercal is called the mesh. The period is the spot. The dollar sign is called big money. The tilde is the squiggle. Parentheses become wax and wane. That makes reading the language aloud into this kind of tongue twister.
Starting point is 00:08:50 The language features are also strange and uncanny. The one that most folk know about is please, or the politeness system. An Intercal programmer has to have just the right number of please statements. The ratio between lines with please and lines without is called a program's politeness. Use too few pleases, the compiler will call you impolite and won't run. But if you get overly polite, your program will also be rejected. There's also this wild concept of ignoring, abstaining, and forgetting. I tend to think of these as closely related. Ignore essentially locks
Starting point is 00:09:33 down a variable. It makes a variable read-only until you say, please remember this variable. Abstain will prevent a label from being executed, or, used another way, disable certain functions or classes of operations. You can literally tell Intercal to never print data again, or turn off math. Forgetting gets even more strange. It has stuff to do with ruining the return stack. Not really for the faint of heart, or actually a useful language feature at all. This isn't even to mention the variable system, or
Starting point is 00:10:11 even how Intercal handles math. There should be some big quotes around handles math. It's a wild language, and if you're a programmer, it's really funny. The more old-school languages I've learned, the more incisive that comedy has become. The compiler is up and running in 1972. It runs on one of Princeton's IBM System 360 mainframes, and it's written in this language
Starting point is 00:10:40 called SPITBOL. That's S-P-I-T-B-O-L. I may lapse into calling it Spitball on occasion. Now, despite the name, that's actually a real language. There's no room for jokes there. The manual for Intercal is completed in 1973 and circulates around the community. At least, 1973's the date stamped on the manual. The manual has been around
Starting point is 00:11:09 ever since, but as for the compiler, well, that's the mystery. According to Don Woods, the compiler never left Princeton, so while there was a manual to chuckle at, there was a lack of tooling. You couldn't actually run a program written in Intercal. To the less sick among you, that may sound like a good thing, like a dangerous pathogen was contained in a very nice school. Documented, but contained. That's how pathogens should be, right? This would all change in 1990. Eric S. Raymond, a hacker par excellence, created a new Intercal compiler written in C. This new compiler is called, believe it or not, C-Intercal. We're
Starting point is 00:12:00 pretty certain that that compiler was written exclusively working from Intercal's manual, not the source for the original compiler. I haven't heard back from Raymond, but I'm backing this belief up by everything I know from Don Woods, everything I've seen in the language, and the fact that Raymond even calls his new compiler a quote-unquote revival of the language. C-Intercal is recognizable as its own dialect. This is in part because the 90s was just very different than the 70s. The original language, which I'm going to call Intercal 72, was designed on an IBM System 360.
Starting point is 00:12:44 That used the EBCDIC character set. Modern computers use ASCII. Those character sets don't 100% line up. One result is that Intercal 72 uses some symbols that aren't available in ASCII, like the cent sign or the struck-through V. C-Intercal replaced those non-ASCII characters with ASCII equivalents. The newer dialect also added some commands. Most notably, Raymond added the come from to the language.
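To make the character set mismatch concrete, here is a minimal Python sketch. It assumes Python's built-in `cp037` codec as a stand-in for EBCDIC; which EBCDIC code page Princeton's machine actually used is not stated in the episode, and the helper names `can_encode` and `report` are made up for illustration.

```python
def can_encode(ch: str, codec: str) -> bool:
    """Return True if the character exists in the given character set."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


CENT = "\u00a2"  # the cent sign

def report(ch: str) -> dict:
    """Check one character against an EBCDIC code page (assumed cp037) and ASCII."""
    return {codec: can_encode(ch, codec) for codec in ("cp037", "ascii")}


if __name__ == "__main__":
    # The cent sign has a slot in cp037 but none in 7-bit ASCII.
    print(report(CENT))
```

A character like the dollar sign passes both checks, which is why only some of the original symbols needed substitutes.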
Starting point is 00:13:18 The come from statement is the opposite of a go-to, which makes perfect sense to absolutely no one in the world. With a normal go-to, you tell the program to jump from the statement to a certain line of code. With the come from, you tell the computer to jump to the come from statement once it hits a certain line of code. For example, if you write come from 100, then when Intercal reaches line 100, it will jump to wherever that come from is written. This one is kind of a joke on a joke. In an interview on Esoteric.Codes, Raymond explained that, quote, the idea of adding come from to Intercal and the implementation were both mine, though the original idea goes back to a '73 satire on Dijkstra's Go To Statement Considered Harmful,
Starting point is 00:14:09 end quote. If go-to is bad, come from must be good, right? I mean, they're opposites. This leads to an interesting fact: everyone who's ever written Intercal, except for maybe a handful of programmers at Princeton, has used C-Intercal or one of its dialects. Yes, believe it or not, there are actually multiple dialects of this absurd language. Most things written about Intercal tend to be, on purpose or on accident, centered around C-Intercal. One surefire way to see this is a joke about the compiler.
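The come from behavior described a moment ago can be sketched as a toy interpreter in Python. This is not how C-Intercal implements it; the two-operation program format (`PRINT` and `COME FROM`, keyed by line number) and the function name `run` are assumptions made purely for illustration.

```python
def run(program, max_steps=1000):
    """Run a toy program: a dict mapping line number -> (op, arg).

    Returns the list of printed values. Only two ops exist in this sketch:
    PRINT (emit arg) and COME FROM (arg is the line number it listens for).
    """
    # Table of "when this line finishes, control jumps to that COME FROM".
    come_from = {arg: line for line, (op, arg) in program.items()
                 if op == "COME FROM"}
    order = sorted(program)
    out = []
    i = 0
    steps = 0
    while i < len(order) and steps < max_steps:  # step cap guards against loops
        steps += 1
        line = order[i]
        op, arg = program[line]
        if op == "PRINT":
            out.append(arg)
        # A COME FROM reached in normal flow does nothing by itself.
        if line in come_from:
            # Hijack control: resume just after the COME FROM naming this line.
            i = order.index(come_from[line]) + 1
        else:
            i += 1
    return out


if __name__ == "__main__":
    demo = {
        10: ("PRINT", "a"),
        20: ("PRINT", "b"),        # line 20 is the target
        30: ("PRINT", "skipped"),  # never runs
        40: ("COME FROM", 20),
        50: ("PRINT", "c"),
    }
    print(run(demo))
```

The point of the sketch is that nothing at line 20 hints that control is about to leave; the jump is declared only at the destination.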
Starting point is 00:14:49 The C-Intercal compiler, the actual executable, is called ick. The Intercal 72 compiler has a different name, but we'll get to that in a minute. Let me outline this web for us. The modern understanding of Intercal comes largely from C-Intercal, which is based off a manual that, I believe, is written as a joke. That's not entirely perfectly preserved history. However, that seems to be pretty well acknowledged. Raymond even calls Intercal hacker folk art.
Starting point is 00:15:27 That's where I got the quote. So we're dealing with a bit of a shifting thing. Without the source code for the original Intercal 72 compiler, we don't actually know the root of the language. Now there is a bit of a subtle distinction to be made here. Is a language its specification or its implementation? You can really argue either way, but it kind of depends on the language. Take Algol as an example.
Starting point is 00:15:56 That language is built to be a standard. So much so, the standard is really the language. Different implementations of ALGOL are, at least early on, kind of viewed as dialects of that standard. There are many ALGOL compilers, but there is only one ALGOL standard. But what happens when you have a smaller language? What if you only have one compiler? Is the language as executed the real language or is the standard the real language? What if the standard is written
Starting point is 00:16:32 as a joke, tonsils included? I think in that case, you need to look at both the standard, or the manual in this case, and the implementation. For Intercal 72, it feels like those two aspects form a greater whole, and we haven't had access to that whole picture for quite a while. That is, until now. So let's fast forward to 2022. And yes, it has been that long. This is what I mean when I say I have these long-term projects going on. I wanted to see if I could get my hands on some source code. I didn't have Don Woods' email
Starting point is 00:17:11 address, so I reached out to someone who I knew had contact with him, Daniel Temkin. Daniel had interviewed Don a few years previously for his wonderful Esoteric.Codes blog, and I was using that interview as a source when I initially covered Intercal. Daniel didn't know the whereabouts of the source code, so he started up what became a very long email thread between us and Don. The network formed itself, as it were. It turns out that the source code for Intercal 72 was sitting in a box somewhere in Woods' house. After some back and forth, we were able to get the scans.
Starting point is 00:17:49 They dropped into my inbox about three weeks ago. From there, the madness set in. Don closed out that email with, happy hacking. He was right, and I have had very poor sleep ever since. Now there are two big parts to this project. The first is simple preservation. Daniel and I worked to get the scans typed up, which was an error-prone process. Those scans and the typed code are available on GitHub, which I'll link in the show's
Starting point is 00:18:22 description. This way, the source for Intercal is finally accessible in its original 1972 form. This is also where we started to make discoveries. I'm gonna cover these later, but I do wanna drop one here to start with. The compiler for Intercal 72 was originally called ICAL, I-C-A-L, not ick. So if I start using ICAL this episode, I'm specifically referring to the 1972 implementation of the language.
Starting point is 00:18:56 The second part, and where my personal suffering, my happy hacking comes in, is getting the code to execute. I want to be able to compile an Intercal 72 program on a modern computer using the original compiler. Now, to do that, I've made a few decisions that may prove controversial. The first choice I made was to not use an emulator. It should be possible, given a System 360 emulator and the correct software, to just run the compiler.
Starting point is 00:19:30 That should kinda just work. Instead, I wanted to actually run iCal natively. My logic is that native execution should be easier to deal with for users. I want folk to be able to clone the new Intercal repository and get up and running with iCal as fast as possible. Mainframe emulation is something I'm not good at, but I'm passable at software development. If anyone wants to load iCal up into a System 360 emulator, please do so. All of this source code's available now, so anyone can play around with it. But how do you go about getting old code to run on a new computer,
Starting point is 00:20:11 sans emulation? And how is that even possible? Well, it's all thanks to a quirk of history. ICAL was written in SPITBOL. I know, get the yucks about the name out now. SPITBOL is the only reason I can even attempt this project. And let me assure you, it's a very serious language. This is also a spot where we get to touch on, well, some confusing things. SPITBOL is one of those pained acronyms. It stands for Speedy Implementation of SNOBOL. It is, well, a speedy implementation of the SNOBOL4 programming language.
Starting point is 00:20:59 SPITBOL is one of those things that blurs the line between implementation and language. I think I'd call it a dialect, if anything? From what I've seen so far, the manual for SPITBOL and the manual for SNOBOL4 are basically interchangeable, so any changes are, at most, minimal. The most important fact to know is that SNOBOL4 is a highly portable language, and so is SPITBOL for that matter. SNOBOL4 code isn't exactly compiled, rather it's transformed into another language called, and get this, the SNOBOL Implementation Language, or SIL. SIL is closer to an assembly language than a high-level language, but this isn't an
Starting point is 00:21:50 assembly language for a physical computer. No, nothing so pedestrian. SIL can only run on a special virtual machine. When you compile a SNOBOL program, its output is SIL code, which is then run inside a virtual computer. In fact, SNOBOL itself is written in SIL. It's VMs all the way down. This is very, very similar in method, at least, to how Java and the Java Virtual Machine work. The Java compiler creates bytecode that is then executed on the JVM. That sounds pretty convoluted, right? Why go to all that trouble? Well, portability.
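The portability argument is easier to see with a toy. The Python sketch below is a tiny stack machine; it is not SIL, and the instruction names (`PUSH`, `CONCAT`, `EMIT`) and the function `run_vm` are invented for illustration. The idea it shows is only this: if a program is expressed in a small instruction set, then porting means re-implementing the small machine, not every program.

```python
def run_vm(code):
    """Execute a list of (op, *args) tuples on a minimal stack machine.

    Returns everything the program emitted, in order.
    """
    stack, out = [], []
    for op, *args in code:
        if op == "PUSH":        # push a literal onto the stack
            stack.append(args[0])
        elif op == "CONCAT":    # pop two strings, push their concatenation
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "EMIT":      # pop the top of the stack to the output
            out.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op}")
    return out


# A "compiled" program: portable to any host that can implement run_vm.
PROGRAM = [
    ("PUSH", "SNO"),
    ("PUSH", "BOL"),
    ("CONCAT",),
    ("EMIT",),
]

if __name__ == "__main__":
    print(run_vm(PROGRAM))
```

Here the whole machine is about fifteen lines, which is the property that makes this style of implementation attractive for a language meant to run everywhere.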
Starting point is 00:22:35 It just so happens that SIL is very easy to implement. In fact, it's designed to be easily implemented. If you can write SIL for a new computer, you're done. SNOBOL now works on the new platform. That means you get the full compiler and any existing SNOBOL programs. As such, SNOBOL runs on just about everything. It's exactly like how Java can run on 15 billion machines or whatever the old ad copy used to say. SPITBOL does a very similar sleight of hand but uses a different virtual machine. The key distinguishing factor is that SPITBOL is more performant than SNOBOL. From what I've read, this is due to the fact that SNOBOL operates more like an interpreter,
Starting point is 00:23:25 while SPITBOL is closer to a traditional compiler. But still, I want to stress, neither is very normal. The implementations going on here are extremely complex and extremely atypical. The result is that SPITBOL can also run on 15 million machines! Once I got my hands on the source code for iCal, I went out looking for a SPITBOL compiler, and I found one immediately. The compiler I ended up using is called SPITBOL x64. It works on 64-bit x86 systems. Basically, any modern computer. I ended up running that compiler under Linux, so I get to program with all the tools I'm
Starting point is 00:24:10 used to using. There's even a SPITBOL plugin for my favorite IDE. However, just because I can keep my text editor theme doesn't mean I'm going to have a good or smooth time. Now I'm a pretty accomplished programmer, if I do say so myself. I know a ton of languages and can usually pick new ones up pretty easily.
Starting point is 00:24:34 But SNOBOL, well, that's been a unique challenge. SNOBOL is a string-oriented language. When I first read that, I thought, okay, cool, perfect. I've worked with these types of languages before. I used to get paid to write Perl, which is very much a string-oriented language. I naively assumed my skills would transfer over. Oh, how wrong I was! SNOBOL doesn't look like any other language I've ever seen. It is, however, very well suited to string manipulation. It's perfect if you want to write a quick and dirty compiler, which is very much what
Starting point is 00:25:17 iCal is. So what exactly makes SNOBOL so unique and challenging to learn? Allow me to count the ways! First of all, each line of SNOBOL uses the same syntax. An expression follows a very formulaic layout. It goes label, subject, pattern, equals, object, colon, transfer. That forms an entire expression. However, and here's the trick, each part of that expression is optional. That makes for some bizarre looking lines of code. A hanging
Starting point is 00:25:57 equal sign is actually completely valid. Labels act exactly as you'd expect. It just sets an internal name for a certain line of code. You can then jump to it. The transfer part of the expression is how you make that jump. Thus, a loop can simply be one line of code that jumps back to itself. Subject is also pretty simple, and in some ways, it's just a variable name. X is a subject. If you're simply setting the value of a variable, you can actually write x equals 1. In that expression, 1 is the object and x's value is just set to the object's value. It's the pattern bit that gets complex. Patterns are their own type of expression that are used to match and extract data from a subject.
Starting point is 00:26:48 If a match is successful, then the pattern is replaced with the object. This could be as simple as replacing every space with periods. Complexity spirals up very steeply from there. In the case of iCal, pattern matching is used in almost every single line of code. The other important behavior here is the concept of success and failure. Each expression can either succeed or fail. If you try to replace all spaces in a subject, but there are no spaces to replace, then that counts as a failure. There's special syntax in the
Starting point is 00:27:25 transfer to branch on success or branch on failure. By default, a transfer is unconditional. But with these success fail conditions, you can do some pretty tricky things. The most basic use of this trick that I've seen all over iCal is a function that drops every space from a string. A pattern replacement will normally just replace the first occurrence of a pattern. If no more occurrences are found, then that expression is counted as a failure. So you just have to have a label that replaces a space with nothing, then transfers to itself as long as the replacement
Starting point is 00:28:05 was a success. Once there are no more spaces, the transfer stops, ending the loop, and you can use a fail to return from the function. That's just not very normal. I've never run into a language that works in that way. Then we get to the cursor. Ah, the cursor. This is kind of an old school way to deal with text processing. In a more modern, and I stress more modern, language like Perl, you're expected to do text processing using regular expressions, pattern matching, and considering a whole chunk of text and ripping it apart.
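The replace-until-failure loop described above can be mimicked in Python. This is a sketch of the control flow only, not a translation of the iCal source; `replace_first` and `strip_spaces` are hypothetical names, and the success flag stands in for the success/failure signal that drives the transfer.

```python
def replace_first(subject: str, pattern: str, obj: str):
    """Replace only the first occurrence of pattern.

    Returns (new_subject, succeeded). When the pattern is absent the
    subject is returned unchanged and succeeded is False.
    """
    idx = subject.find(pattern)
    if idx < 0:
        return subject, False
    return subject[:idx] + obj + subject[idx + len(pattern):], True


def strip_spaces(text: str) -> str:
    """Drop every space by repeating a single replacement until it fails."""
    while True:
        text, ok = replace_first(text, " ", "")
        if not ok:       # the failure is what ends the loop
            return text


if __name__ == "__main__":
    print(strip_spaces("DO  :1 <- #5"))
```

Each pass removes exactly one space, so the loop runs once per space plus one final failing attempt.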
Starting point is 00:28:51 But in SNOBOL, you operate as a cursor moves back and forth, pointing to a certain head where you want to be operating. As SPITBOL processes a string, it's internally moving that cursor. Under the hood, it's doing pattern matching at a very low level. It scans the cursor forward from the start of the subject to its end. That pointer, where it's currently scanning, doesn't stay low level.
Starting point is 00:29:21 It's made available to the programmer. And it's done in a weird way. It's done using these unary operators. To get the value of a cursor, you use the unary at operator. You use it as a prefix to a variable. So at x would set x to the current value of the cursor. Having a unary operator that sets a value is a little odd. For instance, it's more common to use a unary to modify and return a value without
Starting point is 00:29:51 side effects. This is something like how the exclamation mark or the not operator in other languages will take the logical inverse of a value without changing the variable's value. But SNOBOL just doesn't work that way. So where exactly would you put at x, you may ask? Why, in the pattern part of the expression, of course. And that doesn't change the pattern. It just, well, it kind of hangs out and pulls extra information. In fact, you can pack a wild amount of things into the pattern that just hang out and don't affect the pattern, that just glean information. So I hope you're seeing the shape of things, dear listener. This is a complex, unique, and very special programming language.
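As a rough picture of the cursor idea, here is a Python sketch of a scanner that slides a position forward over the subject and hands that position back to the caller. It handles only literal patterns, and the function name `scan` is an assumption; real pattern matching in the language is far richer than this.

```python
def scan(subject: str, literal: str):
    """Slide a cursor forward over subject looking for literal.

    Returns (start, end): the cursor position where the match begins and
    the position just past it, or None when the scan fails.
    """
    for cursor in range(len(subject) - len(literal) + 1):
        if subject.startswith(literal, cursor):
            # Capturing the cursor on either side of the match is the kind
            # of extra information a pattern can glean without changing
            # what it matches.
            return cursor, cursor + len(literal)
    return None


if __name__ == "__main__":
    print(scan("PLEASE DO", "DO"))
```

Reading the cursor before and after a match is enough to slice the subject into "everything before", "the match", and "everything after".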
Starting point is 00:30:45 I'm pretty sure you could define a Turing machine with a single line of SNOBOL. That is powerful. And best of all, it would be a totally incomprehensible line of text. I can just barely read some of this code, but hey, I've improved. When I first started typing out the scans of iCal, I felt completely illiterate. Now, well, I'm reading at a first grade level. It's still not the best, but workable. Luckily, I've been getting a whole lot of help from one Cheyenne Wills, who is a maintainer
Starting point is 00:31:20 of the SPITBOL x64 compiler that I'm using. I would not have been able to get so far without their assistance, so huge thanks! We've met the tools and we've met the language, so now let's get to the actual project. How exactly does one go about restoring software? It helps to establish goals. As I've said, we wanted to have Intercal 72 available to view in an unadulterated form. And I wanted to have an executable version of the compiler. We're talking about preservation and actual usable code. This is where the nature of software helps us out.
Starting point is 00:32:02 If this was a hardware restoration project, we'd probably only be able to achieve one of those goals. Let's say you dig up some rare and obscure computer from a garage somewhere. You only have one machine. In the restoration process, you have to make a choice. You can choose to preserve the machine as close to how it was found. You clean it up, you make it presentable, you end up with this nice museum piece. It shows off what the computer may have been like when it rolled out of a factory. That would prevent further deterioration of the machine and give a good representation of that computer. You'd see what the machine
Starting point is 00:32:41 would have been like back in the day. On the other hand, you can try and get it up and running again. That sounds like a better path, right? Computers are such an experiential thing. It's one thing to look at a computer, it's another thing to actually use it. But that presents a problem. That kind of restoration will almost always mean maintenance. Power supplies are notorious for degrading over time, so you'll have to do some work there.
Starting point is 00:33:13 Period components will have to be replaced. Depending on the era of the computer, that can mean digging out replacement vacuum tubes, throwing in modern capacitors, or swapping out transistors. In many cases, whole parts of the system may need to be replaced. One well-known example is the Commodore 64's sound chip, the SID. No one makes those chips anymore, but there are modern replacements that use modern parts to simulate the SID. You can slot that in during a restoration and get a Commodore 64 up and singing again,
Starting point is 00:33:48 but now you've changed the original hardware. The cost here is damage. That's the counterintuitive thing. If you're making these changes to get a machine running, then you're destroying the original state of the physical machine. When you replace a period component with a new one, there's this loss of information. That gets worse and worse as larger
Starting point is 00:34:10 changes are made. In these types of restoration processes, it's super important to figure out which route you want to go because once you start swapping out parts, you'll hit a point where you can't go back. You'll change solder, you'll pull out wires, you'll destroy parts of the computer. At that point, the machine is no longer as it was initially. In software, we don't have that problem, so we get everything. For iCal, we have the original scans preserved as image files. We have transcriptions of those files stored as raw text. And we have a separate file that's, in large part, executable.
Starting point is 00:34:56 It's just files, so we can make as many copies as we want. That means we can take every path we desire. And in this case, we need those extra copies. To get a modern SPITBOL compiler to even accept iCal, some major changes were needed. The largest was to format the software. The scans we received from Don Woods were a source code listing. That takes us, as always, into the fun discussion of software
Starting point is 00:35:25 representation. The big thing here is that each line started with a line number, there were weird tab stops, and each line was restricted to 80 characters. For SPITBOL x64 to even consider the code, I had to reformat that all into a normal-looking text file. Most of the remaining changes were actually nitpicks. This is something that happens between language versions. Technically speaking, iCal is written for IBM System 360 SPITBOL. That uses a slightly different method for reading and writing files. That adjustment, again, is a destructive change.
Starting point is 00:36:08 I'm changing how the compiler originally handled files. But we have copies of the original source code, so I can make those changes guilt-free knowing we have that original reference. What else did I have to change? Well, to start with, most of the problems came from transcription errors. Two sets of eyes have helped catch a lot of these, but there are some major problems. The scans we're working off of came off of some type of impact printer. It was clearly a
Starting point is 00:36:41 well-loved device, and the font set is a little weird. Periods are suggestions at best, zeros and O's look almost identical, and the kerning around single quotes is distressingly wide. There's also all that reformatting I talked about. This kind of comes down to the media in play. iCal would have been fed into Princeton's 360 on punch cards. That would then make its way into a file and then be compiled by SPITBOL 360. The scans we have are from a source code listing of iCal.
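The listing cleanup can be sketched in Python. The exact column layout of the printout is not given in the episode, so the format here is an assumption: a leading listing number, one separator space, then the statement, with a continuation line flagged by a plus sign in the first statement column. The names `LINE_NO` and `clean_listing` are hypothetical, and this is not the script used for the actual restoration.

```python
import re

# Assumed layout: optional padding, the listing number, one separator space.
# Only that single space is removed so any further leading blanks survive.
LINE_NO = re.compile(r"^\s*\d+ ?")


def clean_listing(lines):
    """Strip listing numbers and fold '+' continuation lines into one line."""
    out = []
    for raw in lines:
        body = LINE_NO.sub("", raw.rstrip())
        if body.startswith("+") and out:
            # Continuation: append to the previous statement, minus the marker.
            out[-1] += body[1:]
        else:
            out.append(body)
    return out


if __name__ == "__main__":
    sample = [
        "  12 FIRST HALF OF A LONG STATEMENT",
        "  13 + SECOND HALF",
        "  14  INDENTED STATEMENT",
    ]
    for line in clean_listing(sample):
        print(repr(line))
```

Keeping leading blanks intact is a deliberate choice in this sketch, since the position where a statement starts can carry meaning in column-oriented source.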
Starting point is 00:37:18 That means each line starts with a number, has that specific padding, and headers. Lines are also restricted to 80 columns, anything that spills over is continued and marked with a plus sign. To get x64 to accept that, I had to drop the formatting and turn those multi-line spots into single lines. That ended up being kind of challenging because of how weird the printout was. It was hard to tell when a continue started with a space. I also had to add specific indents to make the flow of the program more clear and to make it work. The white space was mostly absent from the original program because, again,
Starting point is 00:37:58 the media was punch cards. File input and output also had to be adjusted somewhat. And, to be fair, SNOBOL has some weird I/O going on. To output a file, you first open that file, bind it to a variable name, and then assign its value. Anything you assign to that file variable is printed in that file. There's a default file handle already open called output that prints to standard output, the screen. So a Hello World program is actually just output equals hello. That's a little strange, but whatever, it's SNOBOL. The main change that had to be made was dealing with how files are opened and closed. Basically the 360 version of the
Starting point is 00:38:45 function that opens a file expects two arguments, but newer Spitball wants three. These are the kinds of small shifts that occur in a language over time. It's not major, but it's the kind of thing that prevents older software from running directly. I also hit another linguistic shift when it came to function names. iCal has this internal function called exp. It's used to process an expression and it also does some error handling. As it turns out, modern versions of SNOBOL already have a function called exp. It's used to calculate exponents. So names had to be swapped around. Again, nothing major, just the kind of shifts that a language experiences over its lifetime. And I only bring this up because it's really neat
Starting point is 00:39:37 to see this firsthand. I know we always talk about how, oh, and then the language evolved over time, but this is what that means. But then we get to things that I had to replace. iCal sometimes requests to punch data out, you know, on a punch card. That obviously doesn't really work well on modern hardware. Very few of us have key punches hooked up to our computers, so I had to replace any punch calls with a simple file output. iCal also relies on a few system libraries.
Starting point is 00:40:14 Spitball is able to load and execute binary modules. The closest comparison I can make is to linking in another language like C, but again, Spitball isn't compiled in the traditional sense, so there's some wiggliness in that comparison. Spitball does loads in a dynamic way. You can actually load, execute, and unload a module in the same line of code. But iCal is supposed to be running on a System 360. It relies on System 360 modules. I don't really have those kicking around, so I had to re-implement them.
Starting point is 00:40:53 Luckily, there were only two. Clock spelled with a K and WTP. Clock it turns out is just a numeric clock value. It's used to set up a random number generator. That's been replaced with Spitball's own clock function and should function in a similar manner. WTP was the weird one. From what I found, it's short for Write to Programmer and was another way for the System 360 to send output. Hence, it's another output file. Simple and easy. The compiler already generates a lot of output files, so one more was a safe bet.
Starting point is 00:41:35 This kind of makes the process sound painless, but dear listener, I assure you, this was a lot of work. It's been quite the struggle to get things to actually run. And even then, we've only been able to get so far. At the end of the process, I have a working iCal compiler. It's able to take simple intercal code and output a program that could run. There are still a few bugs. There are still some little processing bits that don't quite work right, but it's enough that we have the shape of things. The biggest issue is actually that the output code relies on two modules that we don't have. One I think might be generated by the compiler, but the other one definitely isn't.
Starting point is 00:42:22 There's this preliminary library it tries to load that I think is runtime support. And that I can't find, and Don hasn't gotten back to me with the code. But for me, this is enough of a success. The compiler is working well enough that we can start asking some deeper questions about Intercal 72.
Starting point is 00:42:43 So what have we discovered? One of the many jokes tucked away inside Intercal is its inefficiency. Not only is Intercal code inscrutable, but it's also wasteful. That carries on from the code itself to actual running programs. Don Woods often cites the fact that a single integer division in intercal takes 30 seconds to execute. It also takes over a hundred lines if I remember the floating-point libraries correctly. This is an awful language. Don't program in Intercal. Now when Don gives out that 30 second figure, that's all dependent on hardware and on the specific implementation of the Intercal compiler.
Starting point is 00:43:35 That number should sound ridiculous on its own, right? In the deep past of computing, division was a difficult task, but by 1972, it was already a solved problem. Division in any normal language is fast, but intercal is abnormal at best. In working with iCal, I can now point to why division is so slow, and in fact, why iCal programs tend to be sluggish. The big secret is iCal isn't actually a compiler. This is also why ICAL is so portable, because believe it or not, Intercal is a highly portable
Starting point is 00:44:16 language. So what exactly is ICAL doing? What even is this program that I've been working with? In modern terms, I'd call it a transpiler. It takes intercal code as input, and instead of making a binary, it outputs spitball code. Pretty nifty, eh? Why would you do this, and what are the implications? Well, the why is easy, actually. It's a lot easier
Starting point is 00:44:46 to write a compiler this way. Spitball can very easily write spitball code. I mean, to spitball, code is just a string and vice versa. So, there's nothing stopping you from using a spitball program to write another spitball program. In fact, the language makes it easy. This also makes the process of writing the compiler really simple. iCal doesn't have to worry about things like memory or how to shuffle registers. Spitball will handle that eventually, later. The result is that writing iCal would have been much faster and easier than developing a traditional compiler. This is also why iCal is so portable. It spits out code in a very portable language and it makes very few assumptions about hardware. Thus,
Starting point is 00:45:38 iCal could actually run and produce output that works on just about any machine, as long as we ignore the missing library issue. But there is a downside to this trick, and that's efficiency. A program written in iCal will always have this awful overhead of having to run the spitball compiler thing, whatever you want to call it. You will always take that hit, since you aren't outputting native code. This means that Intercal's capabilities are colored by the capabilities of Spitball itself. ICAL has to express everything that it can in terms of Spitball expressions.
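To make the transpiler idea concrete, here is a minimal sketch in Python rather than Spitball. It is only an analogy: the two-statement toy language (SET and SHOW) and the function names are invented for illustration and have nothing to do with iCal's real input or output. The point is just that the "compiler" produces source text as a plain string and leaves execution to the host language.

```python
# Toy illustration only: a made-up two-statement language, not Intercal.

def transpile(source: str) -> str:
    """Translate toy statements into Python source text (a plain string)."""
    out = []
    for line in source.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == "SET":        # SET x 5  ->  x = 5
            out.append(f"{words[1]} = {int(words[2])}")
        elif words[0] == "SHOW":     # SHOW x   ->  shown.append(x)
            out.append(f"shown.append({words[1]})")
        else:
            raise ValueError(f"unknown statement: {line!r}")
    return "\n".join(out)


def run(source: str) -> list:
    """Hand the generated text to the host language to execute."""
    env = {"shown": []}
    exec(transpile(source), env)
    return env["shown"]
```

Nothing in transpile knows about memory or registers. All of that is deferred to whatever eventually runs the generated text, which is also where the runtime overhead comes from.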
Starting point is 00:46:19 And Spitball is a string language. Perhaps you see the issue. Spitball technically has primitives for math, but iCal never touches them. If you look through its output, it will never call add, subtract, or even divide. At least, not in a normal way. The math operations in iCal are all nonsense, and they are all based around string manipulation. I know, take a minute to gasp in horror. The two operations you get are called mingle and select. They operate on binary data. Internally, iCal will store a number as a
Starting point is 00:46:58 string of binary digits. A 4 becomes, and this is a string representation, quote, 100. When you operate on that number, it's done as string manipulation. That is, again, inherently slower than a mathematical operation. The technical reason is simple. A string operation is actually just a pile of very special math operations. The result is that any operation involving numbers becomes a wildly convoluted mess, both in the Intercal source code and in the final output Spitball. Hence, you get 30 second division times. That's one of the more wide-ranging discoveries. But we also have a pile of these small fascinating details. One of my favorites is how iCal handles
Starting point is 00:47:53 politeness. This is, at least for me, the hallmark feature of Intercal. One of the open questions in my head has always been, how polite does Intercal code actually need to be? The C-INTERCAL manual has its own opinion on this. To quote, "the balance between various statement identifiers is important; if less than approximately one-fifth of the statement identifiers used are the polite versions containing please, that causes this error at compile time." But as we've established, that compiler is a later addition to the arts. That's only applicable to the ick compiler.
Starting point is 00:48:37 So what does iCal actually want? The tracking here is simple. Every line of Intercal code has to start with do, please, or please do. iCal tracks the number of lines that start with do versus the number of lines that start with please. That is used to calculate politeness, but the calculation isn't straightforward. It doesn't take a ratio. You need to have at least one please for every five do's and no more than one please for every three do's. In other words, your politeness ratio needs to be between one-fifth and one-third. If that's confusing, then good, you're getting the idea. How iCal implements politeness leads to a weird quirk.
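The politeness check just described can be sketched in a few lines of Python. This is a rough model and not iCal's actual Spitball code: the function names are made up, and it assumes one particular reading of the rule, namely that the share of please statements among all statements must fall between one-fifth and one-third, inclusive. If the original compares pleases against do's alone, the exact cutoffs would shift.

```python
def count_politeness(lines):
    """Count statements opening with PLEASE (or PLEASE DO) versus plain DO."""
    please = do = 0
    for line in lines:
        text = line.strip().upper()
        if text.startswith("PLEASE"):    # covers PLEASE and PLEASE DO
            please += 1
        elif text.startswith("DO"):
            do += 1
    return please, do


def is_polite_enough(lines):
    """Assumed rule: 1/5 <= please / (please + do) <= 1/3."""
    please, do = count_politeness(lines)
    total = please + do
    # Integer comparisons avoid floating-point edge cases at the boundaries.
    return total > 0 and 5 * please >= total and 3 * please <= total
```

For example, is_polite_enough(["DO A", "DO B", "PLEASE C"]) passes under this reading, while a program made entirely of DO statements is rejected as too rude.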
Starting point is 00:49:30 All intercal programs have to be at least three lines long. You need enough code for the ratio to work. That means you can't have an intercal one-liner, because that's just not polite. If that one-liner has a please, then the ratios are all off and iCal will reject your program just as a matter of course. That's a 100% please ratio. That's too polite. Another legendary feature of intercal is the unexplained compiler bug. This has been the bane of my existence on this project.
Starting point is 00:50:06 The story goes that Intercal will randomly introduce bugs into its compiled output. The fact is two-fold. Call them facts if you like. The first is how ICAL handles actual compiler errors. Normally, when Spitball encounters an unrecoverable error, it will spit out a pretty reasonable message. You get the error code, a small message describing the error, and the line the error occurred on. Like many proper programming languages, Spitball allows the programmer to define their own error handling routine. Normally, that's very useful. iCal isn't normal. One of the first
Starting point is 00:50:47 lines of iCal sets a custom error handling routine. That routine spits out the string unexplained compiler bug, which on its own is pretty useless. That says nothing about what went wrong. Now, each Spitball error has an associated code, but iCal, well, that views any number as a potential source of chaos. Instead of printing the error number, iCal calculates a nonsense error number. Specifically, it subtracts 632 from the error type and then adds a decimal point, then appends the last statement number executed, but in reverse. It is utterly useless and it's kind of hurt by the fact that it isn't a random message
Starting point is 00:51:38 that gets spit out. This is supposed to replace Spitball's normal error handling. Now I said that's one part of the unexplained behavior. Part two has to do with its output. This is something that I haven't been able to trigger correctly. It appears that iCal has a 10% chance to insert a random bug into the output program.
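The nonsense error number from part one is easy to model. The sketch below is Python, not the original Spitball, and the function name and the sample inputs are made up for illustration. It only follows the recipe as described: subtract 632 from the error code, add a decimal point, then append the last statement number with its digits reversed.

```python
def nonsense_error(error_code: int, last_statement: int) -> str:
    """Build the scrambled number printed after the bug message."""
    shifted = error_code - 632                   # hide the real error code
    reversed_stmt = str(last_statement)[::-1]    # statement number, backwards
    return f"{shifted}.{reversed_stmt}"


# Hypothetical inputs: error code 494 raised while on statement 521.
message = "UNEXPLAINED COMPILER BUG " + nonsense_error(494, 521)
```

With those invented inputs the number comes out as -138.125, which shows how little the result tells you unless you already know the recipe and can run it backwards.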
Starting point is 00:52:04 From reading the source, I think it replaces a go-to with a different random go-to, but I'm not entirely sure because I haven't been able to trigger it reliably. Another weird note comes from character encoding. The typed-up versions of iCal are technically not ASCII files, they're UTF-8. There's one reason for that, the cent sign, or as Intercal calls it, change. It's used for the highly cursed mingle operator, which interleaves the bits of two numbers. It's not a useful math operation. The cent sign is also special because it's in IBM's EBCDIC character set, but it's not in the ASCII character set.
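Since mingle keeps coming up, here is a small Python sketch of mingle and select done the way this episode describes iCal's approach, as manipulation of strings of binary digits. It is a model under assumptions and not the compiler's code: the helper names are invented, operands are assumed to be 16-bit values, and the semantics follow the Intercal manual's description (mingle alternates the bits of its two operands starting with the first; select keeps the bits of the first operand wherever the second has a 1 and packs them to the right).

```python
def to_bits(value: int, width: int = 16) -> str:
    """Render a number as a fixed-width string of binary digits."""
    return format(value, f"0{width}b")


def mingle(a: int, b: int) -> int:
    """Interleave the bits of two 16-bit values into one 32-bit value."""
    a_bits, b_bits = to_bits(a), to_bits(b)
    interleaved = "".join(x + y for x, y in zip(a_bits, b_bits))
    return int(interleaved, 2)


def select(a: int, b: int, width: int = 16) -> int:
    """Keep the bits of a where b has a 1, packed toward the right."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    kept = "".join(x for x, mask in zip(a_bits, b_bits) if mask == "1")
    return int(kept or "0", 2)
```

As a check against the manual's own examples, select(179, 201) gives 9 and mingle(0, 65535) gives 1431655765. Every step here is string slicing and joining, which hints at why arithmetic built on top of these operations is so slow.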
Starting point is 00:52:53 The original Intercal manual suggests replacing change with big money, aka the dollar sign. But iCal expects change, not big money. That means that not only the compiler, but also any source you feed into iCal cannot be represented in ASCII. What's funny is that a decade ago, that would have been a problem. But nowadays, text encoding support is very good. Most editors will happily support UTF-8 or even more sophisticated text encoding schemes. So change is no issue. Or at least I thought that would be the case.
Starting point is 00:53:31 For some reason, and I don't know why, I wasn't able to get iCal to run with the change symbol. I ended up having to do the suggested conversion to big money. Why? Well, I can't figure it out. Spitball64 would accept change, but it just wasn't acting right. iCal wasn't properly recognizing the change character, so my version uses big money now. But that leads me, in a way, to something weird I've run up against. That's the simple fact that most Intercal
Starting point is 00:54:06 code you see online doesn't actually work with iCal. I had this realization while I was trying to compile a Hello World example. It kept failing to process. iCal would tell me, unexplained compiler bug, unexplained compiler bug minus 138.125. Um, not very useful, iCal, but thanks for trying. iCal has been such a, well, let's call it a joy, to prevent me saying anything else. It's been such a joy to debug, since it can be very hard to tell when the error is in the compiler or in the program you're compiling. But I digress. If you go looking for an Intercal Hello World, you're going to run into this one program.
Starting point is 00:54:51 It loads up an array with ASCII character codes, then it tells Intercal to print that array. C-INTERCAL, the dialect with a usable compiler, will allow you to print an array as a string of characters. Intercal 72 will not. The vast majority of software online is for C-INTERCAL or some later dialect. There are a few other tells, but that print statement thing is the big one for me. It also came as a bit of a strange surprise. I knew somewhere in my head that Intercal
Starting point is 00:55:27 had shifted over time, that C-INTERCAL was its own dialect and language spec, but I hadn't realized the implications of that. So I guess that's twice this episode that I've had to come face to face with the horrible specter of linguistic shifts. With that aside, well, aside, I want to focus back on the deeper internals of iCal. What I found is that reading the compiler actually makes the manual quite a bit more funny. There's this one line, for instance, quote, Spaces may be used freely to enhance the program's legibility, or at least reduce
Starting point is 00:56:05 program illegibility, with the restriction that no word of a statement identifier may contain any spaces. End quote. That statement is not only completely true, it's also completely wacky. iCal's processing goes in a few steps, but here's what's important. Step one is it checks a line for any statement, like do or please or abstain or forget. Step two is it processes any expression on that line. Between steps one and two, it removes all spaces from that line of code. That means you can have very gross formatting, including, but not limited to, spaces between
Starting point is 00:56:51 numbers. You can freely use spaces to make Intercal less legible, is what that means. There are a few operations that are done between step 1 and 2. Another is the deconstruction of wows into their constituent parts. Intercal has this pretty grody shorthand where a single quote followed by a period can be shortened to an exclamation mark, since, you know, they look similar. Internally, iCal automatically reverses that shorthand. To use the correct terms, a wow is turned into a spark followed by a spot.
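The cleanup that happens between step one and step two can be modeled with two string replacements. This is a hedged Python sketch with an invented function name, not a translation of iCal's Spitball code. It assumes the statement identifier has already been recognized and stripped, so only the remainder of the line is passed in.

```python
def normalize_expression(text: str) -> str:
    """Drop all spaces, then expand each wow (!) into spark-spot ('.)."""
    no_spaces = text.replace(" ", "")    # spaces carry no meaning here
    return no_spaces.replace("!", "'.")  # undo the wow shorthand
```

So normalize_expression("# 6 5 5 3 5") comes back as "#65535", which is exactly the kind of gross formatting the manual's legibility joke allows.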
Starting point is 00:57:34 iCal also has some strange hardware dependencies. Spitball keeps a lot of those considerations hidden, but there is one exception and that is the punch card of all things. Like I mentioned earlier, I had to rig up a punch out function to replace Spitball360's native punch calls. Well, that assumption of the punch goes a little deeper, but perhaps not in the way you'd think. If you were reading data directly from a punch card, each card would give you 80 characters. Any empty columns would come over as nulls. iCal doesn't assume that it will always be given
Starting point is 00:58:11 80 character lines. Rather, in its processing, each line is truncated to 80 characters if it's too long and then it's padded to 80 characters if it's too short. That just kind of strikes me as funny. Despite being this joke language, iCal actually has code that deals with the very annoying brass tacks of its platform. That kind of processing would show up in just about anything that works with punch cards. And that leads me to the final big category that I want to discuss. That's old software practices. I've said time and time again that the best way to understand old software or hardware is to actually work with it. Well, this is one of those times where I get to
Starting point is 00:58:58 walk the walk, so to speak. Working with iCal has been fascinating in part because it is such old software. It's not just that it's written in a language I don't know, but it uses practices I'm not used to. One is the rampant use of GoTo. Now, I know. This is kind of a meme among programmers. It has been, ever since Dijkstra published GoTo Considered Harmful. But there is a truth to the matter. It's usually more clean, more understandable, and just better to abstain from going to. It's become practice over time to not really use GoTo. More recent languages don't even include GoTo, which has made the transformation complete. iCal makes bold and aggressive use of GoTo. This is partly because Spitball has very deep support for the feature. I mean, every line can end in a secret GoTo after all.
Starting point is 01:00:01 Specifically, Spitball uses labels for these jumps. You can label a line of code, then reference that label in a go-to. Nothing too weird, really, just an old language used for an old program. But here's where it gets to me. Functions. Spitball supports functions in a bit of a strange way. You declare a function up at the top of your program and in that declaration, you give it a name, you state if it takes arguments and if it returns values.
Starting point is 01:00:34 You can then implement that function anywhere in your code base. When you finally get around to writing your function, it's named using a label. Put another way, in Spitball, a function is just a special kind of label. iCal uses both function calls and go-tos. A label for a function looks the exact same
Starting point is 01:00:59 as a normal label. So reading the code gets, well, a little confusing. Now, I've worked in systems like this before. Assembly languages often work this way. When you call a function, you're really just jumping to a label and setting a return value. That has the same pitfalls of labels
Starting point is 01:01:22 and functions looking identical. What you usually do is you write labels so they look somehow different than functions. That can mean that all labels start with underscores or that function names start with FN. You institute some stylistic flair that keeps your code clear. iCal doesn't use any kind of convention because, again, this is from a period where this kind of software was the norm. So we get functions named expr next to labels named noLBL.
Starting point is 01:01:58 That has made it pretty tough to track down control flow. The end-of-line go-tos are really the cherry on top, because, well, they can branch. Spitball lets you specify one place to go to if the expression succeeds and another if it fails. That's fantastic for error handling, but a bit of a challenge to follow. iCal does something tricky with this feature. When the compiler is checking what a line of Intercal does, it attempts to find some character or some pattern.
Starting point is 01:02:34 It calls up find mesh, for instance, when it's trying to decide if the line of Intercal contains a mesh operator. If no mesh is found, Spitball marks that expression as failed. It's technically in an error state. iCal uses those errors to determine how parsing proceeds. If that line fails to find a mesh, oh well, jump to change processing, but if it succeeded, then jump to this other part of the program that processes mesh data.
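Python has no success and failure go-tos, but the shape of that dispatch can be approximated. In this sketch, which is only an analogy, returning None stands in for a failed Spitball expression, and the function names, handler labels, and the order of the checks are all assumptions made for illustration.

```python
def find(text: str, symbol: str):
    """Return the position of symbol, or None to signal failure."""
    pos = text.find(symbol)
    return None if pos < 0 else pos


def classify(text: str, handlers=(("mesh", "#"), ("change", "$"))) -> str:
    """Try each search in turn; a failed search falls through to the next."""
    for name, symbol in handlers:
        if find(text, symbol) is not None:   # success branch: go handle it
            return name
        # failure branch: nothing is raised, control just moves on
    return "other"
```

The thing to notice is that a failed search is not treated as a problem. It is the ordinary way the parser decides where to go next, which is what makes the original control flow hard to follow on a first read.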
Starting point is 01:03:07 This is also used for more complex parts of the program. iCal has this section that determines if a line of code is working with a variable or an array. There are functions set up specifically for processing and preparing each of those data types. iCal will try to process a single value variable, then if anything in that process fails, it cleans up and tries to process an array. It's very, very inefficient, but it's definitely a quick and dirty way to do things. It's also not a strategy I would ever think of. There are
Starting point is 01:03:46 just some weird things around functions in iCal in general. One that took me way, way too long to figure out is how iCal even declares functions. Again, this is old practice that you don't see on modern computers. As I said, Spitball wants you to first declare a function's name before you describe what the function does. You do that using this function called define. iCal has a lot of user-defined functions. That would normally mean writing out something like 50 defines each on their own little line. But iCal doesn't do that.
Starting point is 01:04:26 Instead, there's this huge string that has every function name in it. It also has some variable names. That gets passed to a chunk of code that reads all those names and properly declares each function and variable. That's weird. That's not something that would work in most languages, and it's not something I'd even consider doing. But it's possible in Spitball, so why go to the trouble? I think it was to save space. iCal would have been punched out on cards
Starting point is 01:05:01 before being fed into Princeton's 360. That meant that, under normal circumstances, you would have to have one card for each define call. That's a single card per function that just says, define x, define y, define table. Now that's kind of lame. Instead iCal has a list of functions. You can fit that list on just four punch cards instead of something like 50. That's much more efficient, and it has this cool factor to it. Plus, it's making use of Spitball's oddities to save space and
Starting point is 01:05:39 time. iCal does a similar thing when it defines tables, Spitball's version of arrays. That table routine, however, is much more complicated. That's another part of the code that I can't really understand. There are quite a few spots of this compiler that I still don't get. Intercal is keeping its secrets, but we have the source code now. That means that anyone can dive in and see what they can uncover. I've just started the process. Now it's up to all of you. Alright that brings us to the end of this episode, but really, this is the beginning
Starting point is 01:06:25 of a bigger story. The source code for Intercal 72 is out in the public now. After decades, it can finally be shared. I've been able to make, I think, quite a lot of headway, but there's still a lot of space to explore. I'm going to put a link to the Intercal 72 GitHub repo in the description of this episode. I've kind of hit the end of my rope with this one, but I invite anyone interested to grab the code and run wild.
Starting point is 01:06:53 There are still some things with how iCal handles files that I'm not sure about. Plus, I wasn't able to dig into how it really implements things like abstain. If you or a loved one know Spitball, I highly recommend looking at this as a puzzle. If you don't, then this is a great opportunity to learn an intriguing language and get your hands on some real computer history. So take that as a call to action. Before I sign off, I want to give one last huge thanks to Don Woods for actually finding this code and putting up with all my questions.
Starting point is 01:07:34 I also want to thank Cheyenne Wills for all of his help understanding Spitball. Daniel Temkin, my collaborator, also needs a huge round of applause. None of this would have been possible without him connecting me with Don and without his continued help. He also has a book coming out in September. It's called Forty-Four Esolangs, from MIT Press, and I'm personally very much looking forward to reading that. Thanks so much for listening to Advent of Computing. I'll be back in two weeks with a more regular episode.
