Advent of Computing - Episode 175 - SNOBOL? That's Disgusting!

Starting point is 00:00:00 One of my favorite facts about computing is that we actually get a birth announcement, or rather, a public disclosure. ENIAC is announced to the world via press release in February of 1946. That press release is picked up by papers and pretty quickly makes the rounds. The machine is called an electronic brain, and it's claimed to be the next revolution in mathematics. That epithet, the electronic brain, pops up over and over again during the first few decades of the digital computer. It's a bit of a funny disconnect. The electronic brain, it's so great at mathematics, so fast, so accurate, but the human brain is notoriously bad, slow, and inaccurate when it comes to mathematics.

Starting point is 00:00:53 The simple fact is that early on, computers aren't exactly brains. They're over-complicated and over-glorified calculators. I've spilled, at this point, uncountable bits trying to tease out the fine line between very sophisticated calculator and very primitive computer. The line gets more and more blurred the further back you go. Early machines are used for more sophisticated mathematics than contemporary calculating arts could achieve. They are, in fact, an improvement over those earlier calculators.

Starting point is 00:01:31 ENIAC enables things like the Monte Carlo method, which is impossible to do by hand and impossible to do with a desktop calculator. New horizons are being crossed daily, but mainly in the realm of mathematics. It takes a while for us to see just what machines can really do. That basis in mathematics can be used with some doing to take on more human tasks, to become a little more brainy, if you will. One of the first places we see this particular horizon crossed is text processing. After all, what's further from math than the written word?

Starting point is 00:02:20 Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 175, SNOBOL? That's Disgusting! Now, I've made no secret of my love for offbeat languages. Today, we'll be looking at a pretty strange tongue that has one of the most offbeat origin stories I've ever come across. First, a little personal background. In the halcyon days of last summer, I was part of a project to restore the 1973 version of the

Starting point is 00:02:55 INTERCAL compiler. INTERCAL, for the uninitiated, is considered the first esoteric programming language. It was developed as a joke and a pretty witty critique on the state of programming in the early 1970s. It's very, very funny, I assure you. The original compiler had been lost for decades. Daniel Temkin and I were able to get Don Woods, one of its authors, to dig up and send us scans of the compiler source code. I then spent about a month getting the code to run on a modern computer. It was written in

Starting point is 00:03:33 this language called SPITBOL, which is a dialect of SNOBOL4. I'm not a very good SPITBOL programmer by any stretch, but during this project, I became somewhat literate in the language. And, let me tell you, it's strange. SPITBOL, or really, SNOBOL4, isn't like anything I've ever used. It was one of the first languages ever developed to handle string data instead of numeric data. It's one of the first non-numeric programming languages. The syntax is totally unique. It's built around patterns and scanning cursors and text. It's just, well, it's odd. I talked about the language a bit during my coverage of the great INTERCAL restoration. I didn't, however, talk about the language's history.

Starting point is 00:04:26 It only takes scratching the surface to run into some weird things here. First off, SNOBOL4 is supposedly very, very different than earlier versions of the language. But that's not the wild part. SNOBOL first appears in 1962 at Bell Labs. From there, it takes the world by storm. It becomes, in large part, a crucial language for developing compilers. It's so important that the development of SNOBOL4 is actually part of the larger Multics project. That means, in short, it was a very, very big deal and a very official deal.

Starting point is 00:05:09 As far as the deeper history and the actual details of the language, well, let's just say SNOBOL sounds more and more like an esoteric language to me. Parts of it almost feel like an intentional joke. Perhaps INTERCAL's commentary is just a little too close to home in this case. Today, I'm going to be connecting some dots. SNOBOL is one part important, one part innovative, and a couple parts silly. How does that fit into the larger picture of programming in the 60s and beyond?

Starting point is 00:05:42 Well, let's find out. The paper that got me started on this episode has fast become one of my favorites. It's titled A History of the SNOBOL Programming Languages by Ralph E. Griswold, one of the original developers behind SNOBOLs 1 through 4. I really like this paper because it's funny, it's detailed, and it's written very personally. Griswold does this thing where he uses first names to refer to everyone in the paper, so you literally have sentences like, Dave was the uncontested leader. I kind of want more papers like this, honestly.

Starting point is 00:06:24 The path to SNOBOL starts in the early 60s in the Programming Research Studies Department of Bell Labs. Griswold is fast to point out that department is a bit of an overstatement. It was six dudes in an office at Bell Labs. In the early 60s, their programming weapon of choice was this awful language called SCL. The language had been developed by one Chester Lee, the head of the very small department. And SCL was not very well loved. I've been trying to track down any information on this language, but nothing has been super forthcoming. What we do have is a poor review.

Starting point is 00:07:08 To quote Griswold, although SCL had a number of interesting features, it had notable deficiencies. Poor performance, a severely limited data region, and intricate, low-level syntax that required lengthy programs for realizing simple operations, end quote. What, I ask, are those interesting features? Pray tell. Please. Apparently, SCL was described in a 1962 memo inside Bell Labs that was unpublished. So, all we really know about SCL is what Griswold chooses to tell us.

Starting point is 00:07:47 And just to point something out, there's a massive works cited portion of this paper, and a lot of those works cited are unpublished memos, or in a few instances, notes scribbled on the side of unpublished memos. For obvious reasons, I'm just trusting Griswold on this one. The paper seems well researched, and I like the story it presents, so uncritical support. Now, back to the topic at hand. The picture Griswold paints of SCL isn't very rosy. However, this is tempered by the period. 1962 is very, very early in the history of programming. The field is Fortran, Lisp, whatever Grace Hopper happened to be doing at Univac,

Starting point is 00:08:41 and the very early stages of ALGOL. Those are the only things that would really be recognizable as programming languages in this period that had been well developed. There are more far-afield things going on. Macro assemblers had become pretty sophisticated at this point, which allowed for some really fancy low-level programming. There was also IPL, which we've discussed before. That was basically a list processing language in the form of assembly code.

Starting point is 00:09:15 It's one of those languages that just feels super primitive because it really emerges before syntax is a consideration. So there isn't exactly a whole lot to draw from at this point. We're at the stage where the players that form major language families exist, but those lineages haven't really had time to form. So we hit some Wild West stuff. We get to see interesting designs, truly unique ideas, and some real stinkers.

Starting point is 00:09:48 SCL sounds like it was one of the stinkers. One path for these early languages was to offer complex features but use assembly-like syntax and structure. That's what IPL did. It had all kinds of tricks for handling list data, but you programmed it step-by-step, almost like assembly language. From the brief description we get, I have a feeling SCL fit that style of mold. There are also some huge issues with these languages that had underdeveloped syntax. It's kind of a cascading effect. If you have to use a lot of code or use weird code to do something, you tend to get backed into a corner. You have to spend a lot of time writing that software. It's difficult to write and requires deep knowledge of the language. That bulk of software,

Starting point is 00:10:41 plus just the complexity and the annoyance, makes the code difficult to understand. Once it's written, it's very hard to deal with. There was a team inside this department, by a team, I mean three dudes, that was looking at non-numeric computing at this time. That is to say, projects that required more than just raw number crunching. One example that Griswold gives is Markov chains. These are programs that model how a system moves between states, giving each transition some probability and some kind of connection to other states.
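To make the Markov chain idea concrete, here is a minimal Python sketch. It is not drawn from Griswold's paper or from anything written at Bell Labs; the two states and their probabilities are invented purely for illustration, and the `walk` helper is a hypothetical name.

```python
import random

# Hypothetical two-state system. Each state lists the states it connects to
# and the probability of taking each transition (made-up numbers).
TRANSITIONS = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def walk(start, steps, rng):
    """Follow the chain for `steps` transitions and return every state visited."""
    state = start
    path = [state]
    for _ in range(steps):
        next_states, weights = zip(*TRANSITIONS[state])
        # Pick the next state at random, weighted by the transition probability.
        state = rng.choices(next_states, weights=weights)[0]
        path.append(state)
    return path

path = walk("sunny", 10, random.Random(0))
print(path)
```

The point of the sketch is that the program is mostly about states and the connections between them. The only arithmetic involved is drawing a weighted random choice.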

Starting point is 00:11:16 It's often used in physics research in places where a simple equation doesn't work. They were also looking at problems of formula analysis or even some very primitive program analysis. As frustration mounted, the crew went looking for alternatives. But none were forthcoming. Early days and all that. So they decided in 1962 that if there's not a good enough language, well, they'll just make their own. The primary conspirators here are the aforementioned Griswold, Ivan Polonsky, and Dave Farber. This, as I said before, is half of the PRSD, and, crucially, it's a connected half.

Starting point is 00:12:02 Farber at this time was secretary for SHARE, the well-loved IBM user group. So this isn't one of those cases where some programmers get a little too crafty in a vacuum. They were up to date on the latest and greatest, and they didn't like the latest and greatest. I think that's a great place to start from. There are two things to note at this stage. Development would go very quickly and haphazardly, and there was not a name for the new language. The early path of the project is summed up pretty well in its first public paper, SNOBOL, A String Manipulation Language.

Starting point is 00:12:41 To quote, As symbolic manipulations become more complex, programming in machine-oriented languages becomes increasingly tedious and cumbersome. A number of programming languages have been developed to aid the programmer in such problems. As interest in language translation, program compilation, and combinational problems has increased, many of these languages have been used for types of problems for which they were never intended. It is clear that more general symbol manipulation languages will materially expand the class of problems that can be programmed with reasonable time and effort, end quote.

Starting point is 00:13:23 Now, that really speaks to the growth of computing, doesn't it? Machines are initially very sophisticated calculators. Languages spring up as tools to make that calculation work easier, to allow for more and more sophisticated calculations. We eventually reach this point where we have enough digital oomph to do more than just math, but the tools we have aren't up to the task. A great example of this is actually the compiler.

Starting point is 00:13:57 The process of compilation, of turning a source code program into a binary program for execution, isn't numeric. It's not a matter of number crunching. It's not a straight application of a fancy calculator. Rather, it's a matter of string manipulation, of symbol processing. You have to have code to analyze text data and then somehow convert it into machine code. That's very difficult and complex to do with, say, Fortran, or even assembly language. You can do it, and in this period, people did.

Starting point is 00:14:32 It's just a big lift. This new language that Griswold and his homies invented would target this niche, not specifically compilation, but symbol manipulation and text data. It was planned from the start to be a string manipulation language. Design would begin in the fall of 62. This would immediately lead to an issue. Recall, SCL was developed by Chester, the head of the department. Three of his henchmen had gone rogue and started working on a replacement language because they didn't like what he developed.

Starting point is 00:15:11 He was already working on an improved version of SCL at this point. And this wasn't even a programming language development department. They were supposed to be doing other work, at least in theory. So it would be pretty dumb to have two-thirds of the department fighting over two new programming languages and implementing two compilers. But the new project skated by. From Griswold, quote, while Chester did not stop our work, there was considerable tension within the department in the ensuing months, end quote. What these nerds cooked up was truly something. One way to see this uniqueness is to consider the core data type used by this unnamed language,

Starting point is 00:15:58 the string. That, for the time, was a new idea, at least pretty new. In Fortran, the core unit of data was just numbers. For LISP, it was the atom, which would then form lists. Forth, a later language, is built around a stack, but SNOBOL is all about strings, storing them, inspecting them, manipulating them. Every aspect of the language is shaped by this decision. Now, that's not to say SNOBOL was the first string-oriented language ever devised. SCL was built around strings to some extent. There was also a slightly earlier language, COMIT, that was developed for string manipulation.

Starting point is 00:16:45 SNOBOL ends up being more capable and much more widely used than these predecessors, so it's often called the first string manipulation language. Just a quick sidebar here, you can actually do this type of work in Lisp. You just transform a string into a list of characters, and you can write functions to search for patterns and do replacement and all that good stuff. It turns out the crew that developed SNOBOL knew that Lisp existed, but that was it. Griswold literally says they only knew about Lisp vicariously. That's some pretty weird secondhand news to have.

Starting point is 00:17:27 The connection here is that the manual for SCL referenced some early work on Lisp. However, the SNOBOL crew's relationship with Chester Lee was tense in this period, so I can't imagine they were asking him for more details. Besides, the trio had a pretty good idea of what they wanted out of their new language. They weren't so much revamping existing art as they were taking notes from what had already been tried. That said, their experience with SCL and COMIT would shape some of their decision-making. Okay, so let's talk design goals. We actually have a concrete list for once.

Starting point is 00:18:08 Griswold explains the new language was designed for novice computer users, as in non-programmers. That's kind of a no-duh thing at this point. Almost all languages have that core goal early on, but the rationale here is a little subtle. We have four main goals. Conciseness, simplicity, problem orientation, and flexibility. On their own, that does sound kind of trite. But the core idea here is to make a language that's actually separated

Starting point is 00:18:45 from the nitty-gritty of a machine. The user should be able to describe the problem they want to solve, not necessarily the steps needed to solve that problem. That's a huge departure from contemporary languages. Even Fortran and Lisp, which are pretty removed from the machine, are still bound up by mechanical restrictions. In those languages, you describe the steps you want to take to reach some end goal. The trio at Bell wanted to move a little past that.

Starting point is 00:19:15 A result is the language they develop is completely untethered from the computer in kind of a delightful way. That isn't always a good thing, but it is interesting. From Griswold, quote, these objectives had several consequences, most of which can be characterized as a disregard for implementation problems and efficiency. This was a conscious decision and was justified by our contention that human resources were more precious than machine resources, especially in the research applications where SNOBOL was expected to be used, end quote. As they design SNOBOL, they add features not because they're easy for the computer or easy to implement, but because they'd be easy and slick for the programmer.

Starting point is 00:20:04 This is actually an interesting counterpoint to Fortran. When that earlier language is developed, there is a very heavy emphasis on ensuring that compiled code would be fast. There is a high level of optimization that was developed over many sleepless nights. That was done because Backus and his team at IBM were fighting against the sentiment that any compiled language would be too slow to use. The Bell trio, on the other hand, well, they don't care. They view the programmer's time, or more correctly, the researcher's time as the value

Starting point is 00:20:41 to be optimized. For a period when computer time was so expensive, this is a wild approach. This is very outside the norm. It was possible to calculate the cost of running a program, so there was usually an incentive to make a program run as fast as possible. But this is a rejection of the norm, a very deliberate rejection. I believe that context here is also relevant to mention. Griswold, Farber, and Polonsky were never charged for computer time. Within Bell, they had access to as much computer time as they wanted. This was partly due to the fact that they were just three dudes writing some weird language, so they didn't actually hog enough time to cause concern.

Starting point is 00:21:28 That and the fact that Bell Labs was a pure research laboratory. The culture in the lab valued researchers over mechanical resources. Fortran came out of IBM. In that institution, computers were more highly valued. Perhaps you can see the shape of things here. But seriously, this has some weird ramifications. The first is that the new language is forever slow and hulking. It just never becomes efficient.

Starting point is 00:22:01 The second is that, almost by accident, the crew develops a very portable language. All right, I can't avoid this any longer. It's time to talk about the name. You see, the name SNOBOL doesn't come into play until pretty late in development. That name isn't chosen until around 1963 or 64 when the first papers are distributed. Up until that point, the group was stumped for a good name. They just didn't have a good candidate to name the new language. The issue proved to be pretty funny. From Griswold, quote,

Starting point is 00:22:43 The language was initially called SCL7, reflecting its origins. At the time, SCL was officially at version 5, and the name SCL7 reflected a feeling of advancement beyond the new potential version. Actually, the new language was very much different from SCL. With Chester's separate language design and an increasing disaffection with SCL on our part, we sought a new name, end quote. That quest led to, well, it led to some more stinkers. This includes the name SEXI.

Starting point is 00:23:17 That's the string expression interpreter. That had some issues with distribution and some problems with even using the name in the lab. To quote, Dave recalls submitting a program deck with the job card comment, SEXI FARBER, as per Bell Telephone Labs standards, to which the I/O clerk is said to have responded with that's what you think, end quote. Obviously, a better name would be needed. There was actually some pressure behind keeping the SCL name in some form, although perhaps a little more respectful than SCL7. Quote, a curious aspect of the choice of a name arose in the conflict with Chester between his language and ours and how the personnel would be affiliated.

Starting point is 00:24:09 Implicitly acknowledging the success of the language ultimately to be named SNOBOL, Chester offered to join our effort if we retained the name SCL, but not if we picked another name. We decided to pick another name, end quote. As you can imagine, the relationship with Chester continued to be tense. They eventually land on SNOBOL. What does the name stand for? Well, it stands for a joke about them not having a snowball's chance of having a working language, but it's actually technically an acronym. It's short for string-oriented symbolic language. The S-N is from string, the O from oriented, the B-O from symbolic, and the L from language.

Starting point is 00:24:58 This was also not loved. One researcher was quoted as saying "That's disgusting" when they were told the acronym. Griswold was so ashamed of the name that, according to him, he would refuse to publish an explanation until well into the 1970s. I include these passages because, well, they're pretty funny. But more importantly, I want to convey the rough vibe around SNOBOL. It is a serious language, but it's surrounded by jokes. We'll call it SEXI because that's kind of funny,

Starting point is 00:25:36 and oh, well, here's another joke name. Sure, we can make that work as an acronym. Who cares about implementation? We have plenty of computer time. It'll work out. This would continue into the actual implementation of the language. I wasn't joking there. The first interpreter was written by Farber on, to quote Griswold, the back of a business envelope one evening. The language, dear listener, is cursed. This new language, which I'm just going to call SNOBOL now, is developed and implemented over the course of, well, basically no time at all. Griswold would later estimate that nine human months of labor went into initial development. With three people on the team, that puts us at, say, three months between idea and implementation. The language that comes out the other side of this

Starting point is 00:26:32 process is very, very unique. It's also somewhat different than later versions of SNOBOL. For that reason, it's often referred to as SNOBOL 1. A full description is given in that 1964 paper I referenced earlier. That's the first time the public learns about the language, but at that point, SNOBOL had seen a lot of use within Bell Labs itself. In fact, the team behind SNOBOL even uses the language to format papers about the language itself. The core data unit in SNOBOL is, of course, the string. The treatment of variables is rather special. There are a number of languages that only use a single data type. B is a prime example here. That language only has one type that was roughly equivalent to a byte worth of data. Larger data structures were then built up from those

Starting point is 00:27:29 bytes. I propose that we call these single type systems monotypes, because that's a funny name, and I like how it sounds. In some literature, these typing systems are called typeless, but I kind of don't like that because that implies that there isn't a data type. And in reality, there's one data type, but that's a different argument. SNOBOL is somewhat unique amongst other monotype languages, or at least it's an interesting approach to this type of design. It's very restrictive to only have one data type. B gets around that restriction by allowing users to define structures. You use the byte as a building block to make your own types.

Starting point is 00:28:17 SNOBOL doesn't do that. It's not nearly so sophisticated. You just get strings. You also don't exactly have declarations. You don't have to give SNOBOL a list of variable names that you want to use. Instead, you just start using variables and SNOBOL figures out if they need to be created. If a variable doesn't exist, then it just assumes it's blank and gets to work. There are some major benefits here.

Starting point is 00:28:48 No types and no declarations means that a new programmer doesn't have to learn what either of those things mean. You don't have to prepare to use a variable ahead of time. You just use it. The three most difficult things for a new programmer to understand are assignment, recursion, and concurrency. If you can make one of those simpler, then you usually end up with a language that's easier to learn. But with SNOBOL, nothing is quite that straightforward. Case in point, assignment. You know, one of the big ones.

Starting point is 00:29:25 In SNOBOL, the equal sign doesn't exactly mean assignment in a traditional sense. It's a little more complicated than that. In almost every other language, an equal sign means to assign the value on the right-hand side to the variable name on the left-hand side of the sign. X equals 1 tells the computer that you want the value of X to be 1. In SNOBOL, the equal sign is used to describe string manipulation rules. In one case, that's assignment, but the symbol means more than just assignment.

Starting point is 00:30:07 For this to make sense, I'm going to describe the actual structure of SNOBOL. I almost don't want to call it syntax, because SNOBOL just isn't built the same as traditional languages. Its code is positionally dependent. That means what a word in a line of code does is dictated by where it is within that line of code, not by any syntax or wrapping or any extra parts of speech. The structure goes as follows. First, you have a label that's used to identify a line of code. Then you have a variable name.

Starting point is 00:30:47 Next is a pattern, which has its own syntax that we'll touch on later. Then you have the equal sign. After the equal sign is a replacement value, then you have a go-to statement. Each of those fields, those parts of speech, are separated by spaces. The kicker, well, there's two kickers. One is that space is also used as the concatenation operator, so it's how you connect two strings to make a bigger string. That instantly makes, oh, it makes so much of this code very difficult to read.

Starting point is 00:31:21 The other kicker is every one of those fields is optional. That's how you can have a normal assignment. X equals blah is perfectly valid and acts like a normal assignment operation. But technically, it's a special case of string replacement. You're replacing all of X with the value, blah. That structure I just explained is the only statement type that SNOBOL supports. Every line of code has to follow that format. Any apparent flexibility is just due to the fact that fields are optional.
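The idea that assignment is just replacement with the pattern left out can be sketched in a few lines of Python. This is an illustrative model, not SNOBOL code and not how any real SNOBOL implementation works: `run_statement` is a hypothetical helper, patterns are plain literal substrings, and the label and go-to fields are left out entirely.

```python
from collections import defaultdict

def run_statement(variables, subject, pattern=None, replacement=None):
    """Model one rule: find `pattern` in the subject, swap in `replacement`.

    Returns True if the statement succeeds and False if the pattern is absent.
    """
    text = variables[subject]
    if pattern is None:
        # No pattern field: the "match" is the whole subject string.
        start, end = 0, len(text)
    else:
        start = text.find(pattern)
        if start < 0:
            return False  # pattern not found, so the statement fails
        end = start + len(pattern)
    if replacement is not None:
        variables[subject] = text[:start] + replacement + text[end:]
    return True

# Variables are never declared; a name that was never set reads as blank.
variables = defaultdict(str)

run_statement(variables, "X", replacement="BLAH")            # like: X = BLAH
run_statement(variables, "X", pattern="LA", replacement="")  # like: X LA =
print(variables["X"])        # BH
print(repr(variables["Y"]))  # ''
```

Both calls go through the same code path. The plain assignment is simply the case where the matched region is the entire string.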

Starting point is 00:32:02 So, what's the more complex case here? Well, that is pattern matching and replacement. SNOBOL is really just a wrapper over something that functions like a regular expression. If you understand that, then you 100% get what SNOBOL is doing here. There's not a whole lot of subtlety outside of a primitive regular expression engine. If you don't understand that, I'll give you the long story. A pattern is a way to describe the structure of a string. That's used to make matches against string data. If a match is found, then you can replace it with some value. Remember, though, replacement is optional.

Starting point is 00:32:48 You can choose to replace it with nothing, thus dropping the pattern. A downside to that approach is that when you're removing patterns, it just looks like you have an equal sign with nothing on the other side. That looks off. On its own, this is very primitive. You can use that to transform text, but that's about it. There are three simple features that actually make this useful. The first is the fact you can use variables to construct patterns and replacement values. That lets you construct meaningfully complex rules.

Starting point is 00:33:26 That gets even more useful when you factor in this thing called fillers. Now, that's the term the first SNOBOL paper uses, but the more modern term is a capture group or a capture variable. You can describe a pattern and then say that some part of that pattern should be captured and stored in a variable. This becomes more clear with an example. Let's say you're trying to find out what's between two matching parentheses in a string. You would construct a pattern that would say, open parentheses, capture variable,

Starting point is 00:34:02 and then close parentheses. That means you'll match something that has an open and close paren and anything in between. That something can be, well, anything. And it'll get saved down into that variable. There is some annoyance here. One is that you have to specify that the variable is actually supposed to be a filler and not just a variable. That's done by wrapping it in a pair of asterisks. Not awful, but that does introduce some syntax to keep track of. A larger annoyance comes when captures interact with SNOBOL's monotype system. Remember, you don't declare variables. You use them. That also goes for capture variables. You just write it and SNOBOL figures the rest out. That can lead to some code that's hard to read. You can have variables

Starting point is 00:34:54 get filled in two different ways, so it's pretty hard to actually get a grip on all the variables in a program. Again, it's not a massive issue. It's just annoying and seems to go counter to the whole premise of a simple language. Okay, so that's two of the three neat features of patterns. With those two, you can make some truly complex patterns and rules. You can capture part of a string and then use that to construct a pattern. You can even use the results of a capture in the same pattern. You could, say, look for text that has matching uses of parentheses. That would be handy if you want to find all instances where a function was called twice

Starting point is 00:35:38 with the same variable, for instance. Feature 3 that kicks us into overdrive is the go-to. Yes, SNOBOL includes go-tos. Yes, this is considered harmful, hardy-har-har-har. Griswold kind of addresses this obliquely in his history of SNOBOL. So here's the relevant context in case you're missing the joke. In 1968, Dijkstra publishes this letter called Go To Statement Considered Harmful. This is part of a larger shift in sentiment towards structured programming.

Starting point is 00:36:15 The main difference between structured and non-structured programming is how code is organized. In structured programming, code is broken into units like functions or classes or even just blocks of code. That turns out to be more advantageous in a number of cases, and it becomes all the rage. SNOBOL is not structured. It's freeform. In fact, the first version of the language doesn't even have functions. All control flow is handled using a special type of go-to. That's right, we don't even have ifs or for loops. It's honestly madness. There are three types of go-tos: unconditional, failure, and success. Whenever you perform a pattern match, SNOBOL checks if the pattern was successfully found. If so, that line of code is considered a

Starting point is 00:37:11 success. At the end of each line, you can specify your go-tos: either an unconditional one that always executes, a success go-to, a failure go-to, or even a mix of the three. Normally, SNOBOL executes everything in order, one line after another. Go-tos alter that order. If a pattern match succeeds and you have a success go-to, SNOBOL jumps to the specified label. Same for failures. A crafty programmer can use this setup to create conditional statements, loops, and even

Starting point is 00:37:46 something that's not entirely dissimilar to a function. That, however, is a bit of work. In the most simple case, this is used like an if statement. Basically, try to match a pattern. If you do, go here, otherwise go there. When you get to loops, things get weird. I've seen this used for iteration. In the 1964 paper, there's an example of a loop that removes a list of vowels from a string of text. It starts off with some string that's a victim, and then a string that's a list of vowels. The loop first removes a letter from the list of vowels and uses that letter to perform a pattern match to remove the matching letter from the victim text. Then it loops back to the top with an unconditional transfer. The end condition is a failure

Starting point is 00:38:43 that triggers when Snowball is unable to remove a character from the list of vowels. That works. That's perfectly valid, but notice how confusing that is. I even skipped over a little of the complexity. The example loop, which I remind you is in the first published paper describing Snowball and how good it is, uses capture groups. Snowball initially has no way to get the length of a string, so you have to use formatting characters in the vowel list. It's not exactly straightforward, and it actually uses almost every single feature of the language. The function trick is even worse. That makes use of one final language feature, indirection. And this one is a doozy.
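As an aside, the vowel-removal loop described above can be sketched in Python. This is only an illustration of the success/failure go-to idea, not the code from the 1964 paper: the names `take_vowel`, `strip_letter`, `PROGRAM`, and `run` are made up for this sketch, and deleting every occurrence of a letter in one step is a simplifying assumption. Each "statement" is a function that reports success or failure, and the table says which label to jump to in each case.

```python
from typing import Callable, Dict, Tuple

Env = Dict[str, str]
# label -> (action, label to jump to on success, label to jump to on failure)
Program = Dict[str, Tuple[Callable[[Env], bool], str, str]]


def take_vowel(env: Env) -> bool:
    """Move the first letter of env['vowels'] into env['letter']; fail when empty."""
    if not env["vowels"]:
        return False
    env["letter"], env["vowels"] = env["vowels"][0], env["vowels"][1:]
    return True


def strip_letter(env: Env) -> bool:
    """Delete the current letter from the victim text; succeed if it was present."""
    found = env["letter"] in env["victim"]
    env["victim"] = env["victim"].replace(env["letter"], "")
    return found


# Hypothetical two-statement program in the spirit of the loop described above.
PROGRAM: Program = {
    "LOOP": (take_vowel, "STRIP", "END"),     # failure here is the exit condition
    "STRIP": (strip_letter, "LOOP", "LOOP"),  # same target either way: unconditional
}


def run(program: Program, start: str, env: Env) -> Env:
    """Execute statements by label until control reaches END."""
    label = start
    while label != "END":
        action, on_success, on_failure = program[label]
        label = on_success if action(env) else on_failure
    return env


result = run(PROGRAM, "LOOP", {"victim": "programming language", "vowels": "aeiou"})
print(result["victim"])
```

Note that there is no loop construct in `PROGRAM` itself. The iteration exists only because one statement jumps back to `LOOP` and another statement's failure jumps to `END`, which is the part that makes this style hard to follow at a glance.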

Starting point is 00:39:36 For further explanation, I'm just going to quote the '64 paper directly. This is accomplished in Snowball by writing dollar sign in front of the string name. Thus, if the string factor contains the literal term, writing dollar factor is the same as writing term, end quote. Got it? If you did, then you're faster than I. This is kind of like a pointer. It lets you load up a variable with the name of some other variable and then reference that. Maybe call it point by name. I don't know, it's just strange. In fact, PHP has the ability to do this very thing, and I actively try to avoid using that feature

Starting point is 00:40:24 because it's so error-prone and confusing. But the power here is you can write code that is very subtle and very flexible. Let's say you're writing a compiler in Snowball, like, say, the Intercal compiler. You have a keyword in that language called forget. One way to handle forget would be to have a big if statement. If input text is print, then handle print. If input text is forget, oh, that's the one we want, handle forget. So on and so forth. But remember, no ifs in Snowball, so the actual code would be messier.

Starting point is 00:41:01 What some folk would do is have a label in your Snowball program called Forget. That label would point to the part of your code that knows how to deal with an Intercal forget statement. Then you have some code that extracts statements from your incoming text. You find the statement and then you do a go-to that jumps to dollar sign statement. If the text is forget, then you end up in the correct place for it. In this way, a programmer can work up cases for all kinds of different inputs. The Intercal compiler uses this trick mixed in with more complex pattern matches with go-tos. It's simple to write, but conversely, very hard to read.
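For readers who want to see the shape of that trick, here is a rough Python analogue. It is a sketch under assumptions, not the Intercal compiler's actual code: the handler names, the `LABELS` table, and the `PRINT` statement (borrowed from the if-statement example above) are all hypothetical. The point is that the text of the input picks the jump target, so no line of the program ever names the target directly.

```python
from typing import Callable, Dict, List


def handle_forget(args: List[str]) -> str:
    """Stand-in for the code that would live at the FORGET label."""
    return "forget: " + " ".join(args)


def handle_print(args: List[str]) -> str:
    """Stand-in for the code that would live at a hypothetical PRINT label."""
    return "print: " + " ".join(args)


# Label names deliberately match the keywords of the language being compiled.
LABELS: Dict[str, Callable[[List[str]], str]] = {
    "FORGET": handle_forget,
    "PRINT": handle_print,
}


def dispatch(line: str) -> str:
    """Extract the statement name, then jump to whichever label shares that name."""
    words = line.split()
    if not words:
        return "error: empty statement"
    statement, args = words[0].upper(), words[1:]
    handler = LABELS.get(statement)  # the indirect jump: target chosen by input text
    if handler is None:
        return "error: no label named " + statement
    return handler(args)


print(dispatch("FORGET #1"))
print(dispatch("BOGUS"))
```

Nothing in `dispatch` mentions `handle_forget` by name, which is the same readability problem in miniature: to find where control goes, a reader has to know what strings can show up in the input.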

Starting point is 00:41:52 In the case of compilers specifically, it leads to weird blending. The source code for the Intercal compiler is full of labels and variable names that line up with parts of the Intercal language. That would have been pretty nice while you were writing the compiler. Forget gets processed at a label called Forget. That's very straightforward. But reading the thing is painful because you never see anything that says go-to, forget. You just see indirect jumps.

Starting point is 00:42:28 This is actually a cool example of loss of theory. If you are actively working on the project or if you wrote that code, then everything here is pretty clear. The go-to jumps off to a logical place. And to be honest, it's a smart trick. It's pretty cool. It's just a cool way to do things, right? But it makes the code very hard to read, which in turn makes it hard to maintain. A good program is one that does the job as simply as possible.

Starting point is 00:43:00 Frills and tricks, well, they're cool. They're nice, but will you remember them in a year? Will someone else be able to puzzle out what your super smart code does? I mean, yes, they can. I know because I can, but it takes a lot of work. It's a huge cognitive load. A corollary here is that a good language is one that lets you write code that does the job as simply as possible. That's why it can really pay off to have special purpose languages. If you have to use tricks to do something like a loop or a function, or you have to use indirection and labels to simulate if statements, then there's something deeply wrong going on.

Starting point is 00:43:48 This is a tricky problem because, well, of its double-headed nature. To quote Griswold, I view the best points of Snowball to be the ease of use and its comfortable feeling. It is somewhat like an old shoe. Snowball programmers characterize it as a hospitable, friendly language that allows much to be accomplished quickly and easily. Its one-line programs for problems such as character code conversion and reformatting data cards are famous, end quote. Once Snowball enters use, people really like it. And why not? It's a slick language. It's easy to use. Once you're familiar with it, once you break in the old shoe, you can write all kinds of handy code.

Starting point is 00:44:34 No mess, no fuss, and no overbite. I actually got to this paragraph in Griswold's history while I was on a plane and had to hold back laughter. I try to not look all that deranged in public, I swear. This passage struck me as funny because it's almost exactly how I feel about Perl. Now, to be clear, I know some Perl heads are in the audience. I love Perl. Whenever I poke at the language, it's from a place of appreciation.

Starting point is 00:45:06 I wouldn't even talk about it if I hated it. Perl is sometimes called a read-only language, because practitioners can write truly incomprehensible code. It's not so much that Perl forces the programmer to use tricks. It's more that its free-form nature lets you get away with some, well, nasty stuff. You can write really smart code in Perl, which is, it's a trap. As a result, I, yes, even I, your intrepid host, have written some truly read-only Perl in my time. That language is also famous for its one-liners. I remember at one point writing this beautiful

Starting point is 00:45:48 one-liner to extract specific data from HTML tags on a web page, use that data to format links, and then download all those links. Truly a smart piece of code, but one that's impossible to read. Perl and Snowball can be used in a very similar way. These are both string processing languages. That niche ends up being useful for both traditional large-scale programs

Starting point is 00:46:18 and also these short ad hoc programs. The second type includes famous one-liners and also things like one-off utilities or just quick data manipulation tools. You tend to use these kinds of tools as, well, just that. Tools. They're about as handy as a Swiss Army knife. In a pinch, you can use a Swiss Army knife for almost anything. But there are some things you probably shouldn't use it for. Where this gets gnarly, I think, is with Snowball's target audience. Snowball was specifically meant to be used by novices, people who haven't programmed before. Now, to be fair, it's the 60s we're talking about.

Starting point is 00:47:03 Most folk don't even know what a computer is. The pool of novices was larger and deeper. So what would happen if a novice learned to program in Snowball? It would, after a fashion, fit like a well-worn shoe. That could twist the minds of budding programmers in unexpected ways. Back when I was yet another Perl hacker, I remember an older programmer telling me this.

Starting point is 00:47:32 Perl has whatever structure you bring with you. Perl provides all the tools you need to write reasonable code. In fact, I've seen some fantastic examples of the craft written in Perl. I'd like to think I even wrote a few nice functions, perhaps. To do that, you have to have experience. You have to know the difference between good and bad code. You have to actually know structure to bring it in with you. It took me years and years to get the knack for telling the difference between good and bad software.

Starting point is 00:48:05 It's a process you have to improve on every day. In that regard, it really is best to think of programming as a practice. Now, think of a bright-eyed researcher entering programming via Snowball. That could be dangerous. It's possible to write good code in Snowball. I have no doubt of that. But with the lack of structure, with the strange tricks needed to do basic things,

Starting point is 00:48:34 and with a truly unique design, I think that very few programmers would craft particularly reasonable code. That could lead to all kinds of issues. For example, look no further than the design of Snowball 1 itself. We're all influenced by our environment. That's an undeniable fact. The trio working on Snowball had primarily only experienced SCL and Comet. Each of those languages had issues when it came to scale. Comet had limits on the size of data it could work with and the size of your program. SCL also had data space issues.

Starting point is 00:49:13 The Snowball crew explicitly wanted a replacement for SCL because of its data limits. As a result, Snowball did a really neat thing with data. A string in Snowball is variable length. That's not a big deal today. We're used to languages accepting strings of any size. Snowball was one of the first languages to do this. As you work on a variable, its size can change dynamically. If you expand it, well, Snowball deals with that for you.

Starting point is 00:49:45 As the variable falls out of use or shrinks, Snowball makes adjustments as needed. That, well, leads to all kinds of problems with implementation. That's actually very complicated to implement. But a core principle of the project is that implementation had to take second place to useful features. The team didn't want a language that forced restrictions on string size, so they chose to make their implementation work a little harder to support these fancier strings. That solved their data problems.

Starting point is 00:50:17 But they didn't address a bigger issue. A Snowball program could initially only have 2,000 elements. The more common word here would be tokens. If a line has six elements, it can actually have more, but let's say six things separated by spaces, then you could only have about 300 lines of code. That's not a very big program. They figured that would be enough

Starting point is 00:50:46 so they didn't bother supporting larger programs. Why did they feel this way? Well, Comet and SCL both put practical limits on program size. The Snowball crew wouldn't have routinely worked with a program that was very big, so they couldn't conceive of a reason why you'd need to write a big program. Their past experience hamstrung Snowball 1. In the same way, a novice that's introduced via Snowball would be trained up with all these weird practices.

Starting point is 00:51:19 That could, over time, lead to a programmer with a strange worldview. It could stunt their growth or lead to them writing nasty code down the line. One of the more wild claims that Dijkstra makes in Go To Statement Considered Harmful is that the use of go-to could damage the brain of a budding developer. That may sound a little dramatic, but the 2,000-element limit of Snowball 1 is a perfect example of that kind of damage in practice. This is the part where I get to thread a very fine needle. I'm not saying Snowball is a bad language.

Starting point is 00:52:02 I know I'm ragging on it, but I want to be clear here. Snowball, especially in its first version, is weird. It's very unique as far as programming languages go. There really is nothing quite like it. I really mean that. This isn't just a platitude. Folk have actually created these really neat genealogical trees of programming languages. Accurate genealogies will show that Snowball emerges as its own thing, father of itself. Its descendants are pretty isolated from other languages. It's kind of its own little island. Maybe call it its own tree.

Starting point is 00:52:48 That can make analysis a little difficult. Griswold points out that Snowball never really followed any popular convention. In fact, it stands pretty hard in opposition to the mainstream of programming ideology. The mainstream in the 60s, at least in the mid-60s, was structured programming. And that's the camp that won out. Modern languages adhere to that design philosophy in a very fundamental way. As a result, analysis of older languages is tricky. Modern programmers tend to have a bias towards yesteryear's mainstream.

Starting point is 00:53:29 In that sense, history is written by the victors. So it's easy to look at a language that's outside the mainstream and just go, yeah, that one sucks. It's a lot harder to look at a language in its specific context. Snowball, compared to modern tastes, is insane. But in its context, in its period, that's not entirely the case. In fact, Snowball was pretty beloved and pretty successful. I have some evidence for this in the form of ports, expansion, and actual software.

Starting point is 00:54:07 The simple fact is that Snowball was immediately put to use. By 1963, just as the first implementation was running, programmers in Bell Labs were enjoying the new language. This wasn't just the trio that developed it. That very year, at least four programs were written in Snowball. One was to convert network diagrams to Fortran. Another was to convert data into IPL programs. There was an expression evaluator and a Fortran compiler. That last one is wild to me, and I wish I could find more information about it. Griswold has a habit, like I mentioned, of citing papers that are tucked away in a file cabinet somewhere. In this case, we have Dave Faber's Fortran compiler for Double Precision Project, 1963.

Starting point is 00:55:03 It's a program listing that's kicking around somewhere that's a whole compiler written in Snowball. Timeline-wise, this must have been written in like a month or two. The point is, Snowball was popular enough and powerful enough to see instant use. What's more, this is very niche use. These four early programs were all language tools. In fact, discounting the expression evaluator, I'd call these all compilers. That is very niche and very unexpected. It was unexpected for the Snowball crew as well.

Starting point is 00:55:42 They didn't set out to develop a tool for writing compilers, but that's what they ended up creating. It turns out that Snowball was good at the kind of string analysis needed for compilation. This is particularly funny because at the time, there were actually very few options for writing compilers. Traditionally, the tool of choice for a compiler was either machine code or assembly language. There were some, and I stress some, languages that could self-host, as in, they could be used to write compilers for themselves. LISP is one of the earliest examples, but that's not quite a compiler.

Starting point is 00:56:23 LISP is weird so we can discount that. The point is, Snowball kind of snuck in as the only other reasonable option for building compilers. That would help it spread out and gather many adherents, and this happened quickly. In 1963, the first paper describing Snowball was released to IBM Share. Recall, Faber was Share's secretary at the time. The Snowball team treated that paper's publication as de facto permission to distribute software. Griswold recalls that, at the time, Bell didn't really know what to do about software rights and permissions. That would continue far, far into the future. Unix is a much more long-winded example of Bell failing to come to grips with how to handle

Starting point is 00:57:11 software. Anyway, Griswold would do the rational thing. If someone read the snowball paper and got in touch, he would send them a tape with the compiler on it. Years later, Ken Thompson would do the same thing with the source code for Unix. We're still kind of in this era where folklore is being forged. I have to wonder if Ken knew about the Snowball tapes. Anyway, Snowball was so well-loved that programmers actually complained about it. Now, I know. That may sound paradoxical to the outsiders listening. If folk really don't like something, they usually avoid it, but it takes a certain amount of appreciation for something to complain and ask for changes. There were enough complaints, both from within the Snowball team and from the growing user base, that the language

Starting point is 00:58:08 was revised. That led to Snowball 2, 3, and eventually 4. Each was an incremental improvement with Snowball 4 being the most sophisticated and most fundamentally different version. Features like functions were added, and the compiler was entirely rewritten. I've been told that Snowball 4 was basically a new language. That's partly true, but also partly wrong. Snowball 4 retains the same basic structure of Snowball 1. There are, however, two huge changes that even an outsider, like myself, can notice. The first is that Snowball 4 supports data types other than string. Numbers and more complex structures like tables, arrays, and user-defined types are possible. That really made Snowball a much more reasonable language.

Starting point is 00:59:01 Honestly, you may think that the addition of numbers is the big deal here, but that's not really the case. Structures are a bigger deal. In older versions of Snowball, you could make structures, but you had to use indirection and string matching, which is basically a hack. The other huge change was deferred evaluation. Now, this is a whole topic unto itself. If you find me at a bar, perhaps at VCF SoCal in February '26, next month, get your tickets at VcfsoCal.com, then buy me a beer and I'll talk your ear off about this.

Starting point is 00:59:43 But in short, Snowball 4 uses something like just in time compilation. You can evaluate strings as code, which lets you do things like writing new code on the fly. That gives you much more flexibility. Basically, Snowball 1 is closer to a toy, and Snowball 4 is a much more fully-fledged language. All this is to say, Snowball is extremely well-received, and it's improved, iterated, and distributed. It was beloved by many, but it was still the weird outsider language. Allow me to share one final story that just keeps me chuckling. We've talked about Multics before on the show, but it's been a while.

Starting point is 01:00:28 Multics was a huge project to create a highly secure time-sharing operating system written in a high-level language. This was attempted by a coalition of companies, including Bell. At the time, this was a hugely ambitious project. Most folk know of Multics because it leads to Unix, kind of. After years of work and delays, Bell got frustrated and pulled out of the coalition. Ken Thompson and Dennis Ritchie had been working on Multics and were suddenly a little light on work. They liked the Multics programming environment,

Starting point is 01:01:04 but it was pretty big and unwieldy, and they didn't have access to it anymore. So they started working up a smaller version of the operating system for personal use. They called it Unix as a joke on the name. Because Multics is multiple, Unix, that's just one. Easy. But Multics is a fascinating saga in its own right. This is partly because the project is very well documented. There are piles of preserved documents detailing the day-to-day of the project, interviews with just about everyone involved, and these huge, detailed lists of programmers in the project. It's a mountain of data, to say the least. From Griswold, quote, in retrospect, it is interesting to note that although Snowball 4 was officially part of the Multics project, the affiliation

Starting point is 01:01:55 was largely ignored by both sides. No schedules were established, no goals were projected, and no reports or documentation were ever requested or supplied. End quote. That's so Snowball, right? Multics is notorious for its rigor, its heavy-handed project management, and all this data. Everything was documented, except Snowball was somewhere. Jokes aside, Snowball was such a big deal at the time that when Bell went all in on Multics, they had to make sure Snowball came along. Bell planned to use Multics for all their needs. That meant they had to have a Snowball compiler for Multics.

Starting point is 01:02:41 Simple. The net result here is that Snowball 4 ends up being ported around quite a bit. One of the Multics project's goals was portability. It was written in a high-level language so it could be compiled for any computer system. Well, the same was true for Snowball 4. It turned out to be very easy to do that, because from the beginning, Snowball was designed to keep the user completely isolated from the muck of the machine. Its design is so portable, in fact, that you can run extant Snowball code on computers today.

Starting point is 01:03:21 That's a level of portability and future-proofness, for lack of a better term, that, well, I've seen in very few languages. It really is kind of a testament to the uniqueness of Snowball. All right, that does it for a dive into Snowball. Now, I know there's a lot more that could have been discussed. I didn't even get into SIL, but hey, I'll leave that as a rabbit hole for the listener. Snowball's a funny language. In fact, I don't think I've run into a story this funny in a few years. It's a wild ride, to say the least. Many programmers have ground

Starting point is 01:04:05 their teeth and thought they could write a better programming language. Few have actually attempted it. Still fewer have succeeded, and fewer still have done it with such a strange humor and wit about the whole thing. Since Snowball was crafted in such an early period and for such a specific niche, it ended up being a pretty odd language. Griswold is correct when he says it stands against the mainstream in programming. This was intentional from the start. Snowball was one of the first languages that stood against the viewpoint of machines as fancy calculators. That on its own is fascinating. At the same time, Snowball is a pretty gross language to look at. In reading more, I've started to understand, well, I've started to understand more and more of Intercal's critiques of programming of the era.

Starting point is 01:04:58 I really do feel like every time I go back to early programming languages, I get more of Intercal's jokes. That might not really be a good thing for my psyche. Anyway, however you choose to look at it, the saga of Snowball is something we could all learn from. There's a certain amount of pluck and really kind of a visionary attitude that went into creating this language. That's all for me this time. Thanks again for listening to Advent of Computing. You can find links to everything on Advent of Computing.com. And you can see me live at Vcf SoCal, third year running for this event in February 2026.

Starting point is 01:05:42 You can get tickets and information at VcFSoCal.com. If you're listening, that means I'd love to see you. Come on down. It's a great event and a wonderful time. Until next episode, thanks for listening and have a great rest of your day.
