Advent of Computing - Episode 74 - The Elegance of LISP

Episode Date: January 24, 2022

This is the conclusion to my exploration of why LISP is the "mother tongue of artificial intelligence". We pick up from the end of last episode and continue to cover the early days of AI. We follow the meandering path from the FORTRAN List Processing Language and IPL, up to pen-and-paper versions of LISP and into the first true implementation of the language. Along the way we will see just why LISP is called elegant, and how it was tailored for thinking machines. Selected Sources: https://sci-hub.se/10.1145/321021.321022 - FLPL http://www-formal.stanford.edu/jmc/mcc59.pdf - Machines with Common Sense https://dspace.mit.edu/bitstream/handle/1721.1/6096/AIM-008.pdf - AI Memo 8

Transcript
Lisp is a strange language. Well, I guess that might not be totally fair. Let me rephrase that a little bit. Lisp looks strange when compared to other programming languages. If you ran into it in the field without any prior exposure, then you'd probably have a hard time classifying it. It doesn't have the semicolons so familiar in C and C-like languages. Lines don't start with numbers like in BASIC. You don't get separate data and code definitions like in COBOL. And, perhaps most upsetting of all, Lisp doesn't even have the familiar mathematical expressions seen in Fortran. That last piece is especially strange, because Fortran, for all of its faults, was a massively influential language. In many ways, it was THE programming language.
While an argument could be made that Fortran wasn't the first high-level language, it was the first one that was fully implemented and saw actual use. A huge number of later languages can draw their lineage back through Fortran. The path may be a little bit winding, but hey, just look for a few equal signs. Now, Lisp and its descendants do not share that distinction. This language just plain doesn't look like anything else. Look at a fresh line of Lisp, and the first thing you'll probably notice is going to be a sea of parentheses. Then, you'd see that operators all seem to come before arguments, a bit of an inversion to the usual mathematical system.
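For a quick taste of what that looks like, here's one modern snippet, purely for flavor. It's my own illustration, not anything from the episode's sources:

    ;; squares of the odd numbers in a list: operator first, parentheses everywhere
    (mapcar (lambda (n) (* n n))
            (remove-if-not #'oddp '(1 2 3 4 5)))
    ;; => (1 9 25)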
Starting point is 00:01:43 Look deeper, and the differences will just continue to pile up. I firmly believe, or at least I hope, that people don't create new programming languages just to be different. There has to be more going on here. Difference isn't often a sound design goal. There has to be a reason why Lisp is so profoundly unique. And there is. You see, Lisp was built differently than its contemporaries. Among other things, Lisp was designed as a tool to make machines think. Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 74, The Elegance of Lisp. Today, we're picking up where we left off last episode.
Starting point is 00:02:46 Just as a quick recap, last time we looked at IPL, the first language designed specifically for use in artificial intelligence. That episode, as well as this one, are my attempt to understand why Lisp is called the mother tongue of AI, and, in a more micro kind of view, why lists are a critical feature for that specific application. IPL, or the Information Processing Language, was built around lists. This was especially visible in more complex structures like trees, which are, I must always remind myself, just built out of simple lists. To get even more specific, the real meat of IPL was all about handling linked lists. Now, I think I was able to pretty well establish why lists matter so much last episode. It all comes down to structuring data, building decision trees, and representing complex and dynamic problems in memory. I've been
Starting point is 00:03:40 able to hammer that part into my own head at least. The last half of my question is what we're going to be tackling today. That is, why is Lisp specifically called the mother tongue of AI? Why not IPL or why not some other language? And of course, the easiest way to approach that question is to look at the development of Lisp. One easy contributing factor that I need to throw out before we go any further is the fact that IPL would have just sucked to use. It was a really, really early programming language coming onto the scene before a lot of conventions and niceties were established. IPL looks a lot more like assembly language than anything else. It had a
lot of the features that we'll be seeing reimagined in Lisp, but IPL was just a pain to work with. It was only a matter of time before something better came along. So let's look at that something better. We're going to be examining the roots of Lisp, the list processing language, and we're going to be trying to understand its connection to AI. Why has Lisp become so tightly associated with this one field? What does it offer over other languages? And hey, what's with all the parentheses anyway? This episode, we're going to primarily be following the early career of John McCarthy. And my impression of McCarthy is that he was a very busy man. This seems to have been especially true in the late 40s and early 50s. He earned a bachelor's degree in mathematics
Starting point is 00:05:19 in 1948 at Caltech. He studied for his PhD first at Caltech and then at Princeton. He became a Dr. John McCarthy in 1951 after completing his dissertation. Between then and 1955, he taught at Princeton, Stanford, and eventually Dartmouth College. McCarthy was also entering mathematics at an interesting point in time. In 1945, some of the first digital computers started crunching numbers. By the time McCarthy graduated from Princeton, the first stored program machines had come into being. He got the chance to go from just hearing about computers in lectures to coming face to face with functioning machines.
Starting point is 00:06:02 It must have been fascinating to watch a new discipline, a whole new branch of the sciences, unfold around you. And in a very profound way, this early era of computing reshaped what kind of mathematics were possible. So how do we get from this emerging field to artificial intelligence? What's the connection between John McCarthy, the mathematician, and John McCarthy, the eventual creator of Lisp? Well, we don't have any one big moment to point to. It almost sounds like McCarthy naturally slid into computing, and from there, his curiosity and exposure to new ideas pulled him towards artificial intelligence. In an oral history
Starting point is 00:06:46 interview with the Computer History Museum, McCarthy explains that there were a number of influences driving him towards thinking machines. Early inspiration may have come from a lecture on behavior in animals given at Caltech. According to McCarthy, the speakers compared the brain to a computer, thus sending the soon-to-be doctor down the rabbit hole. The only real issue is that McCarthy doesn't seem 100% sure about this one. Quote, I went and looked at the proceedings of that symposium, and I had supposed that artificial
Starting point is 00:07:19 intelligence had been discussed there, that is, the use of computers to behave intelligently. The actual inciting incident here may just be a mix of factors. Whatever the case, it's safe to say that McCarthy was on the AI tip really early in his career. His position in academia also helped drive him towards more specific goals. In that same interview, McCarthy talked about meeting John von Neumann, a figure that looms really large in the early development of computers. I only had one interview with von Neumann, in which I told him some idea that I had about artificial intelligence, representing a brain and its environment by finite automaton. And he encouraged me. He said, write it up, write it up. But afterwards, after my conversation with him, I came to the conclusion that it was, in fact,
Starting point is 00:08:15 a bad idea, because one couldn't represent knowledge in terms of the structure of finite automaton, which was to represent the brain, so I didn't write it up. Now, I'm bringing this encounter up early to show a crucial point. One of von Neumann's larger areas of study was the so-called finite automaton, sometimes called the cellular automaton. This is a well-defined system of automata, that's small entities that perform some predetermined task based on a series of inputs. Think of it like a tiny program. Conway's Game of Life is probably the most well-known example of finite automata at work. In that system, cells will either turn on or off depending on the number of active neighbors.
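For the curious, the standard Life rule fits in a few lines. Here's a sketch in modern Lisp, included purely to illustrate the kind of rule set being described:

    ;; Conway's Game of Life, one cell's update rule.
    ;; ALIVE is t or nil, NEIGHBORS is a count of active neighbors (0-8).
    (defun next-state (alive neighbors)
      (if alive
          (or (= neighbors 2) (= neighbors 3))  ; a live cell survives with 2 or 3 neighbors
          (= neighbors 3)))                     ; a dead cell turns on with exactly 3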
Starting point is 00:09:04 In general, automaton can be defined to follow any rule set. By carefully choosing your rules and carefully generating some initial state, you can create pretty complicated systems. Some of these systems can even behave unexpectedly, which gives the idea of intelligence. The theory McCarthy is alluding to here is that finite automata could be used to model living cells, more specifically the neurons that make up a brain. On the surface, this seems reasonable enough. A cell is kind of like its own little self-contained pre-programmed entity, at least if you reduce things down far enough. It can react to outside stimuli. If you could simulate the operation of a single cell, then it's just a hop,
Starting point is 00:09:52 skip, and a memory jump to simulating an entire brain. Of course, it's not exactly that simple. This assumes that we know precisely how a brain cell functions, or at least that we know every possible response to every possible stimuli, and we can simulate that correctly. We kind of can't do that. It also assumes that we know how a cell in a brain is connected to all of its neighboring cells. Now, I'm no flesh scientist, but I think it's safe to assume that the human brain isn't exactly a 3D matrix of little cubic cells. There's some ambiguity there that we don't really know how to model yet. But that all aside, McCarthy had a very different issue with this automata model. As he put it, one couldn't represent knowledge with automata alone.
In other words, automata are a poor substitute for memory. Even at this early juncture, McCarthy understood that memory would be an important component for any artificial intelligence program. The next step on the road towards LISP would take place in 1955. That year, McCarthy and a handful of his colleagues started planning a workshop. From their initial proposal, We propose that a two-month, ten-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can, in principle, be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines
use language, form abstractions and concepts, solve kinds of problems now reserved for humans and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer. End quote. The plan was to form something like a temporary think tank. Work on computer intelligence had been occurring throughout the country at this point. The proposal itself was penned by researchers from four different institutions. You have McCarthy from Dartmouth, who was the primary author of the plan, but you also have Marvin Minsky from Harvard, Nathaniel Rochester from IBM, and Claude Shannon from Bell Labs.
Starting point is 00:12:22 McCarthy wanted to pull the field together, exchange ideas, and hopefully focus efforts on some concrete projects over the summer. To that end, McCarthy invited nearly 50 researchers to the summer session. Of those, only 10 would show up, and even then, most were only there for part of the summer. So maybe the grandiose plans didn't fully pan out, but the Dartmouth Summer Workshop was far from a failure. It was the first real symposium dedicated to artificial intelligence. In fact, this is where the field was named. McCarthy seemed to have decided on the term early on, so when it came time to put together a conference, he stuck the AI label on it, and it's been called artificial intelligence ever since. Now, there are two groups of attendees that
we have to talk about. The first is probably the most obviously influential on the future of Lisp. Allen Newell and Herb Simon came down to Dartmouth to show off Logic Theorist and IPL. The two would only be there for a few weeks, but their appearance offers a link into a larger story. If this wasn't McCarthy's first exposure to IPL, then it was at least his first major contact with the language. And, to be quite blunt, McCarthy did not like what he saw. The second group that matters here is IBM. Now, the only employee from Big Blue to actually go to the workshop was Rochester. That said, there were more IBMers on the invite list that didn't show.
Starting point is 00:13:57 This included John Backus, the creator of Fortran. At this point, IBM had a profound interest in AI, at least on the research side of their operation. As McCarthy himself explained in an essay on the development of Lisp, IBM was undertaking to develop a program for proving theorems in plane geometry, based on an idea of Marvin Minsky's, and I was to serve as a consultant to that project. At the time, IBM looked like a good bet to pursue artificial intelligence research vigorously, and future projects were expected." Once again, the problem of proofs rears its ugly head. Now, I think these two groups at this
workshop, RAND and IBM, are important because they mark the ends of a spectrum of solutions to the AI problem. That's kind of a mouthful, but stick with me here. On the RAND side, we have IPL, a language designed from the ground up specifically for AI. It has fantastic support for lists, but it's also hard to work with. On the IBM side, we have something else entirely. Big Blue was a Fortran shop. I guess you could just say it was THE Fortran shop. At the time, IBM was the only vendor in the world to offer a truly high-level programming language, since, you know, Fortran was really the only game in town. It offered something IPL didn't have, expressiveness. Fortran actually has fully-fledged syntax. You can type in complete lines of code
Starting point is 00:15:39 strung together with special characters. You can even write mathematical equations and expressions in Fortran that look really close to their handwritten equivalents. Fortran even has lists. Well, it has arrays. The distinction here actually matters a lot. A Fortran array is just a blob of data subdivided into elements. I know, that's boring. It's flat. It doesn't have any dynamic life to it. Each element is of the same fixed size, and the array has a fixed length. So, boring.
Starting point is 00:16:19 To expand the array, you have to shuffle around blocks of memory. IPL used the much more flexible and dynamic linked list, wherein each element contains the actual stored data and a pointer off to the next element of the list. So, not boring, alive, and dynamic. That may seem like a small detail that I'm kind of overblowing, but as we saw last episode, dynamic data storage is crucial for AI. The different ways that IPL and Fortran handle list data impacts not only how you
Starting point is 00:16:54 can use those languages, but what kind of programs are viable in each. Just remember, a language's usefulness is often determined by how it deals with memory. Now, IBM also offered something that the small IPL team at RAND couldn't. Hardware. At roughly the same time McCarthy was finishing preparations for the Dartmouth workshop, IBM started planning the New England Computing Center. Cool name, but it's a little misleading at this point in time. This was an initiative, together with MIT, aimed at providing New England colleges with
Starting point is 00:17:32 access to a computer. IBM installed a single 704 on MIT's campus. MIT got the bulk of the computer's time, but the remainder was to be shared between other schools in the region. So, as you can see, McCarthy was able to get access to computer time. At least, in theory. The New England Computing Center would take a little bit to get up and running, and even then, McCarthy could only get limited time on a single computer. The machine could handle assembly language and Fortran. Two tools not very particularly well suited to AI, I might add. But it was the best path that McCarthy could
Starting point is 00:18:14 take. At least for the time being, McCarthy would bet on IBM. The other factor here is McCarthy's aversion to IPL. On his first run with the language at the summer workshop, McCarthy had this to say, quote, The people who were furthest ahead at the time were Newell and Simon, who came for just a few days, and I got the idea of list processing from them, but did not at all like the language that they used, end quote. Some ideas did flow in from IPL, but ultimately McCarthy did not like the language's approach. Like I keep saying, and will probably say a few more times, IPL had all the tools you needed for AI.
Starting point is 00:19:05 It's just that everything is hidden inside a language that's not exactly easy to learn or use. The IBM and Fortran connection here is the more fruitful branch. After the summer workshop, McCarthy would actually consult for IBM on one of their AI projects. And I want to take a second here to examine this part of the story before circling back in time a little bit, since it takes us to a weird place that I think is important. McCarthy was consulting for the Geometric Proof Project that I mentioned briefly. This took place after his exposure to the IPL way of AI. So, quite naturally, he suggested that the IBM team wouldn't be able to make very intelligent software without list processing capabilities. This would lead to the birth of Fortran List Processing
Starting point is 00:19:52 Language, or FLPL. I don't know if I can string together a more frightening series of words. FLPL is described in a little paper called a Fortran compiled list processing language. And for whatever reason, the only scan I can find of this paper must have been slightly damp or wrinkled when it was digitized. The top few lines of every file are a little bit wavy, which I think adds a fitting chaotic effect. FLPL wasn't an entirely new language. It was instead an extension to Fortran's standard libraries that added support for linked lists, or as the paper calls them, Newell-Shaw-Simon lists. You know, just in case you need a reminder that linked lists came directly from IPL. Some of the specific implementation details are different, but by and large, FLPL follows the
RAND prescription pretty closely. It even keeps track of as-yet-unused cells using a big free list. FLPL came packed with most of the functionality you needed for working with that era's take on AI. You got dynamic lists, you can easily build trees, and you have all the functions needed to traverse and modify list data. FLPL even introduced pointers to Fortran, which allowed list elements to effectively contain any data type. It sounds a lot like IPL, but better, right? Well, yes and no. FLPL is a bit of a monster. It fits into this uncanny valley, that's the best way I can put it. It comes with most of the features that made IPL an attractive option to people like McCarthy, but also comes
Starting point is 00:21:45 with all the issues that pushed researchers away from Fortran. You also get a handful of new idiosyncrasies just for some good measure there. McCarthy's biggest issue with FLPL was that it didn't support recursion. Now, I've been trying to never actually get into recursion on this podcast. It's one of those programming and mathematics concepts that can be a little bit hard to wrap your head around. But we're quickly heading towards lisp town now, suckers, so we're gonna have to break this specific podcast taboo. Recursion occurs when a function is defined in terms of itself, or in more computery terms, a recursive function is a function that calls itself. The classic example is the factorial, often notated as x! The factorial of x is defined as x times the factorial of x minus 1. That kind of circular
Starting point is 00:22:49 definition is perfectly valid in mathematics. When you program a factorial function, you usually have to add in what's called a termination case. That is, you stop the function from calling itself once x is equal to 1. It's a little mind-bendy if you aren't used to the concept, I will admit that. So why is recursion such a big deal? The easy answer is that it's just plain powerful. It's one of the most powerful tools in a programmer's belt. Recursion can be used to construct really elegant and, if I can be frank, really cool solutions to complex problems. But speaking more specifically, recursion is a really natural way
Starting point is 00:23:34 to tackle trees of data. For a good example, we can look no further than IPL and Logic Theorist. In that program, logic expressions were represented as tree structures. We can make this a little bit easier to think about by jumping into more common mathematics, but the theory here will still hold. So let's say you start with a simple expression, x plus y times z. You can transform that into a tree with relative ease. The root of the tree will be the main connective. In this case, it's the plus operator. That operator takes two arguments. One is just x, so that ends the first branch. The other argument is the result of a multiplication, so the second branch will be the multiplication operator. That operator has y and z as its children. So let's say you have that tree loaded up into a
Starting point is 00:24:27 computer. How would you go about evaluating it? Now, there are a number of different approaches that are all perfectly valid, but recursion offers perhaps the cleanest solution. All you need to do is write a function called something like, I don't know, evaluate, that takes a tree structure as its only input. This function will attempt to run the operation in the root node. In our case, that's plus. To do so, it has to pass in the two arguments that branch off the tree. At the first level, that's x and multiplication. X is simple,
Starting point is 00:25:07 just program evaluate to grab the value of that variable. But what about the operator? What about multiply? That's also simple. Just call evaluate on that node. Doing so will get you the result of the multiplication, but you don't have to write any special code to handle sub-operations beyond a simple check and call. The point here is that recursion is a natural fit for tree data structures, and trees are just lists. So applying a little bit of our own logic here, recursion and lists go together, well, they go together like keyboards and spilt coffee. IPL didn't explicitly support recursion, but languages usually don't have to explicitly support this feature.
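To put both of those ideas on the page, here's a minimal sketch in modern Lisp, written with the benefit of hindsight since the language doesn't exist yet at this point in the story. The function names are just illustrative:

    ;; Recursive factorial with an explicit termination case.
    (defun factorial (x)
      (if (<= x 1)
          1                            ; termination case: stop calling ourselves
          (* x (factorial (- x 1)))))

    ;; Recursive evaluation of an expression tree like x + y * z.
    ;; Written with numbers so it actually runs: (+ 1 (* 2 3)).
    (defun evaluate (tree)
      (if (numberp tree)
          tree                                      ; a leaf is just a value
          (apply (car tree)                         ; the operator at the root...
                 (mapcar #'evaluate (cdr tree)))))  ; ...applied to each recursively evaluated branch

    ;; (factorial 5)             => 120
    ;; (evaluate '(+ 1 (* 2 3))) => 7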
Starting point is 00:25:53 It's more common that the language just allows you to build a recursive function. Fortran, at least early versions of the language, wouldn't even let you compile recursive functions. It was able to check if you were trying to pull any tricks and stop you. So, by extension, FLPL couldn't handle recursion. One final factor made FLPL suboptimal, at least in McCarthy's opinion. In this strange language, not all functions returned values. Now, like recursion, this is a subtle point.
Starting point is 00:26:26 In the strict mathematical sense, a function has to return some value based off its inputs. That's the definition of a function. Some of the new functions introduced by FLPL either returned nothing of consequence or just operated on lists silently. Imagine, if you will, you're building some complex expression. It's composed of functions. In FLPL, you have to just remember which functions give back reasonable results. If you don't, then your expression won't work. It may not even compile.
The issue here is that FLPL was introducing some special cases. In real life, it can be good to be special. But when it comes to programming, the opposite is true. Ideally, a language should be consistent and always act as expected. You should never be surprised by your programming language, that's a really bad sign. You shouldn't have some set of special functions that you just have to remember. Everything should work the same way every time. So this was the general backdrop that McCarthy was operating within during 1956. There were tools being designed for programming AI. There were some really smart people tackling the general problem of thinking machines. But, at least as far as McCarthy was concerned, there was no ideal tool for modeling intelligence. There was a space
Starting point is 00:28:00 opening up where improvements could be made. Before winding back a little bit and talking about proto-Lisp, I want to take a second to make an important observation. When looking at the history of programming languages, it's common to place each language in some larger family tree. It's a quick and easy way to give context about a language. If you don't know much about ALGOL, you could look it up on one of these charts, see that it influenced C, and make some quick inferences about that older language. These are great for quick reference, but there are some issues. I mean, most language family trees don't include the obscure junk that I get into, but that's a separate gripe. My big issue with these trees is that they don't always show negative influences.
The chart will show when a language was based off an earlier language, when it borrowed features, when it took existing concepts and syntax. They don't show what's going on with McCarthy's relationship to IPL and Fortran. Lisp looks nothing like Fortran. It doesn't act like IPL. But both of those languages had huge impacts on how McCarthy would develop his own language. That impact was negative, McCarthy disliked both Fortran and IPL, but a negative influence is still an influence. That's a kind of nuance that doesn't fit into a family tree. Maybe it needs its own kind of chart. I, for one, would really like to see a language tree with connections for the kind of anti-influence or negative influence or whatever you want to call this kind of connection. The point I'm trying to piece
Starting point is 00:29:35 together here is that Lisp would come out of this sea of Fortran, FLPL, and IPL. During 1957 and 1958, McCarthy was working primarily in Fortran. The whole time he was keeping notes, making changes, and trying to figure out a better tool for AI. If that's not proof that Lisp was influenced by Fortran, then I don't know what is. This period is also where I personally run into a bit of an annoying problem. The timeline here is really tight. In the space of a few years, McCarthy goes from the Dartmouth Workshop to the first version of Lisp. Primary sources here are mainly academic papers. So far, so good, that's my comfort zone. But this comes with a little idiosyncrasy that I don't think I've mentioned before. Often, it takes a while to finish a paper. You first have to, you know,
Starting point is 00:30:31 do the actual work, and then you write a paper. Then, after that, you send it for publishing. Depending on the outlet, this can take a variable amount of time. You may need to make edits or even changes. The result here is that some of the papers in the lead-up to Lisp weren't published until 1958, so that date stamps them actually past the canonical birthday of Lisp. You kind of have to line up papers with McCarthy's recollections and try to find drafts to be sure that everything's in its right place. This tripped me up when I first started hitting the books, so keep this in mind if you plan to go down this rabbit hole yourself. Our first actual glimpse of Lisp comes from a strange place.
In 1958, McCarthy wrote a paper describing a theoretical program called Advice Taker. The paper was called Programs with Common Sense. On the surface, it would seem that McCarthy is writing on artificial intelligence, but this quickly gives way to a description of a new kind of language. As the title lays out, Advice Taker was a program with a limited form of quote-unquote common sense. McCarthy describes it as an ability to make simple deductions, but there's a lot more to the program. Quoting from the paper, the basic program will draw immediate conclusions from a list of premises. These conclusions will be either declarative or imperative sentences. When an imperative sentence is deduced, the program takes a corresponding action. These actions may include printing sentences,
Starting point is 00:32:12 moving sentences on lists, and reinitiating the basic deduction process on these lists. I don't know about you, dear listener, but this is sounding a lot like an interpreter that takes a programming language as input. IPL and, to a different extent, FLPL were programs that ran, quote, deductive processes on lists, at least in a very loose sense. Now, there is a small detail here that I need to point out as early as possible. The fact that McCarthy is using imperative sentences to initiate action is crucial for what's to come. An imperative sentence is something like a command.
Jump, go left, give me your leather jacket. Those are all examples of imperative sentences. The verb here, jump, go, or give, comes first, followed by the rest of the sentence. Or, if we switch around our language a little bit to be more logical, we have an operator followed by arguments. Each operator can have a varying number of arguments. In some cases, there are no arguments, like jump, but in others, you can have a complex list of arguments. Alright, so we're starting to see something
like a programming language emerge in this paper. Ah, but this is just the beginning. Back to the literature, where McCarthy describes what makes Advice Taker so special. Quote, The main difference between it and other programs or proposed programs for manipulating formal languages is that in the previous programs, the formal system was the subject matter, but the heuristics were all embedded in the program. In this program, the procedures will be described as much as possible in the language itself, and, in particular, the heuristics are also described. End quote.
Starting point is 00:34:13 This is some 3D chess kind of stuff, so let me try to unpack what's important here. For AdviceTaker to work, McCarthy had to first define some restricted scope for his problem. You can't just say, I'm gonna make a computer that can understand English. You have to kind of tighten that down a little bit. He has to standardize on some formalized language that AdviceTaker will be programmed to understand. In the case of the earlier logic theorist, Newell, Simon, and Shaw chose formal symbolic logic. The choice of an existing framework for expressions gave the RAND team a bit of a jumpstart. Symbolic logic already had pre-established rules well before logic theorist was written.
McCarthy, mad lad that he was, decided that Advice Taker would use a new linguistic framework. I'm not just using flowery language here. McCarthy really does explain Advice Taker's new language in terms of linguistics. The stated reason for this undertaking is clear. Advice Taker was intended to make general logical conclusions from simple English-like statements. There wasn't really a pre-existing framework that could do that. Symbolic logic is close, but it's nowhere near a normal human language. It's cold, inhuman, and if you spoke in symbolic logic exclusively, you might open some sort of portal. The details of this Advice Taker language are where we start to approach Lisp-ish territory. We could probably call this something
Starting point is 00:35:53 like proto-proto-Lisp. It's still a few versions out from the actual language, but I think it makes for an interesting vantage point. McCarthy describes everything in terms of functions, and those functions always come before their arguments. Now, that probably doesn't sound too strange. That sounds like any function in any programming language. You'll usually say something like function of x, y, or something similar. What's different here, or at least what strikes my ear a little differently, is how McCarthy formulates his examples in this paper. You see, in Advice Taker, these functions all act as imperatives, commands that, if followed, will describe some overall state. That means that in some cases, we get lines of pseudocode that sound,
Starting point is 00:36:47 well, a little off. For instance, one example in the paper shows the at function. This function tells AdviceTaker that something is located at some place. The syntax is given as at I comma desk. Now, I can't speak for everyone, but I certainly don't talk like that. I would never say at I desk. I would probably say something like I am at the desk. Even if you strip out the meaningless helper words, it would still be along the lines of I at desk. That's shorter, and it retains the meaning of the phrase. Hey, maybe I should start talking like that to save some time. Anyway, even in the shortened example, we can easily parse out the operator, for lack of a better term, and its arguments. At is what's doing the work here. I and desk are what need
Starting point is 00:37:47 to be connected somehow. McCarthy is shuffling around the expression, making it match an imperative sentence by putting the operator first, and then adding some parentheses and a comma to separate things out for the computer. All functions that McCarthy uses in example code in this paper follow the same pattern, even if translating it directly into a sentence doesn't make, you know, English sense. So why would he do this? What's the advantage? If AdviceTaker is supposed to be able to deal with some English-like language,
According to McCarthy, this prefix notation, which he sometimes calls infix in his writing, would be crucial to Lisp's coming success. It's a simplification that makes the language easier to parse for the computer. Now, this part can get a little complicated. Once again, these two episodes are pretty heavy on the CompSci 101 stuff. The premise we have to accept here is that human languages, for all of their usefulness, tend to suck. Languages are complicated, inconsistent, and rely on a host of annoying rules and exceptions. The same is true when it comes to the notation we use for mathematics, although perhaps to a lesser extent. An expression like 1 plus 1 is simple enough. At a glance,
Starting point is 00:39:13 you can see that, yeah, that's addition with two arguments. Simple. But what if an argument's negative? We might express that as minus 1 plus 1. Our meaty, very intelligent brains can easily see what that means. To us, this is still a pretty simple expression. But how would you programmatically describe it? Do we have two operations? Or is the negative part of the first argument? If we have two operations, then which one needs to be evaluated first? Or is the negative in front some kind of error in this notation?
Starting point is 00:39:52 Can we even evaluate this? We start getting into a list of considerations and checks needed to understand the expression's meaning. Once again, this is just for a very simple expression. Imagine applying this to a more complicated expression. Prefix notation allows us to skip over a lot of these questions. Not all of them, but a lot just fall away. The operator or the function or the verb, whatever you want to call it, always goes first in prefix notation. That's always followed by a list of arguments. The arguments might be functions themselves,
but those subfunctions still follow the same convention. Applying that logic to the example, we get something like plus one one, or plus negative one one. That looks a little strange, but it's much easier for a computer to parse. Your program can immediately pick out the function and its arguments. That means that you can program a system to understand that much easier. As McCarthy explained when recounting the early history of Lisp, prefix notation plays into a computer's strong suits. This also has the convenient side effect of making it easier to implement an interpreter or compiler for the language.
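Written out in the S-expression style the language eventually settled on, a modern rendering rather than McCarthy's exact 1958 notation, those spoken examples look like this:

    (+ 1 1)        ; "plus one one"           => 2
    (+ -1 1)       ; "plus negative one one"  => 0
    (+ 1 (* 2 3))  ; the operator leads at every level, so nesting stays unambiguous  => 7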
Is this a cynical and lazy ploy to save some time? Perhaps. That said, I'd say that this is a really smart choice. If you're making some new formalized language to use with computers, then why not tailor it around what a computer is good at? Does it really make sense to go through a lot of work just to make older, more familiar notation work on a computer? Does that save time in the long run? Maybe prefix notation is the objectively right way to go, and other languages just get it wrong. The final pre-Lisp nugget that I want to touch on has more to do with the specific computer limitations McCarthy was working with. I think it's safe to say we've established that McCarthy had a keen eye for what a machine is capable of.
Starting point is 00:42:32 Roughly around the same time that AdviceTaker was being devised, McCarthy was filling in specifications for a new language. It's still not quite Lisp yet. It's not quite even real yet, but it's approaching something more impactful. Hardware limitations were an early concern, and that would impact McCarthy's design. The only computer he had access to, or was going to eventually have access to, was the IBM 704 at the New England Computing Center. Even once installed, access would be limited, so McCarthy was going with the pen and paper approach to designing a new programming language. The central feature was, of course, list processing. It follows nicely that the first hurdle, then, was figuring out how to handle linked lists on the 704. We're still talking about the late 1950s here. This is a pretty early period for computers. Machines really haven't hit their
stride yet, at least I wouldn't say they've hit their stride yet. The 704 is a wonderful example. This is a different kind of machine, at least to our modern notions of how a computer should be. It had a 15-bit address space, meaning that memory was addressed using a 15-bit long number. Each address in memory held a 36-bit wide number. The computer's registers, its storage for immediate use, came in a few different sizes. Some were 15-bit, one was 36, and one was 38.
If that sounds weird to you, then you aren't alone. It weirds me out a little bit too. So here's a fun question. If each location in memory is 36 bits wide, but memory addresses are only 15 bits, how do you store a pointer? How do you keep track of a location in memory? The docs on FLPL, may its bones rot in the earth, actually give us a pretty good answer to this conundrum. Each location in memory is further subdivided, so you can pack two pointers into each memory cell, plus some extra space for flags and tags and such. McCarthy went with a similar model for his pen and paper LISP. The 704 even provided functionality for this type of subdivision. Plus, McCarthy was at least consulting on the project that led to FLPL,
Starting point is 00:44:53 so there's definitely some crossover going on there. We gotta remember that LISP is influenced by FORTRAN no matter how we look at it. So each memory cell was broken up into two segments. One was called the decrement, and the other was called the address. The naming convention here comes from how the particular IBM machine handled addressing data. An effective address was calculated by subtracting the decrement from some base address. That's neat, but it's not really relevant here. We just need to know the two names for this next part. Lisp's fundamental unit of data is called a cons cell, C-O-N-S.
It's a single element of a linked list. As we know from IPL already, each element has to have two parts, a pointer to the next item in the list, and then the actual data. On the 704, the decrement part of memory was used as the pointer to the next list element. That left the so-called address part of the memory cell for storing data. This data could be in the form of a numeric value or a pointer off to some other list or chunk of data in memory. That's all well and good, but you need more than a fancy memory layout to create a useful language. You need a way to deal with that fancy memory layout, some way to access and manipulate data. To deal with this, McCarthy planned a set of
simple instructions, sometimes called primitives. In this initial set are two that end up sticking around and are of particular interest. These are contents of the address part of register, or CAR, and contents of the decrement part of register, or CDR, usually pronounced could-er. I know, it's kind of rough. In practice, car returns the first element of a list, and cdr returns all subsequent elements. The final fundamental piece of Lisp that was laid out in this period was cons, short for construct. Now, nomenclature-wise, this is a bit of a complication. A single element of a list in Lisp is often called a cons element. But cons is also the name of the function used to create and link elements.
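A few one-liners in modern Lisp make the three primitives concrete. This is today's spelling, not the notation McCarthy was sketching on paper:

    (cons 1 nil)            ; => (1)        build a one-element list
    (cons 1 (cons 2 nil))   ; => (1 2)      chain cons cells into a longer list
    (car '(1 2 3))          ; => 1          the first element
    (cdr '(1 2 3))          ; => (2 3)      everything after the first element
    (cons 0 '(1 2 3))       ; => (0 1 2 3)  prepend without copying the rest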
It's kind of like how an orange is a fruit but also a color, and the fruit happens to be described by the color. Anyway, these three fundamental functions, car, cdr, and cons, will go on to form a big part of the basis of Lisp. Fundamentally, this is a pretty elegant system. By chaining these functions together, you can construct, manipulate, and access Lisp data any way you want. It's the kind of cool and well-thought-out design that's emblematic of McCarthy's work. You could look at it as highly abstracted. It's definitely way higher level than contemporaries like Fortran or IPL. But by the same token, this early Lisp was heavily influenced
by the hardware McCarthy had access to. Prefix notation, as discussed, is one example. That was chosen partially because it makes for a language that computers can easily parse. We can also see this in car and cdr. Those functions will stick as fundamental to Lisp, but we can see this machine-dependent anachronism right in their names. In a lot of ways, McCarthy was balancing between this high-minded abstraction and machine-tied reality. If you've read ahead, or if you're familiar with hacker lore, then you may have been a little bit confused up to this point. Lisp is heavily associated with MIT, specifically the AI Lab. That group was an incubator for a lot of artificial intelligence research, hacker culture, and, of course, Lisp. So far, we've been following
McCarthy's adventures at Dartmouth, and to a lesser extent, working with IBM. Well, don't worry, because we've hit a transition point. In the fall of 1958, McCarthy started teaching at MIT. Shortly after, he and Marvin Minsky, another professor, started the AI Lab. This is where Lisp would move from a strictly theoretical exercise to actual code. At MIT, McCarthy had easier access to computer time and, likely thanks to the founding of the AI Lab, more institutional backing than ever before. Now, this wasn't some one-and-done kind of process.
Starting point is 00:49:36 McCarthy and his new MIT colleagues approached a full implementation of Lisp in really surprisingly small steps. At the time, it was very difficult to write an entire compiler. That's still true today, actually. So Lisp was first implemented as a library of functions. A handwritten Lisp program could be converted by a human into a series of library calls. This was less of a full solution and more a way to start experimenting with the language without too much upfront work. 1958 is usually cited as the birth date for Lisp. I'm kind of of two minds about this. Lisp is definitely being worked on in 1958.
Starting point is 00:50:23 The first attempts at an implementation do start to occur in this year, but this isn't really a recognizable version of the language. Lisp actually goes through a number of permutations before it reaches some well-known form. In this period, McCarthy describes Lisp as a mishmash of ideas. We're out of the fully pen and paper era, so maybe we should call this the pen and machine code version. For the differences here to really make sense, we have to take a quick detour and talk about what Lisp looks like today. The most well-known feature of Lisp is its unique syntax. That is, Lisp and Lisp-related languages have a really particular look to them. You can identify them right away.
Every function call in Lisp is surrounded by a set of parentheses. Now, this is in contrast to most other languages where function names are put outside parentheses. That might sound like a silly, maybe even unimportant change, but it does have a profound effect when considered alongside the rest of the language. It gives a really unique coat of paint to this programming language. All functions, be they simple addition, car and cdr, or even some user-defined subroutine, follow this pattern. Parenthesis, function name, argument, argument, argument, and on,
and then close parenthesis. And the language is designed in such a way that you need to chain functions in order to accomplish anything major. So you end up with deeply nested function calls. By the end, you might be four or five layers deep, with matching closing parentheses on the other end. This sea of open and closed parentheses is so ubiquitous in the language that some claim Lisp actually stands for Lost In a Sea of Parentheses.
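For a sense of what that nesting looks like in practice, here's a tiny modern-Lisp sketch; the function and its name are purely illustrative:

    ;; Average of a list: every operation is its own parenthesized call,
    ;; and the calls stack up from the inside out.
    (defun average (numbers)
      (/ (reduce #'+ numbers)
         (length numbers)))

    ;; (average '(2 4 6 8)) => 5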
One of the neat upshots here is that Lisp's syntax is remarkably consistent and clear, at least once you get used to it. The point here is that Lisp has a very particular flavor. It's instantly recognizable. So you'd assume this syntax was in place early on, right? Well, no, not so much. The wild Lisp ride starts all the way back with Advice Taker. The pseudolanguage described in that paper didn't use parentheses in anything close to how Lisp would later use that character. It used a style called M-notation or M-expressions. As near as I can tell, this was a McCarthy original, at least the naming convention was. Advice Taker sported function calls that started with a function name,
Starting point is 00:53:14 followed by arguments enclosed within parentheses. Arguments were separated by commas. In later papers, McCarthy would change this slightly, replacing the open parentheses and closed parentheses with matching square brackets. Now, as McCarthy explains, early Lisp experiments expanded and mashed up the syntax. Quote, The programs to be hand-compiled were written in an informal notation called M-expressions, intended to resemble Fortran as much as possible.
Besides Fortran-like assignment statements and gotos, the language allowed conditional expressions and the basic functions of Lisp. The M-notation also used brackets instead of parentheses to enclose arguments of functions in order to reserve parentheses for Lisp structure constants." So sure, we can say Lisp started in 1958, but it looked totally different. To me, the most important difference is that these early versions lacked syntactic consistency. This is something that shows up in a lot of languages, so it's not really a bad thing, it's just weird to see in early versions of Lisp. Fortran-like assignments here just means lines like x equals 1, that's the usual fare that
any programmer is familiar with. Functions were called using the previously explained method, a name followed by its arguments inside square brackets. Lists, however, were defined using parentheses and commas. Do you see the pattern here, or rather, the lack thereof? I just described three different tasks you could carry out in proto-Lisp, and three different styles of syntax. That's just how most languages work. At the time, the only real reference was Fortran,
Starting point is 00:55:11 and that's just how Fortran worked. However, that doesn't mean it's the best way to go about things. It's a level of inconsistency that could be seen as acceptable, but could also cause its own problems. Two of the key hallmarks of Lisp are its consistency and its simplicity. It really is a language stripped down to its purest expression. If you were to show someone a snippet of 1950s-era Lisp, they probably couldn't identify it with the later language. A big part of that is Lisp's early syntactic complexity. We can see this complexity in full bloom in the first paper
fully dedicated to the language. The actual paper is published in 1960 with the riveting title Recursive Functions of Symbolic Expressions and Their Computation by Machine. But in full transparency, I'm not working off that specific version of the paper. I'm going to be pulling from an earlier 1959 draft. That draft is simply called AI Memo 8. Once again, this is a case of pub dates being later than the actual work, so I want to be as close to the work as possible. Anyway, Memo 8 gives us our first concrete description of this whole Lisp thing. This is also our first glimpse of a new syntax, the S-expression. This is the full parenthesis-laden notation that Lisp will be known for, but in this early work, S-expressions are only used to define data. That's important because it means that in Memo 8 Lisp, code and data are defined using different syntax. Functions, definitions, and the rest of the language still rely heavily on M expressions. It's just the actual representation of data that uses S expressions. The entire memo is pretty darn dense. It reads almost like a math paper, which I guess I shouldn't be surprised at, given that McCarthy's background is in mathematics. And while the syntax and naming conventions may be
Starting point is 00:57:25 different, memo 8 is still describing Lisp. You just have to look a little bit deeper. Beyond the core functions we've talked about, and the data representation, there are a few points that we need to examine. First, this early version of Lisp supports recursion. It's even right in the paper's full title. Now, this isn't an explicit feature. Lisp just allows you to write recursive functions. All you have to do is define a function that references itself by name. In general, Memo 8 fleshes out a lot of details about functions that were missing in the earlier shreds of Lisp. Another major piece to this is the ability to treat functions as variables. In other words, the ability to treat code and data interchangeably. Just as
with recursion, this feature works just by virtue of how Lisp is designed. The language does not impose any limitations on what a variable can or can't be. So, thanks to that, Lisp can just treat functions like data, no problem at all. All that is interesting, and there is some more detail on the language described in Memo 8, but I'm going to leave that as an exercise for the reader. I don't want to explain too much more syntax in audio form. The real meat to this paper comes once we reach the section on computability. This is where McCarthy shoots the moon, so to speak.
Starting point is 00:58:53 Quote, In this section, we shall show that all functions computable by Turing machine are expressible as S-functions. If, as we contend, S-functions are a more suitable device for developing a theory of computability than Turing machines, the proof in this section is out of place and should be replaced by a plausibility argument similar to what is called Turing's thesis." This is where my eyes lit up. I had been reading references to this part since I started looking into Lisp. For me, this was a big payoff. Throughout his writings, McCarthy takes kind of cheap shots at Turing's model of computing.
Starting point is 00:59:37 McCarthy contended that the Turing machine, the main theoretical construct used to describe a computer, is clunky. There's just not a better way to put it. The main argument McCarthy makes comes down to the fact that the Turing machine model does not play well with functions. Turing's model uses an infinite tape of cells and a head. The head can move between cells, read the state of a cell, and change values according to some list of instructions. It can also change what it's executing depending on the value of certain cells. For some specific set of instructions, it can be shown that this machine can compute anything computable. This is basically where we started last episode.
McCarthy's issue here is that to assert a proof using Turing's model, you have to work with instructions. You have to go literally bit by bit. McCarthy, math nerd that he was, found this to be an inelegant solution. That's just a nice way to say that McCarthy didn't like the Turing machine. Instead, he thought that computability could be better defined and expressed using functions. The proof here is, well, it's really complicated. It requires some wicked code, and I'd be lying if I said I fully understood it myself. But there's something really important going on. The actual proof consists of a Lisp program, written in the awkward old-style M expressions, that can simulate a Turing machine. Whatever McCarthy's misgivings, this is one of the cool parts of Turing's model.
A Turing machine is able to simulate any other machine, so if you can simulate a Turing machine, that means your system can do the same. It's kind of like a proof backdoor. The cool part, the Lisp-ish part, is how McCarthy represents the Turing machine. He uses S-expressions. Now, maybe that shouldn't come as a surprise given the framework I've described. The simulated machine has to be represented using some kind of data, and this early Lisp uses S-expressions exclusively to represent data.
Maybe not simple, but McCarthy's hand is kind of forced. Now, are you ready for the mind-bending part? The proof that McCarthy provides is written in M-expressions, and it evaluates S-expressions. Once again, that may sound kind of mundane and boring, but for me, this was a big payoff, and it took me a bit to get the implications. I've read this paper more than once, I will say. The Lisp proof is showing that S-expressions can be used to compute anything. The fancy phrase to use here is that S-expressions
are Turing-complete. In Memo 8 style Lisp, S-expressions are purely for data, but McCarthy is doing a sleight of hand to execute them, to evaluate them like they're code. There are definitely some layers here, it's getting to be a pretty thick onion, so let me try and condense what's important. McCarthy has shown that Lisp can be self-hosting. So Lisp, even in this nascent state, is powerful enough to fully describe itself. This is a huge deal for two big reasons. One, this means that Lisp, in theory, is portable. This point is more of a side benefit, but I think it's still huge. Self-hosting is how C will eventually become the portable language of choice.
I don't want to get super deep into this point because portability is a huge topic, but I have to at least give this a passing reference because I think it's really fascinating that Lisp could just do this by virtue of it being Lisp. Point two, Lisp is fully introspective. Lisp can safely and pretty sanely talk about itself and modify itself. Put another way, a Lisp program can be adaptive. This alone, even dropping everything else, makes Lisp a fantastic fit for artificial intelligence. A Lisp program can write another Lisp program just because of how it's designed.
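As a quick modern illustration of that idea, and this is present-day Lisp rather than anything from 1959: a program is just a list, and a list can be built like any other data and then handed off to be run.

    ;; Build a new program as ordinary list data, then execute it.
    (let ((program (list '+ 1 2 3)))  ; so far this is just the list (+ 1 2 3)
      (eval program))                 ; now treat that data as code  => 6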
In this proof, there's something else that I really like. McCarthy was getting close to putting the finishing touches on his language, but he was still missing something. When AI Memo 8 was written, there was still no way to actually run Lisp. All experiments were done by hand. But this would change shortly thereafter. In either late 1959 or early 1960, I can't find the exact date here, Steve Russell wrote the first interpreter for Lisp. This got around the complication of a compiler. You see, a full compiler will convert source code into the bits of machine code a computer can actually execute. Compilers tend to be really hard to program. An interpreter, by contrast, doesn't go quite that far.
Interpreters will read in source code and then decide how to execute those statements. They don't actually produce machine code. They just sit between your source code and the actual machine. In general, this is an easier kind of program to write. Russell worked off part of AI Memo 8. Specifically, the memo closes with a so-called universal S-function. This was a program that could evaluate S-functions, written, of course, in M-expressions. So the theoretical work, at least most of it, was already done by McCarthy. I mean, that's basically an interpreter right there.
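To give a flavor of what a universal evaluating function even looks like, here's a heavily simplified sketch in modern Lisp. It is nowhere near McCarthy's actual definition, it only handles numbers, variables, quote, if, and calls to built-in functions, but it shows the general shape of read an S-expression, decide what it means, run it:

    ;; A toy evaluator for a tiny slice of Lisp.
    (defun toy-eval (expr env)
      (cond ((numberp expr) expr)                      ; numbers evaluate to themselves
            ((symbolp expr) (cdr (assoc expr env)))    ; variables: look up in the environment
            ((eq (car expr) 'quote) (cadr expr))       ; (quote x) returns x unevaluated
            ((eq (car expr) 'if)                       ; (if test then else)
             (if (toy-eval (second expr) env)
                 (toy-eval (third expr) env)
                 (toy-eval (fourth expr) env)))
            (t (apply (car expr)                       ; anything else: a function call
                      (mapcar (lambda (a) (toy-eval a env))
                              (cdr expr))))))

    ;; (toy-eval '(+ x (* 2 3)) '((x . 1))) => 7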
But Russell would go a step further. McCarthy's formulation defined this universal function called eval in terms of M expressions. This might not be a big reveal, but the M in M expression stands for meta. McCarthy saw this as a meta language that could be used to define the S expression parts of Lisp. That's neat and all, but it's kind of gross. It's kind of overly complicated. There is also one other big issue with M-expressions. They used square brackets. And the IBM 704 didn't actually support square brackets. It only had a limited set of characters that it could encode. So when Russell actually
Starting point is 01:06:26 sat down and started hammering out his interpreter, he just dropped any support for M expressions. The first Lisp implementation only handled S expressions. That simplification led us into a recognizable form of Lisp. From then on, Lisp was the syntactically consistent, elegant, and refined language we all know and love. Alright, that brings us to the end of our Lisp adventure. We haven't gone far into the adoption and actual use of Lisp, but I'm going to leave that off for another time. I think one month of linked lists and programming language analysis is enough for a little bit. So, where do we stand on my initial questions?
Why are lists so important for AI? And why is Lisp the, quote, mother tongue of artificial intelligence? After two episodes worth of research, I think these are pretty easy questions to tackle. First off, it's linked lists specifically that are important for AI. Their flexibility allows for dynamic data, which is what early AI was all about. Perhaps more importantly, linked lists can be used to construct trees, which were crucial to a lot of early AI software. As for the question of Lisp, I think I finally have the connection in my head. Lisp was designed for AI from the beginning. It was a response to
the inadequacies of Fortran, FLPL, and IPL. McCarthy built his language to have all the list processing you could ever need, plus the flexibility required for recursion and introspection. No other language at the time could do that. IPL, kind of, but it was difficult to use. Really, Lisp was the best tool for the job because it was designed for the job. And beyond that, Lisp is just a well-designed language. I definitely have a newfound respect for its elegance. And hey, maybe it's time for me to brush up on it myself. Thanks for listening to Advent of Computing. I'll be back in two weeks' time with another piece of computing's past.
Starting point is 01:08:45 And hey, if you like the show, there are now a few ways you can support it. If you know someone else who'd be interested in the history of computing, then why not take a minute to share the show with them? You can also rate and review on Apple Podcasts. And if you want to be a super fan, you can support the show directly
Starting point is 01:08:59 through Advent of Computing merch or signing up as a patron on Patreon. Patrons get early access to episodes, polls for the direction of the show, and bonus content. You can find links to everything on my website, adventofcomputing.com. If you have any comments or suggestions for a future episode,
Starting point is 01:09:16 then go ahead and shoot me a tweet. I'm at Advent of Comp on Twitter. And as always, have a great rest of your day.
