Advent of Computing - Episode 73 - IPL, AI, and Linked Lists

Episode Date: January 10, 2022

I'll let you in on a secret: I've never understood why LISP is so closely associated with artificial intelligence. I've decided to fix this. In this episode, and the next, I'm tracing the early roots... of AI and why list processing is important in the field. This episode we dive into the Information Processing Language, a strange programming language that predates LISP. Along the way we discuss the origin of linked lists, chess playing machines, and a program that could solve logic proofs.

Selected Sources:
http://bitsavers.org/pdf/rand/ipl/P-620_The_Chess_Machine_Dec54.pdf - The Chess Machine
https://www.rand.org/content/dam/rand/pubs/papers/2008/P1929.pdf - IPL V introduction
http://shelf1.library.cmu.edu/IMLS/MindModels/logictheorymachine.pdf - Logic Theorist

Transcript
Starting point is 00:00:00 The modern general-purpose computer can be characterized as the embodiment of a three-point philosophy. 1. There shall exist a way to compute anything computable. 2. The computer shall be so fast that it does not matter how complicated the way is. And 3. Man shall be so intelligent that he will be able to discern the way and instruct the computer. End quote. Coming straight out of a paper written at Rand in 1954, this offers up a pretty rosy view of computing. The passage was used as the introduction to a paper on a chess-playing program written
Starting point is 00:00:40 by Allen Newell. It may sound optimistic, but just a few lines later, Newell is already drawing bounds around that rosy view. Really, can you blame him for that? I think a lot of people were in a similar boat. Computers, especially in their early days, offered almost limitless potential. But limits were quickly found. What exactly is computable? What kind of problems count? What kind of problems can even be reduced to a set of instructions? Are humans even smart enough to handle one of these new computer things? Speaking as a consummate professional myself, I can tell you that I'm not always the most intelligent when it comes to discerning the digital way.
Starting point is 00:01:30 Accepting these limitations may seem a little frustrating, at least at first. However, I think that frustration opens up a whole new world of possibilities. It gives us a new avenue to go down. Namely, how can we sneak around these imposed limits? How can we change the definition of what is computable? One approach, one that Newell and others would pioneer, was to trick computers into thinking a little more like humans. Welcome back to Advent of Computing.
Starting point is 00:02:13 I'm your host, Sean Haas, and this is Episode 73, IPL, AI, and Linked Lists. Today, we're starting on a bit of a journey. I've been wanting to talk about artificial intelligence broadly, and Lisp more specifically. Now, I have a bit of a history with the language myself. Lisp was actually my first programming language. Admittedly, that was a dangerous choice, but it was what I had on hand. I learned Common Lisp out of an old textbook and programmed on a partly working AT clone my dad had lying around. From there, my relationship with computers has only become more complicated. Now, there's a lot to be said about Lisp. It's been called subtle and an elegant language. There's a lot of neat implicitness to it. I think it's one of those things that takes time and
Starting point is 00:02:59 understanding to appreciate. I know it's taken me a while to come around to it. That said, I'm still no good at programming Lisp, mind you, but I've started to see more of the beauty in its parentheses as I grow older. But there's one crucial thing about Lisp that I've never been able to fully understand. Lisp is also known as, quote, AI's mother tongue. In fact, those are the first words used to define Lisp in the jargon file. Now, I personally subscribe to what I call a Hopperian view of programming languages, if I can coin a phrase. That is, I share a view of programming language diversity similar to the one espoused by Rear Admiral Grace Hopper. In her opinion, it was impossible to have one universal language for all tasks. Instead, Hopper argued for a multitude of specialized languages,
Starting point is 00:03:54 each suited to some particular task. Each of these languages would have to be constructed specifically for its own niche. We can see this borne out in many languages. Fortran, a language meant for math and engineering, has powerful facilities for handling equations. It's fast, it can do matrix math, and later versions of Fortran have great support for multiple processors. COBOL, may it rest in peace and never be heard from again, is meant for business, so it's basically a really fancy overpowered spreadsheet. C was designed for portable system programming, so it offers platform-agnostic ways to manipulate memory. That's enough languages to establish a pretty ironclad pattern here. Now, Lisp was designed for programming artificial intelligence systems,
Starting point is 00:04:46 so it supports lists. Now, I don't know if this is just a me problem, but I've never understood why support for lists is so crucial for artificial intelligence. Don't misunderstand me. Lists are a really important data structure. They're up there on my list of favorite data structures. I couldn't have a list of favorite data structures without them. There's just something I'm missing, and I'm really hoping that I'm not the only one out there. Otherwise, I'll feel a little bit silly. This episode and next episode are going to chronicle my adventure trying to understand why Lisp, lists, and AI go together. Specifically, I want to figure out what's so special about the lists in Lisp. To do that, we need to examine IPL, the Information Processing
Starting point is 00:05:34 Language. This was a predecessor to Lisp, although maybe Lisp isn't its direct descendant. My plan here is to outline the state of things before Lisp hit the scene, establish what's going on with these Lisp things in general, and set the groundwork for next episode where we'll try to figure out why Lisp is so suited to artificial intelligence. The term artificial intelligence was coined in the summer of 1956 at a workshop located at Dartmouth College. This isn't the real birth of AI, just the point in time when it got a convenient name. The idea of a thinking machine had been in the air for quite some time, and experiments into the technology had already shown promise by 56.
Starting point is 00:06:19 We can thank Alan Turing for at least some of this, and strangely enough here, I don't mean the so-called Turing test. For those of you not in the know, the Turing test is an experiment proposed by its namesake where a computer would communicate with a human via terminal. The human would then try and guess if they were talking to a computer or another human. To pass the test, the computer needed to trick the human. That's fun, but perhaps not fundamental. Here, I'm talking about the Church-Turing thesis. In other words, the model
Starting point is 00:06:53 that describes how all computers can function. In very broad terms, a computer will be able to perform any calculable operation. Now, there are caveats as far as power goes. There are issues as far as what is considered calculable. But the idea here is that if you can reduce a problem to steps, then a computer should be able to handle it for you, as long as the computer is so-called Turing-complete or can emulate a Turing machine. The obvious application is math, but that's not the only application here. If thinking could be described as a series of steps, then that could in theory be fed into a computer. This may be a really complicated problem, but here's the cool result.
Starting point is 00:07:42 If we look at this thesis the right way, then a computer should be able to think, whatever that actually means. The cool part here that's easy to just gloss over is this actually gives us a possible path towards a thinking machine. We have a restricted scope. Sure, the scope is still huge. It's defining what thinking is and then reducing it to a series of steps. That's a lot of work, but that is a restriction in scope. We have a more narrow problem that we should be able to solve, at least in theory. Now, a growing number of researchers in this period, we're talking late 1940s, early 50s, believed that human thought could, in fact, be reduced down to a series of steps. This initial cohort was a strange mix of
Starting point is 00:08:32 psychologists, early computer scientists, and at least one political scientist. It's in this mix that we find Allen Newell. In 1954, Newell was oscillating between completing his PhD in psychology and working as a researcher at RAND Corp. While at RAND, he became involved with a series of increasingly complicated simulation studies. Specifically, Newell was part of a research group studying how humans handled complex tasks. The group had set up a simulated air defense control center. It was complete with radar readouts and realistic data. Honestly, it sounds like it would have been a fascinating study to put together. RAND had groups of subjects working in this fake control center in eight-hour shifts, and all information was passed in by researchers or over these simulated radar displays. Newell was the person
Starting point is 00:09:26 on the ones and twos, so to speak. He was in charge of handling the fake radar data, which was driven by a computer. The whole point of the experiment was to study, in a controlled environment, how a team of people would respond to increasingly complex data. The test group had to be as realistic as possible, and the test set of data had to be as realistic as possible. Newell's solution was to start off with actual live air traffic control data. Initially, the simulated radar screen displayed a mirror image of what actual radar operators were seeing in a nearby control tower. That was a good baseline. Newell then programmed the radar displays so that he could tweak the live data. More planes could be added, enemy aircraft could
Starting point is 00:10:13 be thrown in. Over time, more and more complicated situations could be displayed. All the while, the RAND research team sat back and watched. The specific application here is pretty neat, but it's not what's important. Newell was doing something relatively new here. At this point, circa 1954, computers were by and large fancy calculators, at least that's how they were often viewed. They crunched numbers, processed data, and maybe solved equations. But Newell's fake radar displays weren't cast from this exact mold. This was non-numeric computing, as Newell put it. He wasn't the only person doing this kind of programming, but he had inadvertently found himself in some pretty rarefied air. This psychology research project, with a side of non-numeric programming,
Starting point is 00:11:07 set up Newell for a profound realization. One day in 54, while at RAND, he met someone named Oliver Selfridge. Newell would later describe this event as a conversion experience. Quote, What Oliver Selfridge does when he comes down is to talk about a little crazy machine called the MTC, the Memory Test Computer, which was hardly a machine worth talking about, but that they were playing with. They had done a little pattern recognition task on it, and that just made it clear exactly what was going on as far as I was concerned. I mean, absolutely clear. End quote. This is also one of those fun cases where we have both sides of the story and they seem to match up. In an interview, Selfridge said of this encounter, quote, I talked about pattern recognition and Allen and I hit it off and this was really the trigger. I mean, he was ready for it. He got the whole idea. End quote. Selfridge had been working on a computer system that could recognize simple patterns.
Starting point is 00:12:10 This wasn't just programming in some large set of reference images. Selfridge's machine was thinking. Well, at least in the broadest sense of the word. He had designed a program based off theories of how human perception worked. This would really mess up Newell something fierce. He realized that Selfridge's pattern matching system was, essentially, a step beyond the behavioral work he had been carrying out at RAND. Newell was using a computer to observe how humans adapted to stimuli. But here, Selfridge was taking the next leap. He was modeling those findings inside a machine. He was making a
Starting point is 00:12:53 computer that could behave, at least in a limited sense, like a human. From there, Newell was hooked. This new thing didn't even have a name yet, but Newell knew he wanted to be involved. Quote, I went home that weekend. I designed a whole system like that for our air defense, for understanding the people doing the air defense direction center, and I never looked back. And it really did happen just like that. I was all primed for it, literally primed for it, but the conversion to deciding that that was it, and that I knew exactly how it was going to happen, happened mid-November 1954. But you can see all the stuff that prepared for it. Conversion experiences are all the same." End quote.
Starting point is 00:13:37 The conversion here was quick. However, it did take some time for Newell to start writing his own thinking programs. As near as I can tell, this first weekend jaunt was more of an exercise. It was Newell trying on some new ideas for size. It didn't take long for Newell to move from one exercise to a much more realized project. All it would take was the right application to focus Newell's newfound vigor. Luckily, one was close at hand. Chess. AI chess, or whatever you'd call it prior to AI being named, already had its own history by the time Newell took up the challenge. In fact, this takes us full circle already. As early as 1948,
Starting point is 00:14:21 Alan Turing and David Champernowne were trying to program a computer to play chess against a human opponent. They called their algorithm Turochamp after both of their names. The approach involved assigning values to each board state, projecting forward to generate a set of all possible results from a certain move, and then using state values to choose the quote-unquote best move. The final program was rendered useless by, well, the state of computing in 1948, but the theory was somewhat sound. In the next few years, other researchers would turn their eye to the same problem, and by 1954, Newell was the latest victim. In December, Newell would put his ideas
Starting point is 00:15:07 to print. His first paper on the subject was titled The Chess Machine, an example of dealing with a complex task by adaptation. Now, before we get into the paper, we need to deal with the matter of language here. We're at a pretty early phase in the development of computer science as a discipline. That goes for AI too, if you haven't noticed. Newell is using quote-unquote machine to mean computer and program, but sometimes he uses machine to just mean hardware, other times just the program. It's something that tripped me up a little, so I figured it was worth mentioning. Anyway, the chess machine is where we get our first glimpse of lists. The paper opens up with the quote I used to start this episode, basically saying that computers offer unparalleled power.
Starting point is 00:15:59 However, Newell immediately puts caveats in place. He starts describing situations where the idea of computable answers starts to break down. To pull from Newell's list of doom, computers reach the end of their usefulness when, quote, 1. The relevant information is inexhaustible. 2. The set of potential solutions is neither enumerable nor simply representable. 3. The processing to... and so on down the list. Remember, we can use the Church-Turing thesis to show that a computer should be able to solve almost any problem, but that's assuming we have limitless resources and some kind of bounds on our input. What if the problem you're trying to solve has infinite inputs? What if inputs never stop coming in? And what if you can't calculate your results very fast? This is where we reach the fringe of what's digitally possible. Take chess, for instance. What would the best possible strategy
Starting point is 00:17:13 be for determining moves? It would probably be something close to Turochamp. You want to look at the current board state, then project out every possible subsequent state that could occur. Then you need to look through those states for every branch that ends in your victory. If there are multiple paths, then pick the fastest route to success, or somehow determine the best route to success. Instead of thinking one or two moves ahead, the program would be thinking every possible move ahead. Easy peasy, if you know everything that could happen, it should be simple to win chess. Of course, doing this the brute force way leads to its own issues. An unintelligent program would have to calculate every possible
Starting point is 00:17:58 permutation. That's a pretty exhaustive search. How many calculations would it take for each move? I'm not going to do the math here, but it has to be some ridiculously large number. So time becomes a concern. That's number four on the list of doom. What happens if the program chooses a path, then the human makes a move that closes that path off? Let's say the machine chose a route that would require taking a pawn in the next two turns,
Starting point is 00:18:31 but the human player moves that piece out of danger. That sounds a lot like issue number three on Newell's list. The computer wasn't able to generate the correct result, since the result depended on processing that had not yet occurred. The net result is that the computer loses a game of chess, so not the end of the world, but I think this is an illustrative point. Here's what makes chess such an interesting problem, and really what makes it so well suited to early AI as an exercise. It's a well-bounded problem. In other words, the input data is finite, and the possible states are also finite.
Starting point is 00:19:06 Sure, those finite numbers may be pretty large, but that is a bound. The relevant information is, in fact, exhaustible, and the set of solutions is very much enumerable. So chess only runs into 50% of Newell's list of computer doom. Sounds like a pretty good warm-up exercise. Newell would take an approach similar to Turochamp, but that was just the base for something new. You see, the trick was to add in a little bit of smarts. In general, Turochamp was a pretty fair algorithm. By that, I mean it wasn't doing anything particularly tricky.
Starting point is 00:19:47 It's all relatively brute force calculations. Newell's chess machine wasn't that straightforward. It was a lot closer to something intelligent. The big shift here was the element of choice, or rather, the simulation of choice. Turochamp had a small number of choices built in, but Newell would take that to a more complicated level. His chess machine followed the basic format of calculating all possible results for a set of possible moves, but didn't always follow branches all the way to checkmate. Instead, Newell incorporated this idea of a sub-goal.
Starting point is 00:20:25 The basic plan here is that you could limit the number of moves calculated by stopping short at certain overall game goals. For instance, the computer could decide to limit its search to moves that allowed it to capture a rook. Those sub-goals could then have their own sub-goals, so you end up building something closer to a strategy, something a lot closer to how a human would think about goals. Instead of looking at every possible move, you start to get a chain of goals. Say, take out the human's rook, grab some pawns along the way, and eventually remove the queen on the way to checkmate. It's a more abstracted approach to describing a chess strategy. Newell's chess machine takes this even a step further. In addition to modeling its
Starting point is 00:21:12 own strategy, the program models its opponent's strategy, or at least it tries to guess what the opponent is likely to do. The idea here is that the computer can factor in predictions about the human player. It's trying to get into the opponent's head, at least in a limited, digital kind of way. The algorithm here seems pretty neat, but for me, the specifics are a sideshow. The real deal all comes down to data structures. Chess is cool and all if you're a nerd, but data structures, that's the real cool stuff. Anyway, for this particular application, lists end up playing a really big role. Now, to be 100% clear, I'm working off Newell's paper here and not actual source code. As far as I can tell, we don't get actual
Starting point is 00:22:06 source code until maybe a lot later, but I don't think that's going to be super relevant to the overall discussion. We'll address that down the line. To start with, Newell used a list to represent game states. This part isn't super deep into the whole AI aspect, but I do like the elegance here. Newell was using a list of each chess piece, where each element was the piece and its location on the board. I just like the neat efficiency there. It saves you from having to traverse a grid of data. In general, a one-dimensional list is easier to deal with than a two-dimensional list. That's neat, but we can go a little deeper. The chess machine also used lists to manage what Newell called continuations. These are the possible future board states that I keep talking about.
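As a rough modern sketch of that piece-list idea, here's a bit of C. The names and layout are invented for illustration and aren't taken from Newell's paper, which never gives source code:

    #include <stdio.h>

    /* Hypothetical sketch: a board state kept as a flat list of pieces
       instead of an 8x8 grid. All names here are invented. */
    enum piece_kind { PAWN, ROOK, KNIGHT, BISHOP, QUEEN, KING };

    struct piece {
        enum piece_kind kind;
        int is_white;     /* which side owns the piece */
        int file, rank;   /* location on the board, 0 through 7 */
    };

    /* A whole position is just a list of pieces plus a count. Scanning
       this short list is often simpler than walking all 64 squares. */
    struct board_state {
        struct piece pieces[32];
        int count;
    };

    int main(void) {
        struct board_state s = { { { KING, 1, 4, 0 }, { KING, 0, 4, 7 } }, 2 };
        for (int i = 0; i < s.count; i++)
            printf("piece %d at file %d, rank %d\n",
                   s.pieces[i].kind, s.pieces[i].file, s.pieces[i].rank);
        return 0;
    }

That's nothing more than the piece-and-location list described above, just spelled out in a modern language.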
Starting point is 00:22:59 Now, Newell isn't just using an ordinary list. Well, he is, but it's a step beyond an everyday list. Newell suggested the use of a tree. And this, dear listener, is where one of the big missing puzzle pieces clicked into place for me. I guess this is where I should throw in a quick sidebar about data structures. Remember, we're in cool kid town talking about data structures, not chess. So, welcome back to Computer Science 101. A list is, well, it's just a list. You have a bunch of elements that come one after the other, all stored in some kind of order,
Starting point is 00:23:40 or at least accessible in some kind of order. You can grab an element by its index in that list. Theoretically speaking, those elements can be anything you want. In practice, there are usually restrictions, thanks to computers sometimes being, well, being kind of bad. But in theory, you can make a list out of anything. A two-dimensional list is a special case where you can construct something like a grid, or a chessboard. Let's say you have a list with five elements. Then each of those elements is itself a five-item long list, and you have a five-by-five grid. You can access each point on the grid using two indexes, one for the outer list and one for the inner list.
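In code terms, that grid example is about as simple as it sounds. Here's a throwaway C version, nothing historical about it, just the two-index access being described:

    #include <stdio.h>

    int main(void) {
        /* A five-by-five grid built as a list of lists: five rows,
           each row itself a five-element list. */
        int grid[5][5] = { 0 };

        /* One index picks the outer list (the row), the other picks
           the element inside it (the column). */
        grid[2][3] = 7;
        printf("%d\n", grid[2][3]);
        return 0;
    }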
Starting point is 00:24:27 But we can get more wild with it. We can make a whole tree. This is another special case where each element fits into a larger hierarchy. Every element has what's called a parent that contains it, and optionally can have its own children. In the chess machine example, Newell constructed a tree that used continuations to form branches. Following along one branch could lead you down towards a goal, like capturing a rook. That branch may have its own sub-branches for different outcomes. The chess machine might predict a chance of the human moving the rook. That would sprout off a new branch on the tree. To further complicate things and push us more towards nuanced thinking, Newell describes each branch in the tree as having an
Starting point is 00:25:12 associated probability. This is most important for branches related to opponent decisions. The machine might calculate that there is a 10% chance of the human moving the rook out of danger, and a 90% chance of the rook staying in position. Thus, one branch would be favored in decision-making. That all sounds fairly complicated, but let's just stay focused on the data structures here. The fanciest structure that Newell described was a tree, and those are just a particular application of lists. I often make the mistake of lumping in trees with some other higher-order data structures in my mind, but that's wrong. We're just dealing with lists of lists of lists, and I have to keep reminding myself that. I should really make a note that trees are just lists.
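To make the lists-of-lists point concrete, here's a small C sketch of a continuation tree with probability-weighted branches. It's my own invention for illustration, not a reconstruction of Newell's actual data layout:

    #include <stdio.h>

    /* Each node is a possible continuation. Its children are just a
       list of more continuations, so the whole tree is lists of lists.
       The probability field mirrors the weighted branches described
       above; every name here is invented. */
    struct continuation {
        const char *description;
        double probability;
        struct continuation *children[8];
        int child_count;
    };

    int main(void) {
        struct continuation stays  = { "rook stays put", 0.9, { 0 }, 0 };
        struct continuation flees  = { "rook moves out of danger", 0.1, { 0 }, 0 };
        struct continuation attack = { "threaten the rook", 1.0, { 0 }, 0 };

        attack.children[attack.child_count++] = &stays;
        attack.children[attack.child_count++] = &flees;

        /* Walking the tree is just walking a list, then the lists
           inside it, and so on. */
        for (int i = 0; i < attack.child_count; i++)
            printf("%s (probability %.1f)\n",
                   attack.children[i]->description,
                   attack.children[i]->probability);
        return 0;
    }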
Starting point is 00:26:03 A tree is a little harder to traverse sometimes, it can require recursion if you want to be a cool kind of dude, but they're just lists. Now, to close off this program, there's one more important use of lists. This is a bit of a long passage, but I think it's needed to illustrate my point. Direct from Newell's chess paper, quote, there is a common pattern to the solutions to be described here. In all of them, the machine uses very particular and approximate methods. It is as if the machine consisted of a vast collection of rules of thumb. Each rule is a much oversimplified expression of how the machine should behave with respect to some particular aspect of the problem. The rules are of all kinds. Chess principles to follow, measurements to make, what to do next, how to interpret rules of thumb,
Starting point is 00:26:57 and so on. These are so organized that they form the necessary qualifications and additional specifications for each other. Each rule is essentially a machine program. At any particular instance, the machine is under the control of some rule, or shunting between rules under the control of a master program. End quote. Remember how I said a list can contain anything? If you stick to theory or a nice programming language, then a list is allowed to contain executable code. Really, there's no reason it can't. The ability to put code in a list, to treat a program as data, is a fantastic generalization. That means more flexibility and more power, especially in the case of AI. Newell is hiding it a little here, but he's onto something powerful. That line about interpreting rules of thumb is the tip-off.
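If that sounds abstract, here's a minimal modern sketch of the idea in C, assuming nothing about Newell's actual design: the rules of thumb are just functions sitting in a list, and a master loop runs whatever it finds there. A program could reorder or swap entries in that list at runtime, which is the code-as-data trick being hinted at.

    #include <stdio.h>

    /* Each "rule of thumb" is just a function. */
    typedef void (*rule_fn)(void);

    static void count_material(void) { puts("rule: count material"); }
    static void look_for_forks(void) { puts("rule: look for forks"); }
    static void protect_king(void)   { puts("rule: protect the king"); }

    int main(void) {
        /* The rules live in a list, exactly like any other data... */
        rule_fn rules[] = { count_material, look_for_forks, protect_king };
        int rule_count = 3;

        /* ...and the master program just shunts between them. */
        for (int i = 0; i < rule_count; i++)
            rules[i]();
        return 0;
    }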
Starting point is 00:27:54 He's talking about a program that can change its own programming. To do that in any sane way, you need to be able to handle data and code in a similar manner. Getting to the tail end of the paper, that's where you'd expect to find the gritty implementation details, right? Here's this long narrative that describes the algorithm and all the data structures. You want to follow that up with a little bit of prose about your code. Maybe, oh, I wrote this program on JOHNNIAC, it works great. However, Newell dropped something that I think is more important. In 1954, you didn't really have access to programming languages,
Starting point is 00:28:36 at least not high-level ones. Fortran was just a draft specification deep inside an IBM lab. Hopper's A-series compilers were kicking around somewhere at UNIVAC. All you had to work with was machine code or assembly language. Long-time listeners should know that I'm a bit of an assembly language aficionado myself. I like working in the low-level language recreationally. I think it's fun to get down and dirty sometimes. But it's really hard to write in assembly, especially when it comes to dealing with complex data. I've been slowly working on a DOS game for the last year,
Starting point is 00:29:15 and the vast majority of that code is dedicated to constructing, traversing, and modifying data structures. You end up spending more time writing that kind of code than the actual interesting bits of the program. Newell comes right out and says that his chess machine isn't really doable with only assembly language. Quote, It is necessary to add a few comments about the general purpose language required to make a machine of this nature feasible. An essential feature of this language is its ability to refer to itself as well as to the external chess
Starting point is 00:29:51 situation. The only languages of this power of expression that have been formalized at all are those used in symbolic logic such as the calculus of propositions. The one illustrated below is an adaptation for computer use." Basically, Newell is saying that assembly isn't strong enough stuff, or at least it's not well-suited to the task of making a chess-playing machine. The refer-to-itself part here is where some neat stuff really comes in. Nowadays, this would be called either introspective or reflexive programming. It's the idea that you can write a program that can modify itself in a smart and safe way. The other part, the external piece, is basically just talking about list
Starting point is 00:30:40 management. That includes the whole forest of trees we discussed earlier. Newell gives a snippet of pseudocode as an example for the new language, his idea being that to make a more advanced program like his chess machine requires more advanced tools. There's not much to say at this point, the paper is just a whiff of the idea of a new language. But there is one point I need to hit on. Newell only shows a single line of example code. It's there, it's terse, it's dense, and it's hard to puzzle out without the rest of the paper. But crucially, he breaks the expression down using a little diagram. That diagram, dear listener, is in the shape of a tree, with subfunctions feeding their outputs into higher-up functions. The next jump into the AI waters comes immediately
Starting point is 00:31:34 after the chess paper. This is when a team starts to assemble around Newell's work, or rather, a group of friends all start to get obsessed with the same problem. This group was primarily three researchers. We have Allen Newell, who we've already met. Next up is Herb Simon, a political scientist who was also working at RAND. Simon was studying group dynamics. It sounds like his work was similar to what Newell was doing back before the AI bug got to him. The ideas in Newell's chess paper easily passed the bug onto Simon. The final member, Cliff Shaw, was another staffer at RAND and a more fully-fledged
Starting point is 00:32:13 computer programmer. There isn't as much information about Shaw, although most of the code output in this period is attributed to him. In interviews, Newell described this collaboration as something more like a friendship than work partnership. It seems like these three were just fascinated by the prospects of a thinking machine. They had the know-how to make it happen, and they liked working together. It's the kind of situation that we should all hope to be in one day. Now, I'm not going to spend a lot of time getting into the core trio here, just know that everything going forward is a group effort. Over the next few months, a number of projects formed. Chess, as a continuation of Newell's earlier paper, was one.
Starting point is 00:32:57 Another project involved a program that could automatically solve logic proofs. These were neat applications, but they were all dependent on a larger project, a new programming language. Now, the term programming language here might be a little bit off the mark. At least, this isn't as simple as it sounds. At this point in time, 1955 or 56, we're still in an odd transitionary period. The idea of some higher order programming tool existed. Compilers did technically exist. Fortran was soon to be announced, but convention and adoption were still years and years away. In a May 1956 paper, Newell and Simon used the terms pseudocode, interpretive language, and automatic computing to
Starting point is 00:33:48 describe their project. You could try and dismiss this as just a difference in terminology, but I'd hazard against that hasty assessment. It's not just that the terms were different, but the concepts of this next step in programming were also different. I'm going to be calling what Newell, Simon, and Shaw create a programming language, since otherwise this episode would get pretty inaccessible. But the simple fact is that their creation, which they will call the information processing language, isn't exactly what we think of today as a programming language. Initially, they would have used either automatic coding system or interpretive language to refer
Starting point is 00:34:32 to IPL. I think the better way to understand it today would be as a programming system. It's not a tightly defined thing like a language. It doesn't have syntax like a language, but it's definitely a system you can program with. Now, the Information Processing Language, or IPL, was designed explicitly to make programming artificial intelligence easier. It was the first language or system built for anything like this, so I think IPL will give us a good rundown of what a language needs to be the mother tongue of AI. Luckily for us, those requirements are enumerated pretty clearly in the 1956 paper. That paper was called Current Developments in Complex Information
Starting point is 00:35:20 Processing, and Newell and Simon dedicate a section to IPL, or as it was called at the time, logic language. The rationale behind creating the new language is really clearly stated, albeit it's a little wordy. So we're going to go point by point, and once again, it seems that the crew here really liked lists. So starting with number one. We should like to be free from specifying where information is to be stored, and to be able to get information in terms of our need for it without having to remember where it is. End quote. Working with raw memory addresses does really suck. The main point here is abstraction, or maybe we could just call it variables. Newell and Simon want to get further away from dealing with the actual computer
Starting point is 00:36:10 and free up some brainpower for solving larger problems. That's pretty in line with contemporary thinking on automatic programming. So far, so good. Next, quote, 2. We should like plenty of working space, so we needn't be concerned about writing over good information or have to do too much information shuffling, end quote. The space part here is a bit of a red herring. This is all about clean data management. We've
Starting point is 00:36:39 already seen that Newell in particular is keen on the use of lists. I think this point is clearly related to those longer data structures. Managing data in assembly language alone can get a little dangerous, especially on larger projects. You can run into issues where you just need to remember what part of memory is being used for which data. If you slip up either by forgetting where data lives or writing some dangerous code, you can really easily overwrite your data. So clearly, a list-y language will need some type of system built in for dealing with data in a safe and reasonable manner. Alright, next point. Quote 3. We should like to be able to define new concepts wherever we find them useful, so as to express our ideas in compact form. End quote. I'm cheating a little
Starting point is 00:37:35 bit here. I've already looked ahead so I can make better sense of what this point relates to. On the surface, this seems like Newell and Simon just want a convenient language, something easy to use. Fair enough. Compact here could just be shorthand for powerful code, the kind of language where a simple line can compile to hundreds of instructions. However, this takes on a different meaning in the context of later iterations of IPL. This is where we see introspection rear its kind of confusing head. One of the slick features of IPL is its ability to define new functions. On the surface, that's
Starting point is 00:38:15 basic. Just about every language supports something like functions or methods or subroutines. But in IPL, those functions can be defined and stored as data in a list. This gets weird. It gets into symbols and links to symbols. We'll talk more about this soon. I just want to introduce this idea as early as possible. Now, the final line item, quote, four. We should like to have operations which are at the natural level of thinking about complex processes. This holds for a. computational operations such as addition and subtraction, b. control operations which make decisions as to what process shall be carried out, and c. procedural operations which set up computations, see that the right information is in the right place, and keep track of the status of computations still in progress.
Starting point is 00:39:11 End quote. Bit of a long one there, but once again, this is part of what sets IPL apart from contemporaries. Well, at least in theory. Newell and Simon are talking about a programming language as a bridge between human and machine. They want a language that can serve as an intermediary, one that's modeled after how humans think about problems. They want it to be easy to translate notes and ideas into their computer language. We're heading in the same direction as Vannevar Bush and Doug Engelbart here. Well, at least in theory. So far, it sounds like IPL is shaping up to be something really good, right?
Starting point is 00:39:51 It handles data, abstracts away computations and complications, and comes with a slate of powerful features. I wouldn't get too excited if I were you. This is still a language from the 1950s we're dealing with. Now, I'm not going to take us through the progression of IPL from an early system up into a more fully-fledged language. While that may be an interesting exercise, I don't think we need to embark on it. Plus, early reports on IPL leave a bit to be desired. They describe the language, but can still be unclear about the finer details. IPL is usually described as an aside in larger papers, at least early on. So we'll be looking at IPL 5 as described in late 1959. Even at this point in its evolution, IPL still looks like
Starting point is 00:40:41 nothing else out there, so I think it's safe to go a little bit into the future to get some better documentation. To start with, I think it's time we really dig into what I mean when I said IPL isn't what we would think of as a programming language today. I think the best way to describe IPL would be as an instruction set architecture. Explanations of the language invariably end up sounding like some abstract definition of a computer. It doesn't use expressions. Every line is given as a single instruction. This is already, at least basically, how assembly language works. Each line of assembly is a simple instruction.
Starting point is 00:41:23 You don't tell a computer to go add x to y and then store that value in z. You'd handle that kind of operation over a few smaller instructions. The same goes for IPL. But the similarities don't stop there. IPL instructions are structured in a very particular way. The documentation calls each instruction a quote IPL word composed of prefixes and a symbol. This is followed by arguments. The details here are a little bit too fine to really get into. I mean, it's a table of prefix values and combinations. There's not all that much to go through. You get prefixes for different types of addressing, a handful of different operations,
Starting point is 00:42:07 and some qualifiers about symbols and arguments. That should all sound a little cumbersome, and that's because it is. What's more, this is really similar to how a computer's instruction set is specified. Usually, each instruction issued to a processor comes as a chunk of data packed into a few different parts. This can be a single word long, or longer, or some variable length.
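To give a rough feel for the shape being described, here's a loose C analogy of an IPL word. The field names and their meanings are my own summary guesses, not the real layout; the IPL-V manual linked in the description has the actual details:

    /* Loose, invented sketch of the flavor of an IPL word: a couple of
       small prefix fields, a symbol, and a link onward. The real prefix
       meanings and encodings are in the IPL-V manual. */
    struct ipl_word {
        int p;                  /* prefix: roughly, what operation or interpretation */
        int q;                  /* prefix: roughly, how to treat the symbol (addressing) */
        char symbol[8];         /* the symbol this word names or acts on */
        struct ipl_word *link;  /* next word; even code is stored as a list */
    };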
Starting point is 00:42:33 We have some kind of prefixes or suffixes for specifying address modes, an operation, and optional arguments. Once again, IPL is mimicking an instruction set architecture very closely here. On the surface, this just seems, well, it seems kind of lame. Newell and Simon were talking a pretty big game, but looking at the spec, they don't seem to back that up. This was my initial take on the matter, and I think it's a fair first assessment. That said, it is a bit dismissive to just, you know, call IPL lame and move on. IPL is doing a lot more than just reinventing the digital wheel. Its instruction set is built in a somewhat savvy way that actually advances Newell and Simon's stated goals. There are eight initial operations in IPL 5. Expanding out that
Starting point is 00:43:28 matrix with address modes gives us maybe a few dozen. These are all the lowest level instructions IPL can handle. Call them primitives if you like. On a normal computer, this is where you'd get such hits as load register, add, compare, and maybe increment. But in IPL, most of these primitives are for dealing with list data. For instance, we get two instructions called preserve and restore, which are roughly equivalent to push and pop. They let a programmer easily push a value to the top of a list or pop that value off a list. The only odd instructions out here are branch, the basic conditional branching operation that all computers need, and execute. Now, execute is where the magic is really happening here. This is how you actually run routines in IPL. You just instruct IPL to execute something. This can be some normal, unadulterated address,
Starting point is 00:44:34 a function name, if you will. But we can get tricky with it. Remember that we have different addressing modes, different ways to specify locations in memory. Execute can accept a pointer. This means that you can tell IPL to run a function at some variable location in memory. Instead of telling IPL to go execute the code at location 100, you say go execute the code at r, where r is a variable that holds the address of your routine. This sounds a little confusing at first, it kind of is, but it gives us some major upsides. For one, this means IPL is able to talk about itself. This is that introspection thing that I keep bringing up. You can write an IPL program that keeps a list of routines, that can manipulate that list of routines,
Starting point is 00:45:26 and even run those routines. In other words, IPL can treat code and data interchangeably, and that includes variable execution. An IPL program can decide what code to run when. That's a really powerful feature. If you don't get the immediate utility of that, then don't worry, we'll be treading deeper into this realm as the episode progresses. Hopefully I can string together a good explanation as to why breaking down this distinction between code and data is so important. Now, in service of fairness, you can jump through a pointer in assembly language. The difference here is that
Starting point is 00:46:05 IPL offers a more convenient and nicely parceled up way to do this. In general, this is what IPL is bringing to the table. It's a convenient way to get around a lot of the unfun part of assembly code. So there we have the instruction part of IPL. It's primitive, it's weird, it's small, and frankly, even reading through the later docs, it's kind of inaccessible. It's hard to get your head around right away. But hey, that's just a small part of the picture. The most important part of a language, I'd wager the most important part of a computer in general, comes down to how it handles memory. Yes, this is a hill I will die on. Memory is the most important piece. So how does IPL deal with memory? Well, that gets a little bit more weird and a lot more inaccessible. The fundamental unit of data in IPL is the list. That fact should
Starting point is 00:47:02 surprise no one at this point. Pretty much all data in IPL either comes in the form of a list or lives inside some list. Ah, but we can get even more specific. The bread and butter here is the linked list. Once again, this is where we get into how general solutions will always, always, always be better. In a normal list in, say, Fortran or C, we're usually dealing with one element after another after another. That's pretty easy to implement, really, and easy to deal with. In this framing, a list is just a long, continuous chunk of memory broken into cells for each element. A linked list is a little more sophisticated. Under the IPL prescription, each element contains a chunk of data, which is called a symbol,
Starting point is 00:47:56 followed by a link to the next element in the list. Interestingly enough, Newell, Simon, and Shaw are credited as the creators of the linked list. This is some deeply fundamental stuff. Linked lists end up being really important for later developments. The data structure is still widely used today. I've even linked quite a few lists myself. It's darn handy to have around. We can thank IPL, this somewhat obscure little language, for that development.
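For anyone who wants to see the shape of the thing, here's a minimal C sketch of that symbol-plus-link structure. It's the everyday textbook linked list, not IPL's exact cell format:

    #include <stdio.h>
    #include <stdlib.h>

    /* Each element carries a symbol plus a link to wherever the next
       element happens to live in memory. */
    struct cell {
        const char *symbol;
        struct cell *link;
    };

    /* Splicing a new element onto the front of the list never requires
       moving any of the existing elements around. */
    struct cell *push(struct cell *head, const char *symbol) {
        struct cell *c = malloc(sizeof *c);
        c->symbol = symbol;
        c->link = head;
        return c;
    }

    int main(void) {
        struct cell *list = NULL;
        list = push(list, "rook");
        list = push(list, "pawn");
        for (struct cell *c = list; c != NULL; c = c->link)
            printf("%s\n", c->symbol);
        return 0;
    }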
Starting point is 00:48:26 So why is the linked list here such a big deal? Once again, we're looking at something subtle. With a linked list, you don't have to store data in one big continuous chunk in memory. A linked list can be spread throughout memory. Elements can be drawn from anywhere. Elements don't need to be in any kind of order while they're in memory. The linkages mediate order for you. For one, this is just convenient. But of course, there's more reason for linked lists than just convenience.
Starting point is 00:49:00 It's hard to deal with dynamic data using a conventional list. You run into the shuffling issue that Newell and Simon mention in their 56 paper. As the list grows, you'll eventually reach the end of its designated region of memory. To make that list any bigger, you then have to go find a larger chunk of free space and then relocate your data to that chunk. That's a pain. It's slow, and in general, it's one of those times where you end up dealing more with the idiosyncrasies of a computer than your actual program. That's the situation you're trying to avoid most of all. Using a linked list straight up removes that issue. No mess, no fuss. That is, as long as you have some cogent way of handling linked lists.
Starting point is 00:49:47 And IPL has just that. Although, it's a little confusing at first. The important thing to remember here is that IPL is structured like a virtual computer. Keeping that in mind should help. IPL models memory, that is to say, all the memory it can access, as one big linked list. That includes cells that are still free. IPL keeps free chunks of memory on a list called H2. And yeah, the naming convention here kind of sucks. In general, IPL lists are named as a letter followed by a number. If you ever want to program in this language, you just have to get used to
Starting point is 00:50:25 that. When you need to add a new cell to some list, IPL just checks H2 for a free cell. Finding one, IPL then just pops it off of H2 and links it to your list. When a cell is removed from a list, IPL sends it back to the end of H2. The huge advantage here is that it's easy to expand lists. You just draw from a ready pool of free cells. As cells fall out of the list, they revert back to being free. So in theory, you never need to worry about where you're getting memory space from. IPL handles all the gritty details, you just throw together lists. There is, of course, a lot more to IPL. You can read a manual if you really want to go deeper into the language. I'll link one in the
Starting point is 00:51:11 description. But specifically, we're looking at a list machine. We've already looked at some of the theoretical reasons for IPL's development, but what about the practical applications? What did IPL actually do? This is where we finally return to artificial intelligence. As I touched on briefly, IPL was designed as a tool to help Newell, Simon, and Shaw in their artificial intelligence research. The language was really a means to a much larger end. There were a number of projects that IPL ended up playing a role in. The one I want to focus our attention on was called Logic Theorist. It was a program that could solve symbolic logic proofs. If you've taken geometry or calculus at really any level, then you're probably familiar with the concept of proofs, at least in passing. You start with
Starting point is 00:52:04 some assumption, usually an equation, and then through a series of operations, you show that it's true. This is most often done by manipulating and contorting the equation until it matches some other equation that's already known to be true, or going from an equation that's known true and manipulating that into the theorem. It's rigorous and repetitive work, but importantly for us, proofs can be really difficult. To properly prove an equation, you need to have a good grasp of the math involved. You need to be able to recognize known axioms, those are the known good equations, and it often helps to have an eye for slick math tricks. Sometimes a proof just takes
Starting point is 00:52:47 intuition. Ideally, you arrive at some simple and elegant solution. It's all a very human process, something that a machine shouldn't be able to do, right? The symbolic logic part here is where the pieces start to fall into place. In an interview for the book AI, The Tumultuous History of the Search for Artificial Intelligence, Simon admits that, quote, Our original intent was to start out with chess, and then we were also considering doing geometry. We thought, probably wrongly, that in both of these there were perceptual problems, important as humans perform the task, that would be hard to deal with. So we went to logic for no deeper reason than that I had two volumes of Principia at Home.
Starting point is 00:53:37 Principia here is Principia Mathematica. It's a work on the fundamental theory of mathematics. It comes with an entire section of logic proofs that pretty much show the underpinnings of math. It's some really dense stuff to be sure. In general, symbolic logic is pretty dense. You can think of it as an abstraction beyond normal mathematics. It deals with the simple logic operations like AND, OR, or NOT that can be used to build up more complex operations. Simon makes it sound like this early focus on logical proofs was mainly a coincidence. Maybe that was the case, but it couldn't have been a better choice. Despite being, well, hard to deal with, symbolic logic
Starting point is 00:54:26 relies on some pretty simple operations. That's especially true when it comes to writing proofs. You're only dealing with symbols, in other words, variables. No numbers are involved here, at least not in the proofs that the Rand crew were dealing with. There are only a handful of operations used. There are only a few axioms, the known true equations. And perhaps best of all, there are only a handful of different types of manipulations used to arrive at a proof. Put together, and just like we saw with chess, this gives us a pretty well-defined bound for the problem. The other little bit about logic that I have to point out is how well it aligns with computing. When you get down to it, at the lowest level, a computer is composed of logic elements. Everything is built from physical logic
Starting point is 00:55:19 gates. Even once you get all the way up to a running machine with instructions and programs, you still have access to those simple logic operations that are used in symbolic logic. There's just this nice little connection going on here that, while maybe not playing a factor in decision-making, must have made Newell, Simon, and Shaw's life a little easier. The early development of Logic Theorist, this proof-producing machine, and IPL went hand in hand. That much is very clear. But here's where we reach what I think is my favorite part of the story. The first version of Logic Theorist, and I guess strictly speaking the first version of IPL, were both purely theoretical. At least, neither of them
Starting point is 00:56:06 actually existed on a computer. Later versions would actually be programmed, but the initial version was not. Logic Theorist is explained in pretty fine detail in a June 1956 paper simply titled The Logic Theory Machine. It includes annotated source code and everything, so as far as I'm concerned, this is the definitive source for how the program worked once completed. I'm going to be using that as our guide. So, how did the theorist make its theories? The RAND team broke things down into three smaller problems. Matching, searching, and the actual proof methods used. Now, all of these problems are built around trees of data. After reading a few of these logic papers, I think I've fried my brain pretty well, so stick with me while I try to explain the connections here.
Starting point is 00:56:59 To start off, we have matching. The initial expression that Logic Theorist sets out to prove has to be entered into the computer. This is done, how else, but as a tree. The notation used for symbolic logic is a little... gross? It's similar to math when you get down to it, but it just looks daunting. So forgive me if a small example scares you off. Let's say you have the expression P implies P or Q. The implies is a little arrow, the or is a V for whatever reason. P and Q are just variables. Logic Theorist represents that as a tree, where implies is the root of the tree. It's like an equals in
Starting point is 00:57:46 other forms of math. The left branch is just that first p variable. The right branch is the or operation, with the last p and q branching off from the or. So you get this nice little happy tree that fits well into memory and fits well into the framework of linked lists that IPL is all about. From there, Logic Theorist is able to work up a general description of the expression. Newell and Simon explain that they wanted a way to loosely characterize how an expression looks. This is mimicking how a human running a proof would gauge an expression. So they went with three values. The number of variable slots, the number of unique variables, and the levels of the tree. For our example, we have three variable slots, two unique variables, and the
Starting point is 00:58:38 tree representation has three levels. So all that is just to get a representation of the expression. And here's why that matters. Logic Theorist keeps a list of known true proofs, known axioms. This started with five axioms provided by Principia Mathematica, but it would grow as the program ran. Those expressions are also stored as trees, each with their own three-value descriptions. Those descriptions are used as a reference library of axioms. If you give Logic Theorist an expression, it can tell you if it knows of any similar expressions. This is the searching part of the problem. Logic Theorist is able to search its memory to find a starting place for a proof. The match and search scheme shows up a lot in logic theorist. Looking for a starting point is just one example. Most of the methods used for manipulating
Starting point is 00:59:35 expressions also rely on matching expressions and searching known expressions. There's a lot of lists to go through. The next step is carrying out a series of methods to construct the actual proof. There are well-defined rules for these types of methods, just like with normal mathematical proofs, so the bulk of what Logic Theorist does here is predetermined by the rules of logic. The neat part is how those methods are coordinated. The general algorithm is to look for the most similar known axiom. From there, Logic Theorist chooses a method to use to manipulate that axiom.
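As a toy version of that match-and-search step, here's a bit of C using the three-value descriptions from earlier. The axiom entries, the scoring, and all the names are invented for illustration; the real Logic Theorist worked over IPL lists, not structs like these:

    #include <stdio.h>
    #include <stdlib.h>

    /* The rough "shape" of an expression: how many variable slots it
       has, how many distinct variables, and how deep its tree goes. */
    struct description {
        int variable_slots;
        int distinct_vars;
        int tree_levels;
    };

    struct theorem {
        const char *text;
        struct description d;
    };

    /* A smaller score means the known theorem looks more like the goal. */
    static int distance(struct description a, struct description b) {
        return abs(a.variable_slots - b.variable_slots)
             + abs(a.distinct_vars - b.distinct_vars)
             + abs(a.tree_levels - b.tree_levels);
    }

    int main(void) {
        struct theorem known[] = {
            { "p or p implies p",        { 3, 1, 3 } },
            { "p implies (q implies p)", { 3, 2, 3 } },
        };
        int known_count = 2;

        /* p implies (p or q): three slots, two unique variables, three levels. */
        struct description goal = { 3, 2, 3 };

        int best = 0;
        for (int i = 1; i < known_count; i++)
            if (distance(known[i].d, goal) < distance(known[best].d, goal))
                best = i;

        printf("closest known starting point: %s\n", known[best].text);
        return 0;
    }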
Starting point is 01:00:17 This is done using a series of heuristics. Now, that's just a fancy word for rules of thumb. In practice, it's basically a big long program. Newell, Simon, and Shaw worked up a list of rules for how Logic Theorist should proceed with proofs. For picking methods, Logic Theorist goes down a list of options. Initially, this included substitution, detachment, and chaining. Once again, the rules for these operations are already well known, but heuristics were added so Logic Theorist would be a little smart about how it applied these methods. Take chaining, for instance.
Starting point is 01:00:52 This method is where you use the fact that if A equals B and B equals C, then A equals C. To run this step, logic theorists first looked at an axiom where one side of the expression was similar to one side of the proposed theorem. This was getting the A equals B part set up. Then the next step is showing that B equals C. But you can't do that with just the chaining method alone. You've basically taken one step on a longer path here. So logic theorist stores this down as part of a subproblem. Now it needs to show that B equals C to complete the chain. This works out exactly how Newell's
Starting point is 01:01:33 chess machine was described. Certain methods will create these subproblems that are analogous to the chess machine's subgoals. Solving for B equals C is the same as the computer deciding it needs to take a rook before checkmate. It's an abstraction that allows for more complex behavior to take place. Another trick that Logic Theorist employs is a set of so-called stop rules. These are another set of heuristics specifically for deciding when to stop attempting a solution. Newell and Simon specifically explain stop rules in the context of when to stop the a solution. Newell and Simon specifically explain stop rules in the context of when to stop the overall proof-finding process. The main heuristic here is a work measurement. Logic theorists would keep track of how many steps the proof reached and
Starting point is 01:02:17 terminate after a set threshold. There were also similar rules in effect throughout the system. For instance, logic theorists would keep a running list of failed solutions and failed proofs. If at any point a branch of the proof tree matched an expression in this failure list, it would stop pursuing that branch and prune it off. Most of the algorithm was worked out early in the project. We're looking at early 1956. Before actually implementing the theorist and before implementing IPL, Newell and Simon rigged up a simulation. Once again, reading from AI the Turbulent History, Simon explained, quote, Alan Newell and I wrote out the rules for the components of the program in English on index cards, and also made up cards for the
Starting point is 01:03:05 locations of the memories. At the Graduate School of Industrial Administration building, on a dark winter evening in January 1956, we assembled my wife and three children together with some graduate students. To each member of the group, we gave one of the cards, so that each person became, in effect, a component of the logic theorist computer program, a subroutine that performed some specific function or a component of its memory. It was the task of each participant to execute his or her subroutine or to provide the contents of his or her memory whenever called by the routine at the next level above that was in control." at the next level above that was in control. End quote. This must have been a pretty messy experiment. The data structures involved in Logic Theorist are pretty clean, but they are big and
Starting point is 01:03:53 they are numerous. Then again, it was probably satisfying to lay down the structured trees of index cards. Either way, by the end of the evening, Newell and Simon were convinced that Logic Theorist could work. The next steps were, of course, IPL, followed by the actual software implementation. Shaw would serve as the primary programmer, with Newell and Simon hammering out much of the design work. I think that gives us all the background we need to take a look at one final question. us all the background we need to take a look at one final question. Why is IPL the way it is? The list part is easy here. Newell's chess machine made aggressive use of list-based data structures. Logic theorists pushed lists that much further. Lists, and especially trees, were crucial to this approach to artificial intelligence. Linked lists were quite literally invented
Starting point is 01:04:47 to satisfy the RAND team's need to work with dynamic data structures. That all falls into place really neatly. Then what about the strange instruction-based-ness of IPL? Why is there no real syntax to speak of? While this has less to do with the overall story of AI, I think it's a pretty interesting question to ask. For this, I think we can benefit from a contemporary perspective. In the early and mid-50s, Grace Hopper developed the A-series of compilers. These were some of the earliest examples of this dark art. A-3, aka ARITH-MATIC, was created around 1955, so I think it's a good comparison point.
Starting point is 01:05:34 In this system, we can see a similar low-level approach. Every line of Arithmetic was a single instruction. You couldn't just write x equals y squared plus square root of z. You had to break that down into its constituent operations. IPL wasn't unique in its approach. One reason for this is probably simple precedence. Assembly language was written this way. For most researchers of this era, that was their only experience. There's also the matter of complexity. Delving into the realm of a complex language, one with syntax and semantics and all the stuff that makes a language really usable to a modern programmer, requires a much more sophisticated compiler. It requires logic to parse a complete language, and that's no small feat.
Starting point is 01:06:28 Also, just incidentally enough, the team at RAND wasn't setting out primarily to make a new language. They developed IPL as a tool to help with a larger set of problems. So maybe making a language with syntax and nice-looking code was more of a secondary concern. Now, strangely enough, Logic Theorist was getting pretty close to this kind of complexity. It was able to represent expressions as trees built up of primitive operations. That's one step used during the parsing phase of most compilers. We're seeing the building blocks stack up here just not quite to their full height yet. Another reason we should consider comes from the stated goals of IPL itself.
Starting point is 01:07:20 One of Newell's requirements for a new language was, quote, we should like to have operations which are at the natural level for thinking about complex processes, end quote. That's one of those obvious goals. Of course a language should be at a level that humans are comfortable with. But what is that level? What is the natural unit of human thought? It seems like Newell, Simon, and Shaw might have suggested that unit was a simple instruction. If you were writing instructions for a human-powered simulation, you probably wouldn't write out an equation or define some fancy syntax. More likely, you'd just jot down a list of simple instructions. Put the first card you get on this stack. Check if
Starting point is 01:08:07 the next card is positive. If it is, then drop it in another stack. That sort of thing. So, why does IPL look like assembly language? I think it's got to be a combination of these factors, both technological limitations and the team's way of attacking the problem. I don't think it's just because it's such an early language, although that does play some role here. That said, it's this primitive approach to programming that would leave an opening for a new contender. Alright, that brings us to the end of this episode.
Starting point is 01:08:44 I think we've covered a lot of ground, and I think we can start to answer some of my initial questions. Specifically, IPL's history helps us establish why lists are important for artificial intelligence. It all comes down to the complex and dynamic type of data needed for simulating intelligence. Or, as Newell and Simon would have put it, lists are required for complex information processing. IPL would be the first generic tool for tackling this problem, but it wouldn't last that long. Just two years after IPL was specified, Lisp hit the scene. The RAND language would continue on through numerous versions, but Lisp was just the superior option. Next time, we'll be starting with the first use of the term artificial intelligence. That will lead us nicely
Starting point is 01:09:33 into Lisp. How did the new language get its start? How did it improve on the groundwork laid by IPL? And are the two languages even very closely related? We'll grapple with those questions and probably a lot more two weeks from now. Until then, thanks for listening to Advent of Computing. If you like the show, there are now a few ways you can help support it. If you know someone else who would like the show, then why not take a minute to share it with them? You can also rate and review on Apple Podcasts. And if you want to be a super fan, then you can support the show directly through Advent of Computing merch or signing up as a patron on Patreon. Patrons get early access
Starting point is 01:10:10 to episodes, polls for the direction of the show, and bonus content. You can find links to everything on my website, adventofcomputing.com. And if you have any comments or suggestions for a future episode, then go ahead and shoot me a tweet. I'm at Advent of Comp on Twitter. And as always, have a great rest of your day.
