Advent of Computing - Episode 73 - IPL, AI, and Linked Lists
Episode Date: January 10, 2022

I'll let you in on a secret: I've never understood why Lisp is so closely associated with artificial intelligence. I've decided to fix this. In this episode, and the next, I'm tracing the early roots of AI and why list processing is important in the field. This episode we dive into the Information Processing Language, a strange programming language that predates Lisp. Along the way we discuss the origin of linked lists, chess playing machines, and a program that could solve logic proofs.

Selected Sources:

http://bitsavers.org/pdf/rand/ipl/P-620_The_Chess_Machine_Dec54.pdf - The Chess Machine

https://www.rand.org/content/dam/rand/pubs/papers/2008/P1929.pdf - IPL V introduction

http://shelf1.library.cmu.edu/IMLS/MindModels/logictheorymachine.pdf - Logic Theorist
Transcript
The modern general-purpose computer can be characterized as the embodiment of a three-point philosophy.
1. There shall exist a way to compute anything computable.
2. The computer shall be so fast that it does not matter how complicated the way is.
And 3. Man shall be so intelligent that he will be able to discern the way and instruct the computer.
End quote.
Coming straight out of a paper written at Rand in 1954, this offers up a pretty rosy
view of computing.
The passage was used as the introduction to a paper on a chess-playing program written
by Allen Newell.
It may sound optimistic, but just a few lines later, Newell is already
drawing bounds around this optimistic view. Really, can you blame him for that? I think a lot
of people were in a similar boat. Computers, especially in their early days, offered almost
limitless potential. But limits were quickly found. What exactly is computable? What kind of problems count?
What kind of problems can even be reduced to a set of instructions? Are humans even smart enough
to handle one of these new computer things? Speaking as a consummate professional myself,
I can tell you that I'm not always the most intelligent when it comes to discerning the digital way.
Accepting these limitations may seem a little frustrating, at least at first.
However, I think that frustration opens up a whole new world of possibilities.
It gives us a new avenue to go down.
Namely, how can we sneak around these imposed limits?
How can we change the definition of what is computable?
One approach, one that Newell and others would pioneer,
was to trick computers into thinking a little more like humans.
Welcome back to Advent of Computing.
I'm your host, Sean Haas, and this is Episode 73, IPL, AI, and Linked Lists.
Today, we're starting on a bit of a journey.
I've been wanting to talk about artificial intelligence broadly, and Lisp more specifically.
Now, I have a bit of a history with the language myself.
Lisp was actually my first programming language. Admittedly, that was a dangerous choice,
but it was what I had on hand. I learned Common Lisp out of an old textbook and programmed on a partly working AT clone my dad had lying around. From there, my relationship with computers has only become more complicated.
Now, there's a lot to be said about Lisp. It's been called a subtle and elegant language.
There's a lot of neat implicitness to it. I think it's one of those things that takes time and
understanding to appreciate. I know it's taken me a while to come around to it. That said, I'm still no good at
programming Lisp, mind you, but I've started to see more of the beauty in its parentheses as I
grow older. But there's one crucial thing about Lisp that I've never been able to fully understand.
Lisp is also known as, quote, AI's mother tongue. In fact, those are the first words used to define Lisp in the
jargon file. Now, I personally subscribe to what I call a Hopperian view of programming languages,
if I can coin a phrase. That is, I share a similar view of programming language diversity
espoused by Rear Admiral Grace Hopper. In her opinion, it was impossible to have one
universal language for all tasks. Instead, Hopper argued for many specialized languages,
each suited to some particular task. Each of these languages would have to be constructed
specifically for its own niche. We can see this borne out in many languages. Fortran, a language
meant for math and engineering, has powerful facilities for handling equations. It's fast,
it can do matrix math, and later versions of Fortran have great support for multiple processors.
COBOL, may it rest in peace and never be heard from again, is meant for business,
so it's basically a really fancy overpowered spreadsheet. C was designed for portable system
programming, so it offers platform-agnostic ways to manipulate memory. That's enough languages to
establish a pretty ironclad pattern here. Now, Lisp was designed for programming artificial intelligence systems,
so it supports lists. Now, I don't know if this is just a me problem, but I've never understood
why support for lists is so crucial for artificial intelligence. Don't misunderstand me. Lists are a
really important data structure. They're up there on my list of favorite data structures.
I couldn't have a list of favorite data structures without them.
There's just something I'm missing, and I'm really hoping that I'm not the only one out there.
Otherwise, I'll feel a little bit silly.
This episode and next episode are going to chronicle my adventure trying to understand why Lisp, Lists, and AI go together.
Specifically, I want to figure out what's so special about the list in Lisp. To do that, we need to examine IPL, the Information Processing
Language. This was a predecessor to Lisp, although maybe Lisp isn't its direct descendant.
My plan here is to outline the state of things before Lisp hit the scene, establish what's
going on with these Lisp things in general, and set the groundwork for next episode where
we'll try to figure out why Lisp is so suited to artificial intelligence.
The term artificial intelligence was coined in the summer of 1956 at a workshop located
at Dartmouth College.
This isn't the real birth of AI, just the point in time when it got a convenient name. The idea of a thinking machine had been in
the air for quite some time, and experiments into the technology had already shown promise by 56.
We can thank Alan Turing for at least some of this, and strangely enough here, I don't
mean the so-called Turing test.
For those of you not in the know, the Turing test is an experiment proposed by its namesake
where a computer would communicate with a human via terminal.
The human would then try and guess if they were talking to a computer or another human.
To pass the test, the computer needed to trick the human.
That's fun, but perhaps not
fundamental. Here, I'm talking about the Church-Turing thesis. In other words, the model
that describes how all computers can function. In very broad terms, a computer will be able to
perform any calculable operation. Now, there are caveats as far as power goes.
There are issues as far as what is considered calculable. But the idea here is that if you
can reduce a problem to steps, then a computer should be able to handle it for you, as long as
the computer is so-called Turing-complete or can emulate a Turing machine. The obvious application is math, but that's not the only application here.
If thinking could be described as a series of steps,
then that could in theory be fed into a computer.
This may be a really complicated problem, but here's the cool result.
If we look at this thesis the right way,
then a computer should be able to think, whatever that actually means. The cool part here that's easy to just
gloss over is this actually gives us a possible path towards a thinking machine. We have a
restricted scope. Sure, the scope is still huge. It's defining what thinking is and then reducing it to a series
of steps. That's a lot of work, but that is a restriction in scope. We have a more narrow problem
that we should be able to solve, at least in theory. Now, a growing number of researchers in
this period, we're talking late 1940s, early 50s, believed that human thought
could, in fact, be reduced down to a series of steps. This initial cohort was a strange mix of
psychologists, early computer scientists, and at least one political scientist. It's in this mix
that we find Allen Newell. In 1954, Newell was oscillating between completing his PhD in psychology
and working as a researcher at RAND Corp. While at RAND, he became involved with a series of
increasingly complicated simulation studies. Specifically, Newell was part of a research
group studying how humans handled complex tasks. The group had set up a simulated air defense control center. It's
complete with radar readouts and realistic data. Honestly, it sounds like it would have been a
fascinating study to put together. Rand had groups of subjects working in this fake control center
in eight-hour shifts, and all information was passed in by researchers or over these simulated radar displays. Newell was the person
on the ones and twos, so to speak. He was in charge of handling the fake radar data, which
was driven by a computer. The whole point of the experiment was to study, in a controlled
environment, how a team of people would respond to increasingly complex data. The test group had
to be as realistic as possible, and the test set of data had to be as realistic as possible.
Newell's solution was to start off with actual
live air traffic control data. Initially, the simulated radar screen displayed a mirror image
of what actual radar operators were seeing in a nearby control tower. That was a good baseline.
Newell then programmed the radar displays so that he could tweak the live data. More planes could be added, enemy aircraft could
be thrown in. Over time, more and more complicated situations could be displayed. All the while,
the RAND research team sat back and watched. The specific application here is pretty neat, but it's not what's important.
Newell was doing something relatively new here. At this point, circa 1954, computers were by and
large fancy calculators, at least that's how they were often viewed. They crunched numbers,
processed data, and maybe solved equations. But Newell's fake radar displays
weren't cast from this exact mold. This was non-numeric computing, as Newell put it. He
wasn't the only person doing this kind of programming, but he had inadvertently found
himself in some pretty rarefied air. This psychology research project, with a side of non-numeric programming,
set up Newell for a profound realization. One day in 54, while at RAND, he met someone named
Oliver Selfridge. Newell would later describe this event as a conversion experience.
Quote, What Oliver Selfridge does when he comes down is to talk about a little crazy machine called the MTC, the Memory Test Computer, which was hardly a machine worth talking about, but that they were playing with.
They had done a little pattern recognition task on it, and that just made it clear exactly what was going on as far as I was concerned.
I mean, absolutely clear. End quote.
This is also one of those fun cases where we have both sides of the story and they seem to match up. In an interview, Selfridge said
of this encounter, quote, I talked about pattern recognition and Allen and I hit it off and this
was really the trigger. I mean, he was ready for it. He got the whole idea. End quote. Selfridge had been working on a computer system that could recognize simple patterns.
This wasn't just programming in some large set of reference images.
Selfridge's machine was thinking.
Well, at least in the broadest sense of the word.
He had designed a program based off theories of how human perception
worked. This would really mess up Newell something fierce. He realized that Selfridge's pattern
matching system was, essentially, a step beyond the behavioral work he had been carrying out at
RAND. Newell was using a computer to observe how humans adapted to stimuli. But here, Selfridge was
taking the next leap. He was modeling those findings inside a machine. He was making a
computer that could behave, at least in a limited sense, like a human. From there, Newell was hooked.
This new thing didn't even have a name yet, but Newell knew he wanted to be involved.
Quote, I went home that weekend. I designed a whole system like that for our air defense,
for understanding the people doing the air defense direction center, and I never looked back. And it
really did happen just like that. I was all primed for it, literally primed for it, but the conversion
to deciding that that was it,
and that I knew exactly how it was going to happen, happened mid-November 1954. But you can
see all the stuff that prepared for it. Conversion experiences are all the same. End quote.
The conversion here was quick. However, it did take some time for Newell to start writing his
own thinking programs. As near as I can tell, this first weekend jaunt was more of an exercise.
It was Newell trying on some new ideas for size.
It didn't take long for Newell to move from one exercise to a much more realized project.
All it would take was the right application to focus Newell's newfound vigor.
Luckily, one was close at hand. Chess.
AI chess, or whatever you'd call it prior to AI being named, already had its own history by the
time Newell took up the challenge. In fact, this takes us full circle already. As early as 1948,
Alan Turing and David Champernowne were trying to program a computer to play chess against a human opponent.
They called their algorithm Turochamp after both of their names.
The approach involved assigning values to each board state,
projecting forward to generate a set of all possible results from a certain move,
and then using state values to choose the quote-unquote best move.
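To make that evaluate-and-pick loop concrete, here's a minimal sketch in Python. To be clear, the move generator and scoring function are toy stand-ins I made up, not Turing and Champernowne's actual evaluation rules; only the overall shape, score every reachable state and keep the best, follows the description above.

```python
# Toy sketch of a Turochamp-style move chooser: project every legal move
# forward one step, score the resulting state, pick the best. The moves
# and scoring rule here are hypothetical stand-ins.

def legal_moves(state):
    # Stand-in: each "move" just adds a value to a numeric state.
    return [1, 2, 3]

def apply_move(state, move):
    return state + move

def score(state):
    # Stand-in evaluation: higher is better for the machine.
    return -abs(10 - state)

def choose_move(state):
    # Look one move ahead and keep whichever move scores highest.
    return max(legal_moves(state), key=lambda m: score(apply_move(state, m)))

print(choose_move(5))  # from state 5, move 3 gives the best-scoring state
```

The real Turochamp looked further ahead than one move, but the core idea is the same: board states get numeric values, and values drive the choice.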
The final program was rendered useless by, well, the state of computing in 1948,
but the theory was somewhat sound. In the next few years, other researchers would turn their
eye to the same problem, and by 1954, Newell was the latest victim. In December, Newell would put his ideas
to print. His first paper on the subject was titled The Chess Machine, an example of dealing
with a complex task by adaptation. Now, before we get into the paper, we need to deal with the
matter of language here. We're at a pretty early phase in the development of computer science as a
discipline. That goes for AI too, if you haven't noticed. Newell is using quote-unquote machine
to mean computer and program, but sometimes he uses machine to just mean hardware, other times
just the program. It's something that tripped me up a little, so I figured it was worth a mention.
Anyway, the chess machine is where we get our first glimpse of lists. The paper opens up with
the quote I used to start this episode, basically saying that computers offer unparalleled power.
However, Newell immediately puts caveats in place. He starts describing situations where the idea of
computable answers starts to break down. To pull from Newell's list of doom, computers reach the
end of their usefulness when, quote, 1. The relevant information is inexhaustible. 2. The
set of potential solutions is neither enumerable nor simply representable. 3. The processing required to reach an answer depends on results that haven't been computed yet. And 4. There isn't enough time to carry out the processing. End quote. I'm paraphrasing those last two points a little. Remember, we can use the Church-Turing thesis to show that a computer should be able to solve almost any
problem, but that's assuming we have limitless resources and some kind of bounds on our input.
What if the problem you're trying to solve has infinite inputs? What if inputs never stop coming
in? And what if you can't calculate your results very fast? This is where we reach the
fringe of what's digitally possible. Take chess, for instance. What would the best possible strategy
be for determining moves? It would probably be something close to Turochamp. You want to look at
the current board state, then project out every possible subsequent state that could occur.
Then you need to look through those states for every branch that ends in your victory.
If there are multiple paths, then pick the fastest route to success, or somehow determine the best route to success.
Instead of thinking one or two moves ahead, the program would be thinking every possible move ahead.
Easy peasy, if you know
everything that could happen, it should be simple to win chess. Of course, doing this the brute
force way leads to its own issues. An unintelligent program would have to calculate every possible
permutation. That's a pretty exhaustive search. How many calculations would it take for each move?
I'm not going to do the math here, but it has to be some ridiculously large number.
So time becomes a concern.
That's number four on the list of doom.
What happens if the program chooses a path,
then the human makes a move that closes that path off?
Let's say the machine chose a route that would require taking a pawn in the next two turns,
but the human player moves that piece out of danger. That sounds a lot like issue number three on Newell's list. The computer wasn't able to generate the correct result, since the result
depended on processing that had not yet occurred. The net result is that the computer loses a game
of chess, so not the end of the world,
but I think this is an illustrative point.
Here's what makes chess such an interesting problem, and really what makes it so well
suited to early AI as an exercise.
It's a well-bounded problem.
In other words, the input data is finite, and the possible states are also finite.
Sure, those finite numbers may be pretty large, but that is a bound.
The relevant information is, in fact, exhaustible, and the set of solutions is very much enumerable.
So chess only runs into 50% of Newell's list of computer doom.
Sounds like a pretty good warm-up exercise.
Newell would take an approach similar to Turochamp, but that was just the base for something new.
You see, the trick was to add in a little bit of smarts.
In general, Turochamp was a pretty fair algorithm.
By that, I mean it wasn't doing anything particularly tricky.
It's all relatively brute force calculations.
Newell's chess machine wasn't that straightforward.
It was a lot closer to something intelligent.
The big shift here was the element of choice, or rather, the simulation.
Turochamp had a small amount of choices built in, but Newell
would take that to a more complicated level. His chess machine followed the basic format of
calculating all possible results for a set of possible moves, but didn't always follow branches
all the way to checkmate. Instead, Newell incorporated this idea of a sub-goal.
The basic plan here is that you could limit the number of moves calculated by stopping short at certain overall game goals.
For instance, the computer could decide to limit its search to moves that allowed it to capture a rook.
Those sub-goals could then have their own sub-goals, so you end up building something closer to a strategy, something a lot closer to how a human would think about goals.
Instead of looking at every possible move, you start to get a chain of goals.
Say, take out the human's rook, grab some pawns along the way, and eventually remove
the queen on the way to checkmate.
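A chain of sub-goals like that can be sketched in a few lines. Everything here, the states, the moves, the goal tests, is hypothetical scaffolding of my own, not Newell's actual formulation; the point is just that each search stops at a sub-goal instead of running all the way to checkmate.

```python
# Sub-goal search sketch: search only until the current sub-goal is met,
# then start a fresh search toward the next one.

def search(state, goal_met, moves, apply_move, depth=3):
    """Return a list of moves reaching the sub-goal within depth, else None."""
    if goal_met(state):
        return []
    if depth == 0:
        return None
    for m in moves(state):
        rest = search(apply_move(state, m), goal_met, moves, apply_move, depth - 1)
        if rest is not None:
            return [m] + rest
    return None

def plan(state, subgoals, moves, apply_move):
    # Chain the sub-goals: the end state of one search seeds the next.
    path = []
    for goal in subgoals:
        leg = search(state, goal, moves, apply_move)
        if leg is None:
            return None  # this strategy doesn't pan out
        for m in leg:
            state = apply_move(state, m)
        path.extend(leg)
    return path

# Toy usage: states are numbers, moves add 1 or 2, two sub-goals in a row.
path = plan(0, [lambda s: s >= 3, lambda s: s >= 5],
            lambda s: [1, 2], lambda s, m: s + m)
print(path)
```

Each call to `search` only has to explore a few moves deep, which is exactly the savings the sub-goal trick buys you.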
It's a more abstracted approach to describing a
chess strategy. Newell's chess machine takes this even a step further. In addition to modeling its
own strategy, the program models its opponent's strategy, or at least it tries to guess what the
opponent is likely to do. The idea here is that the computer can factor in predictions about the
human player. It's trying to get into the opponent's head, at least in a limited,
digital kind of way. The algorithm here seems pretty neat, but for me, the specifics are a
sideshow. The real deal all comes down to data structures. Chess is cool and all if you're a nerd, but data structures,
that's the real cool stuff. Anyway, for this particular application, lists end up playing
a really big role. Now, to be 100% clear, I'm working off Newell's paper here and not actual
source code. As far as I can tell, we don't get actual
source code until maybe a lot later, but I don't think that's going to be super relevant to the
overall discussion. We'll address that down the line. To start with, Newell used a list to represent
game states. This part isn't super deep into the whole AI aspect, but I do like the elegance here.
Newell was using a list of each chess piece, where each element was the piece and its location on the board.
I just like the neat efficiency there. It saves you from having to traverse a grid of data.
In general, a one-dimensional list is easier to deal with than a two-dimensional list. That's
neat, but we can go a little deeper. The chess machine also used lists to manage what Newell
called continuations. These are the possible future board states that I keep talking about.
Now, Newell isn't just using an ordinary list. Well, he is, but it's a step beyond an everyday list.
Newell suggested the use of a tree.
And this, dear listener, is where one of the big missing puzzle pieces clicked into place for me.
I guess this is where I should throw in a quick sidebar about data structures.
Remember, we're in cool kid town talking about data structures, not chess.
So, welcome back to Computer Science 101.
A list is, well, it's just a list.
You have a bunch of elements that come one after the other, all stored in some kind of order,
or at least accessible in some kind of order.
You can grab an element by its index in
that list. Theoretically speaking, those elements can be anything you want. In practice, there are
usually restrictions, thanks to computers sometimes being, well, being kind of bad. But in theory,
you can make a list out of anything. A two-dimensional list is a special case where you can construct something
like a grid, or a chessboard. Let's say you have a list with five elements. Then each of those
elements is itself a five-item long list, and you have a five-by-five grid. You can access each
point on the grid using two indexes, one for the outer list and one for the inner list.
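The five-by-five grid above looks like this as Python lists, with the two-index access spelled out:

```python
# A 5x5 grid built as a list of five five-element lists.
grid = [[0] * 5 for _ in range(5)]

# Outer index picks the row, inner index picks the column.
grid[2][3] = "knight"

print(grid[2][3])  # knight
```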
But we can get more wild with it. We can make a whole tree. This is another special case where each element fits into a larger
hierarchy. Every element has what's called a parent that contains it, and optionally can have its own
children. In the chess machine example, Newell constructed a tree that used continuations to
form branches. Following along one branch could lead you down towards a goal, like capturing a
rook. That branch may have its own sub-branches for different outcomes. The chess machine might
predict a chance of the human moving the rook. That would sprout off a new branch on the tree.
To further complicate things
and push us more towards nuanced thinking, Newell describes each branch in the tree as having an
associated probability. This is most important for branches related to opponent decisions.
The machine might calculate that there is a 10% chance of the human moving the rook out of danger, and a 90% chance of the rook staying in
position. Thus, one branch would be favored in decision-making. That all sounds fairly complicated,
but let's just stay focused on the data structures here. The fanciest structure that Newell described
was a tree, and those are just a particular application of lists. I often make the mistake of lumping in trees with some other higher-order data structures in my mind,
but that's wrong.
We're just dealing with lists of lists of lists, and I have to keep reminding myself that.
I should really make a note that trees are just lists.
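To drive that home, here's a continuation tree built out of nothing but nested lists, with a recursive walk over it. The branch labels and probabilities are numbers I invented for illustration, echoing the rook example above, not anything from Newell's paper.

```python
# A continuation tree as plain nested lists: each node is
# [label, probability, children]. A tree really is just lists in lists.

tree = ["start", 1.0, [
    ["human moves rook away", 0.1, []],
    ["rook stays put",        0.9, [
        ["capture the rook",  1.0, []],
    ]],
]]

def leaves(node):
    """Recursively collect the labels of leaf nodes."""
    label, prob, children = node
    if not children:
        return [label]
    out = []
    for child in children:
        out.extend(leaves(child))  # traversal is just list walking
    return out

print(leaves(tree))  # ['human moves rook away', 'capture the rook']
```

The recursion is the only thing separating this from an ordinary list loop, the structure underneath is lists all the way down.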
A tree is a little harder to traverse sometimes, it can
require recursion if you want to be a cool kind of dude, but they're just lists. Now, to close off
this program, there's one more important use of lists. This is a bit of a long passage, but I think
it's needed to illustrate my point. Direct from Newell's chess paper, quote, there is a common pattern to the solutions to be described here. In all of them, the machine
uses very particular and approximate methods. It is as if the machine consisted of a vast
collection of rules of thumb. Each rule is a much oversimplified expression of how the machine should behave
with respect to some particular aspect of the problem. The rules are of all kinds.
Chess principles to follow, measurements to make, what to do next, how to interpret rules of thumb,
and so on. These are so organized that they form the necessary qualifications and additional specifications for each other. Each rule is essentially a machine program. At any particular instance,
the machine is under the control of some rule, or shunting between rules under the control of
a master program. End quote. Remember how I said a list can contain anything? If you stick to theory or a nice programming language, then a list
is allowed to contain executable code. Really, there's no reason it can't. The ability to put
code in a list, to treat a program as data, is a fantastic generalization. That means more flexibility
and more power, especially in the case of AI.
Newell is hiding it a little here, but he's onto something powerful.
That line about interpreting rules of thumb is the tip-off.
He's talking about a program that can change its own programming.
To do that in any sane way, you need to be able to handle data and code in a similar manner.
Getting to the tail end of the paper, that's where you'd expect to find the gritty implementation details, right?
Here's this long narrative that describes the algorithm and all the data structures.
You want to follow that up with a little bit of prose about your code.
Maybe, oh, I wrote this program on JOHNNIAC, it works great. However, Newell dropped something
that I think is more important. In 1954, you didn't really have access to programming languages,
at least not high-level ones. Fortran was just a draft specification deep inside an IBM lab.
Hopper's A-series compilers were kicking around somewhere at UNIVAC. All you had to work with was machine code or assembly language.
Long-time listeners should know that I'm a bit of an assembly language aficionado myself.
I like working in the low-level language recreationally. I think it's fun to get down
and dirty sometimes.
But it's really hard to write in assembly,
especially when it comes to dealing with complex data.
I've been slowly working on a DOS game for the last year,
and the vast majority of that code is dedicated to constructing,
traversing, and modifying data structures.
You end up spending more time writing that kind of code
than the actual interesting bits of the program. Newell comes right out and says that his chess
machine isn't really doable with only assembly language. Quote,
It is necessary to add a few comments about the general purpose language required to make a
machine of this nature feasible. An essential
feature of this language is its ability to refer to itself as well as to the external chess
situation. The only languages of this power of expression that have been formalized at all
are those used in symbolic logic, such as the calculus of propositions. The one illustrated below is an adaptation for computer use. End quote.
Basically, Newell is saying that assembly isn't strong enough stuff,
or at least it's not well-suited to the task of making a chess-playing machine.
The refer-to-itself part here is where some neat stuff really comes in.
Nowadays, this would be called either introspective
or reflexive programming. It's the idea that you can write a program that can modify itself in a
smart and safe way. The other part, the external piece, is basically just talking about list
management. That includes the whole forest of trees we discussed earlier. Newell gives a snippet
of pseudocode as an example for the new language, his idea being that to make a more advanced
program like his chess machine requires more advanced tools. There's not much to say at this
point, the paper is just a whiff of the idea of a new language. But there is one point I need to hit on. Newell
only shows a single line of example code. It's there, it's terse, it's dense, and it's hard to
puzzle out without the rest of the paper. But crucially, he breaks the expression down using
a little diagram. That diagram, dear listener, is in the shape of a tree, with subfunctions
feeding their outputs into higher-up functions. The next jump into the AI waters comes immediately
after the chess paper. This is when a team starts to assemble around Newell's work, or rather,
a group of friends all start to get obsessed with the same problem. This group was primarily three researchers.
We have Allen Newell, who we've already met.
Next up is Herb Simon, a political scientist who was also working at RAND.
Simon was studying group dynamics.
It sounds like his work was similar to what Newell was doing back before the AI bug got to him.
The ideas in Newell's chess paper easily passed the
bug on to Simon. The final member, Cliff Shaw, was another staffer at RAND and a more fully-fledged
computer programmer. There isn't as much information about Shaw, although most of the
code output in this period is attributed to him. In interviews, Newell described this collaboration as something more
like a friendship than work partnership. It seems like these three were just fascinated by the
prospects of a thinking machine. They had the know-how to make it happen, and they liked working
together. It's the kind of situation that we should all hope to be in one day. Now, I'm not
going to spend a lot of time getting into the core trio
here, just know that everything going forward is a group effort. Over the next few months,
a number of projects formed. Chess, as a continuation of Newell's earlier paper, was one.
Another project involved a program that could automatically solve logic proofs. These were
neat applications, but they were all dependent on a larger project,
a new programming language. Now, the term programming language here might be a little
bit off the mark. At least, this isn't as simple as it sounds. At this point in time, 1955 or 56,
we're still in an odd transitionary period. The idea of some higher
order programming tool existed. Compilers did technically exist. Fortran was soon to be announced,
but convention and adoption were still years and years away. In a May 1956 paper, Newell and Simon
used the terms pseudocode, interpretive language, and automatic computing to
describe their project. You could try and dismiss this as just a difference in terminology, but I'd
hazard against that hasty assessment. It's not just that the terms were different, but the concepts
of this next step in programming were also different.
I'm going to be calling what Newell, Simon, and Shaw create a programming language, since
otherwise this episode would get pretty inaccessible. But the simple fact is that
their creation, which they will call the information processing language, isn't exactly
what we think of today as a programming language.
Initially, they would have used either automatic coding system or interpretive language to refer
to IPL. I think the better way to understand it today would be as a programming system.
It's not a tightly defined thing like a language. It doesn't have syntax like a language, but it's definitely a system you can program with.
Now, the Information Processing Language, or IPL, was designed explicitly to make programming
artificial intelligence easier.
It was the first language or system built for anything like this, so I think IPL will
give us a good rundown of what a language
needs to be the mother tongue of AI. Luckily for us, those requirements are enumerated pretty
clearly in the 1956 paper. That paper was called Current Developments in Complex Information
Processing, and Newell and Simon dedicate a section to IPL, or as it was called at the time,
logic language. The rationale behind creating the new language is really clearly stated, albeit
it's a little wordy. So we're going to go point by point, and once again, it seems that the crew
here really liked lists. So starting with number one. We should like to be free from specifying where
information is to be stored, and to be able to get information in terms of our need for it
without having to remember where it is. End quote. Working with raw memory addresses does
really suck. The main point here is abstraction, or maybe we could just call it variables.
Newell and Simon want to get further away from dealing with the actual computer
and free up some brainpower for solving larger problems.
That's pretty in line with contemporary thinking on automatic programming.
So far, so good.
Next, quote,
2. We should like plenty of working space,
so we needn't be concerned
about writing over good information or have to do too much information shuffling, end quote.
The space part here is a bit of a red herring. This is all about clean data management. We've
already seen that Newell in particular is keen on the use of lists. I think this point is clearly related to
those longer data structures. Managing data in assembly language alone can get a little dangerous,
especially on larger projects. You can run into issues where you just need to remember what part
of memory is being used for which data. If you slip up either by forgetting where data lives or writing some
dangerous code, you can really easily overwrite your data. So clearly, a list-based language will
need some type of system built in for dealing with data in a safe and reasonable manner.
Alright, next point. Quote 3. We should like to be able to define new concepts wherever
we find them useful, so as to express our ideas in compact form. End quote. I'm cheating a little
bit here. I've already looked ahead so I can make better sense of what this point relates to.
On the surface, this seems like Newell and Simon just want a convenient language, something easy to use.
Fair enough.
Compact here could just be shorthand for powerful code,
the kind of language where a simple line can compile to hundreds of instructions.
However, this takes on a different meaning in the context of later iterations of IPL.
This is where we see introspection rear its kind of confusing head.
One of the slick features of IPL is its ability to define new functions. On the surface, that's
basic. Just about every language supports something like functions or methods or subroutines.
But in IPL, those functions can be defined and stored as data in a list. This gets
weird. It gets into symbols and links to symbols. We'll talk more about this soon. I just want to
introduce this idea as early as possible. Now, the final line item, quote, four. We should like to
have operations which are at the natural level of thinking about complex processes. This
holds for a. a computational operation such as addition and subtraction, b. control operations
which make decisions as to what process shall be carried out, and c. procedural operations which
set up computations, see that the right information is in the right place, and keep track of the status of computations still in progress.
End quote.
Bit of a long one there, but once again, this is part of what sets IPL apart from contemporaries.
Well, at least in theory.
Newell and Simon are talking about a programming language as a bridge between
human and machine. They want a language that can serve as an intermediary, one that's modeled after
how humans think about problems. They want it to be easy to translate notes and ideas into their
computer language. We're heading in the same direction as Vannevar Bush and Doug Engelbart here. Well, at least in theory.
So far, it sounds like IPL is shaping up to be something really good, right?
It handles data, abstracts away computations and complications, and comes with a slate of powerful features.
I wouldn't get too excited if I were you.
This is still a language from the 1950s we're dealing with.
Now, I'm not going to take us through the progression of IPL from an early system up into a more fully-fledged
language. While that may be an interesting exercise, I don't think we need to embark on it.
Plus, early reports on IPL leave a bit to be desired. They describe the language, but can still be unclear about the finer details.
IPL is usually described as an aside in larger papers, at least early on. So we'll be looking
at IPL 5 as described in late 1959. Even at this point in its evolution, IPL still looks like
nothing else out there, so I think it's safe to go a little bit into the future
to get some better documentation. To start with, I think it's time we really dig into what I mean
when I said IPL isn't what we would think of as a programming language today. I think the best way
to describe IPL would be as an instruction set architecture. Explanations of the language invariably end up sounding like some abstract definition of a computer.
It doesn't use expressions.
Every line is given as a single instruction.
This is already, at least basically, how assembly language works.
Each line of assembly is a simple instruction.
You don't tell a computer to go
add x to y and then store that value in z. You'd handle that kind of operation over a few smaller
instructions. The same goes for IPL. But the similarities don't stop there. IPL instructions
are structured in a very particular way. The documentation calls each instruction a quote IPL word composed of
prefixes and a symbol. This is followed by arguments. The details here are a little bit
too fine to really get into. I mean, it's a table of prefix values and combinations.
There's not all that much to go through. You get prefixes for different types of addressing,
a handful of different operations,
and some qualifiers about symbols and arguments.
That should all sound a little cumbersome,
and that's because it is.
What's more, this is really similar
to how a computer's instruction set is specified.
Usually, each instruction issued to a processor
comes as a chunk of data packed
into a few different parts. This can be a single word long, or longer, or some variable length.
We have some kind of prefixes or suffixes for specifying address modes, an operation,
and optional arguments. Once again, IPL is mimicking an instruction set architecture very closely here.
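To make that a little more concrete, here's a rough sketch in Python of what an IPL-style word looks like: prefixes plus a symbol, followed by arguments. To be clear, the field names and values here are my own illustrative inventions, not the actual IPL-V encoding from the manual.

```python
# A loose sketch of an IPL-style instruction "word": prefixes plus a symbol.
# The field names and values are illustrative only, not the real IPL-V format.
from collections import namedtuple

# P selects the operation, Q selects the addressing mode,
# SYMB names the operand, such as a list or a routine.
Instruction = namedtuple("Instruction", ["P", "Q", "SYMB"])

# A tiny routine written as a sequence of such words.
routine = [
    Instruction(P="preserve", Q="direct", SYMB="L1"),    # push onto list L1
    Instruction(P="execute", Q="indirect", SYMB="R0"),   # run routine pointed to by R0
]
```

Each line is one instruction, just like assembly, which is exactly the point being made here.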
On the surface, this just seems, well, it seems kind of lame. Newell and Simon were talking a
pretty big game, but looking at the spec, they don't seem to back that up. This was my initial
take on the matter, and I think it's a fair first assessment. That said, it is a bit dismissive to
just, you know, call IPL lame and move on. IPL is doing a lot more than just reinventing the
digital wheel. Its instruction set is built in a somewhat savvy way that actually advances Newell
and Simon's stated goals. There are eight initial operations in IPL 5. Expanding out that
matrix with address modes gives us maybe a few dozen. These are all the lowest level instructions
IPL can handle. Call them primitives if you like. On a normal computer, this is where you'd get such hits as load register, add, compare, and maybe increment. But in IPL, most of these
primitives are for dealing with list data. For instance, we get two instructions called
preserve and restore, which are roughly equivalent to push and pop. They let a programmer easily push a value to the top of a list or pop
that value off a list. The only odd instructions out here are branch, the basic conditional
branching operation that all computers need, and execute. Now, execute is where the magic
is really happening here. This is how you actually run routines in IPL.
You just instruct IPL to execute something. This can be some normal, unadulterated address,
a function name, if you will. But we can get tricky with it. Remember that we have different
addressing modes, different ways to specify locations in memory. Execute can accept a
pointer. This means that you can tell IPL to run a function at some variable location in memory.
Instead of telling IPL to go execute the code at location 100, you say go execute the code at r,
where r is a variable that holds the address of your routine. This sounds a little
confusing at first, it kind of is, but it gives us some major upsides. For one, this means IPL is
able to talk about itself. This is that introspection thing that I keep bringing up. You can write an
IPL program that keeps a list of routines, that can manipulate that list of routines,
and even run those routines. In other words, IPL can treat code and data interchangeably,
and that includes variable execution. An IPL program can decide what code to run when.
That's a really powerful feature. If you don't get the immediate utility of that, then don't worry, we'll be treading
deeper into this realm as the episode progresses.
Hopefully I can string together a good explanation as to why breaking down this distinction between
code and data is so important.
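Here's a loose modern analogy for that execute-through-a-pointer trick, sketched in Python. The routine names are made up; the point is just that code sits in a list as ordinary data, and the program picks what to run at runtime.

```python
# Sketch of IPL's "execute through a pointer": routines stored as data
# in a list, with the program deciding at run time which one to run.
# The routine names here are invented for illustration.

def double(x):
    return 2 * x

def square(x):
    return x * x

# A "list of routines": code held as ordinary data.
routines = [double, square]

def execute(index, arg):
    # Like executing through an indirect address: the routine to run
    # is looked up in memory rather than fixed in the instruction.
    return routines[index](arg)

result = execute(1, 5)  # runs square, giving 25
```

A program could just as easily reorder or replace entries in that list before executing them, which is where the real power of treating code as data shows up.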
Now, in service of fairness, you can jump through a pointer in assembly language.
The difference here is that
IPL offers a more convenient and nicely parceled up way to do this. In general, this is what IPL
is bringing to the table. It's a convenient way to get around a lot of the unfun part of assembly
code. So there we have the instruction part of IPL. It's primitive, it's weird, it's small, and frankly, even reading
through the later docs, it's kind of inaccessible. It's hard to get your head around right away.
But hey, that's just a small part of the picture. The most important part of a language, I'd wager
the most important part of a computer in general, comes down to how it handles memory. Yes, this is a hill I will die on. Memory is the
most important piece. So how does IPL deal with memory? Well, that gets a little bit more weird
and a lot more inaccessible. The fundamental unit of data in IPL is the list. That fact should
surprise no one at this point. Pretty much all data in IPL
either comes in the form of a list or lives inside some list. Ah, but we can get even more specific.
The bread and butter here is the linked list. Once again, this is where we get into how general solutions will always, always, always be better.
In a normal list in, say, Fortran or C, we're usually dealing with one element after another
after another. That's pretty easy to implement, really, and easy to deal with. In this framing,
a list is just a long, continuous chunk of memory broken into cells for each element.
A linked list is a little more sophisticated.
Under the IPL prescription, each element contains a chunk of data, which is called a symbol,
followed by a link to the next element in the list.
Interestingly enough, Newell, Simon, and Shaw are credited as the creators of the linked list.
This is some deeply fundamental stuff.
Linked lists end up being really important for later developments.
The data structure is still widely used today.
I've even linked quite a few lists myself.
It's darn handy to have around.
We can thank IPL, this somewhat obscure little language, for that development.
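Since the structure is so fundamental, it's worth sketching. Here's a minimal linked list in the IPL style, written in Python: each cell holds a symbol and a link to the next cell. The class and function names are mine, not IPL's.

```python
# Minimal linked-list cell in the IPL style: each cell holds a symbol
# (the data) and a link to the next cell. Names are mine, not IPL's.

class Cell:
    def __init__(self, symbol, link=None):
        self.symbol = symbol  # the chunk of data
        self.link = link      # the next cell, or None at the end

# The cells can live anywhere in memory; the links mediate the order.
tail = Cell("C")
mid = Cell("B", tail)
head = Cell("A", mid)

def to_list(cell):
    # Walk the links from a starting cell, collecting symbols in order.
    out = []
    while cell is not None:
        out.append(cell.symbol)
        cell = cell.link
    return out
```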
So why is the linked list here such a big deal?
Once again, we're looking at something subtle.
With a linked list, you don't have to store data in one big continuous chunk in memory.
A linked list can be spread throughout memory.
Elements can be drawn from anywhere.
Elements don't need to be in any kind of order while they're in memory. The linkages mediate order for you. For one,
this is just convenient. But of course, there's more reason for linked lists than just convenience.
It's hard to deal with dynamic data using a conventional list. You run into the shuffling issue that Newell and Simon mention in their 56 paper.
As the list grows, you'll eventually reach the end of its designated region of memory.
To make that list any bigger, you then have to go find a larger chunk of free space
and then relocate your data to that chunk.
That's a pain.
It's slow, and in general, it's one of those times where you end up dealing more with the idiosyncrasies of a computer than your actual
program. That's the situation you're trying to avoid most of all. Using a linked list straight
up removes that issue. No mess, no fuss. That is, as long as you have some cogent way of handling linked lists.
And IPL has just that. Although, it's a little confusing at first. The important thing to
remember here is that IPL is structured like a virtual computer. Keeping that in mind should
help. IPL models memory, that is to say, all the memory it can access, as one big linked list.
That includes cells that are still free.
IPL keeps free chunks of memory on a list called H2.
And yeah, the naming convention here kind of sucks.
In general, IPL lists are named as a letter followed by a number.
If you ever want to program in this language, you just have to get used to
that. When you need to add a new cell to some list, IPL just checks H2 for a free cell. Finding one,
IPL then just pops it off of H2 and links it to your list. When a cell is removed from a list,
IPL sends it back to the end of H2. The huge advantage here is that it's easy to expand lists. You just draw
from a ready pool of free cells. As cells fall out of the list, they revert back to being free.
So in theory, you never need to worry about where you're getting memory space from.
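The H2 scheme can be sketched in just a few lines of Python. This is a simplification of what the manual describes, with cells modeled as plain dictionaries, but the flow is the same: grab a free cell to grow a list, return it to the pool when you're done.

```python
# Sketch of IPL's free-storage scheme: unused cells sit on a list
# (IPL called it H2). Growing a list pops a cell off H2; freed cells
# go back onto it. Cells are modeled here as plain dictionaries.

H2 = [{"symbol": None, "link": None} for _ in range(4)]  # the free pool

def get_cell(symbol):
    # Take a free cell from H2 and fill it in.
    cell = H2.pop()
    cell["symbol"] = symbol
    return cell

def free_cell(cell):
    # Return a cell to the free pool.
    cell["symbol"] = None
    H2.append(cell)

a = get_cell("A")  # a list grows: one fewer free cell on H2
free_cell(a)       # the cell reverts to the free pool
```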
IPL handles all the gritty details, you just throw together lists. There is, of course,
a lot more to IPL.
You can read a manual if you really want to go deeper into the language. I'll link one in the
description. But specifically, we're looking at a list machine. We've already looked at some of the
theoretical reasons for IPL's development, but what about the practical applications?
What did IPL actually do?
This is where we finally return to artificial intelligence. As I touched on briefly,
IPL was designed as a tool to help Newell, Simon, and Shaw in their artificial intelligence research.
The language was really a means to a much larger end. There were a number of projects that IPL ended up playing a role in. The one I want to focus our attention on was called Logic Theorist. It was a program
that could solve symbolic logic proofs. If you've taken geometry or calculus at really any level,
then you're probably familiar with the concept of proofs, at least in passing. You start with some assumption, usually an equation, and then through a series of operations, you show that it's true.
This is most often done by manipulating and contorting the equation until it matches some
other equation that's already known to be true, or going from an equation that's known true
and manipulating that into the theorem. It's rigorous and repetitive work,
but importantly for us, proofs can be really difficult. To properly prove an equation,
you need to have a good grasp of the math involved. You need to be able to recognize
known axioms, those are the known good equations, and it often helps to have an eye for slick math tricks. Sometimes a proof just takes
intuition. Ideally, you arrive at some simple and elegant solution. It's all a very human process,
something that a machine shouldn't be able to do, right? The symbolic logic part here is where the
pieces start to fall into place.
In an interview for the book AI, The Tumultuous History of the Search for Artificial Intelligence,
Simon admits that, quote,
Our original intent was to start out with chess, and then we were also considering doing geometry.
We thought, probably wrongly, that in both of these there were perceptual problems, important as humans perform the task, that would be hard to deal with.
So we went to logic for no deeper reason than that I had two volumes of Principia at Home.
Principia here is Principia Mathematica.
It's a work on the fundamental theory of mathematics.
It comes with an entire section of logic proofs that pretty much show the underpinnings of math. It's some really dense
stuff to be sure. In general, symbolic logic is pretty dense. You can think of it as an abstraction
beyond normal mathematics. It deals with the simple logic operations like AND, OR, or NOT that can be used
to build up more complex operations. Simon makes it sound like this early focus on logical proofs
was mainly a coincidence. Maybe that was the case, but it couldn't have been a better choice.
Despite being, well, hard to deal with, symbolic logic
relies on some pretty simple operations. That's especially true when it comes to writing proofs.
You're only dealing with symbols, in other words, variables. No numbers are involved here,
at least not in the proofs that the Rand crew were dealing with. There are only a handful of operations used. There are only a few
axioms, the known true equations. And perhaps best of all, there are only a handful of different
types of manipulations used to arrive at a proof. Put together, and just like we saw with chess,
this gives us a pretty well-defined bound for the problem. The other little bit about logic
that I have to point out is how well it aligns with computing. When you get down to it, at the
lowest level, a computer is composed of logic elements. Everything is built from physical logic
gates. Even once you get all the way up to a running machine with instructions and programs,
you still have access to those simple logic operations that are used in symbolic logic.
There's just this little nice connection going on here that,
while maybe not playing a factor in decision-making,
must have made Newell, Simon, and Shaw's life a little easier.
The early development of logic theorists, this proof-producing machine, and IPL went hand in hand. That much is very clear. But here's where we reach what I think is my favorite part
of the story. The first version of logic theorists, and I guess strictly speaking the first version of
IPL, were both purely theoretical. At least, neither of them
actually existed on a computer. Later versions would actually be programmed, but the initial
version was not. Logic Theorist is explained in pretty fine detail in a June 1956 paper simply
titled The Logic Theory Machine. It includes annotated source code and everything,
so as far as I'm concerned, this is the definitive source for how the program worked once completed.
I'm going to be using that as our guide. So, how did the theorist make its theories?
The RAND team broke things down into three smaller problems. Matching, searching, and the actual proof methods used. Now, all of
these problems are built around trees of data. After reading a few of these logic papers, I think
I've fried my brain pretty well, so stick with me while I try to explain the connections here.
To start off, we have matching. The initial expression that Logic Theorist sets out to prove has to be
entered into the computer. This is done, how else, but as a tree. The notation used for symbolic
logic is a littleā¦ gross? It's similar to math when you get down to it, but it just looks daunting.
So forgive me if a small example scares you off. Let's say you have the expression P implies P or Q.
The implies is a little arrow, the or is a V for whatever reason.
P and Q are just variables.
Logic Theorist represents that as a tree, where implies is the root of the tree.
It's like an equals in
other forms of math. The left branch is just that first p variable. The right branch is the
or operation, with the last p and q branching off from the or. So you get this nice little happy
tree that fits well into memory and fits well into the framework of linked lists
that IPL is all about. From there, Logic Theorist is able to work up a general description of the
expression. Newell and Simon explain that they wanted a way to loosely characterize how an
expression looks. This is mimicking how a human running a proof would gauge an expression.
So they went with three values. The number of variable slots, the number of unique variables, and the levels
of the tree. For our example, we have three variable slots, two unique variables, and the
tree representation has three levels. So all that is just to get a representation of the expression. And here's
why that matters. Logic Theorist keeps a list of known true proofs, known axioms. This started with
five axioms provided by Principia Mathematica, but it would grow as the program ran. Those
expressions are also stored as trees, each with their own three-value
descriptions. Those descriptions are used as a reference library of axioms. If you give Logic
Theorist an expression, it can tell you if it knows of any similar expressions. This is the
searching part of the problem. Logic Theorist is able to search its memory to find a starting place for a proof. The match and search scheme shows up a lot in logic theorist. Looking for a
starting point is just one example. Most of the methods used for manipulating
expressions also rely on matching expressions and searching known
expressions. There's a lot of lists to go through. The next step is carrying out a series of methods
to construct the actual proof. There are well-defined rules for these types of methods,
just like with normal mathematical proofs, so the bulk of what Logic Theorist does here
is predetermined by the rules of logic. The neat part is how those methods are coordinated.
The neat part is how those methods are coordinated.
The general algorithm is to look for the most similar known axiom.
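Before going further, here's a rough Python sketch of that tree encoding and the three-value description from a moment ago. The nested-tuple format is my own shorthand, not the paper's actual representation, but it shows how the three numbers fall out of a simple walk over the tree.

```python
# Sketch of Logic Theorist's expression description: count variable
# slots, unique variables, and tree levels. The nested-tuple encoding
# is my own shorthand, not the original representation.

# "p implies (p or q)" as a tree: (operator, left branch, right branch).
expr = ("implies", "p", ("or", "p", "q"))

def describe(tree, level=1):
    if isinstance(tree, str):  # a bare variable: one slot at this depth
        return (1, {tree}, level)
    op, left, right = tree
    ls, lv, ld = describe(left, level + 1)
    rs, rv, rd = describe(right, level + 1)
    # Combine: total slots, union of unique variables, deepest level.
    return (ls + rs, lv | rv, max(ld, rd))

slots, variables, levels = describe(expr)
# For this expression: three slots, two unique variables, three levels.
```

Comparing two expressions' descriptions is far cheaper than comparing the trees themselves, which is what makes the search over known axioms practical.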
From there, logic theorist chooses a method to use to manipulate that axiom.
This is done using a series of heuristics.
Now, that's just a fancy word for rules of thumb.
In practice, it's basically a big long program.
Newell, Simon, and Shaw worked up a list of rules for how Logic Theorist should proceed with proofs. For picking methods, Logic Theorist
goes down a list of options. Initially, this included substitution, detachment, and chaining.
Once again, the rules for these operations are all already well known, but heuristics were added so Logic Theorist would be a little smart about how it would apply these methods.
Take chaining, for instance.
This method is where you use the fact that if A equals B and B equals C, then A equals
C.
To run this step, Logic Theorist first looked at an axiom where one side of the expression
was similar to one side of
the proposed theorem. This was getting the A equals B part set up. Then the next step is showing that
B equals C. But you can't do that with just the chaining method alone. You've basically taken one
step on a longer path here. So logic theorist stores this down as part of a subproblem.
Now it needs to show that B equals C to complete the chain. This works out exactly how Newell's
chess machine was described. Certain methods will create these subproblems that are analogous to the
chess machine's subgoals. Solving for B equals C is the same as the computer deciding it needs to take a
rook before checkmate. It's an abstraction that allows for more complex behavior to take place.
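As a sketch of that bookkeeping, here's the chaining step in miniature, in Python. Everything here is simplified for illustration: axioms and goals are just pairs, and the only thing being shown is how taking one chaining step leaves a subproblem behind.

```python
# Sketch of the chaining step: to prove "a leads to c", find a known
# axiom "a leads to b", then record "b leads to c" as a subproblem.
# Axioms and goals are simplified to bare pairs for illustration.

axioms = [("a", "b"), ("x", "y")]  # each pair: left side leads to right side

def chain(goal, subproblems):
    left, right = goal
    for (axiom_left, axiom_right) in axioms:
        if axiom_left == left:
            # One step taken; the remaining link becomes a subproblem.
            subproblems.append((axiom_right, right))
            return True
    return False

subproblems = []
chain(("a", "c"), subproblems)  # leaves ("b", "c") still to be shown
```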
Another trick that Logic Theorist employs is a set of so-called stop rules. These are another
set of heuristics specifically for deciding when to stop attempting a solution. Newell and Simon specifically explain stop rules in the context of when to stop the overall proof-finding process. The main heuristic here
is a work measurement. Logic Theorist would keep track of how many steps the proof had reached and
terminate after a set threshold. There were also similar rules in effect throughout the system.
For instance, Logic Theorist would keep a running list of failed solutions and failed proofs. If at any point a branch of the proof
tree matched an expression in this failure list, it would stop pursuing that branch and prune it off.
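Those two stop rules, the work threshold and the failure list, can be sketched together. The search itself is faked out here with plain strings; only the stopping logic is shown, and the threshold value is arbitrary.

```python
# Sketch of Logic Theorist's stop rules: a work-measurement threshold
# plus a failure list that prunes branches already known to fail.
# The "search" is faked with strings; only the stopping logic is shown.

MAX_STEPS = 5
failed = {"subproblem-2"}  # branches that already failed once

def attempt(subproblems):
    steps = 0
    for sp in subproblems:
        if sp in failed:
            continue           # prune: this branch already failed
        steps += 1
        if steps > MAX_STEPS:
            return "gave up"   # work threshold reached
        if sp == "goal":
            return "proved"
    return "exhausted"

outcome = attempt(["subproblem-1", "subproblem-2", "goal"])
```

Crude as they are, rules like these are what kept the search from wandering forever on a 1950s machine.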
Most of the algorithm was worked out early in the project. We're looking at early 1956. Before actually implementing the theorist
and before implementing IPL, Newell and Simon rigged up a simulation. Once again, reading from
AI: The Tumultuous History, Simon explained, quote,
Alan Newell and I wrote out the rules for the components of the program in English on index
cards, and also made up cards for the
locations of the memories. At the Graduate School of Industrial Administration building,
on a dark winter evening in January 1956, we assembled my wife and three children together
with some graduate students. To each member of the group, we gave one of the cards, so that each
person became, in effect, a component of the logic theorist
computer program, a subroutine that performed some specific function or a component of its memory.
It was the task of each participant to execute his or her subroutine or to provide the contents
of his or her memory whenever called by the routine at the next level above that was in control. End quote. This must have been a pretty messy experiment. The data structures involved in Logic Theorist are pretty clean, but they are big and
they are numerous. Then again, it was probably satisfying to lay down the structured trees of
index cards. Either way, by the end of the evening, Newell and Simon were convinced that Logic Theorist
could work. The next steps were, of course, IPL, followed by the actual software implementation.
Shaw would serve as the primary programmer, with Newell and Simon hammering out much of the design
work. I think that gives us all the background we need to take a look at one final question.
Why is IPL the way it is?
The list part is easy here. Newell's chess machine made aggressive use of list-based data structures.
Logic theorists pushed lists that much further. Lists, and especially trees, were crucial to this approach to artificial intelligence. Linked lists were quite literally invented
to satisfy the RAND team's need to work with dynamic data structures.
That all falls into place really neatly.
Then what about the strange instruction-based-ness of IPL?
Why is there no real syntax to speak of? While this has less to do with the
overall story of AI, I think it's a pretty interesting question to ask. For this, I think
we can benefit from a contemporary perspective. In the early and mid-50s, Grace Hopper developed
the A-series of compilers. These were some of the earliest examples of this dark art.
A-3, aka Arith-Matic, was created around 1955, so I think it's a good comparison point.
In this system, we can see a similar low-level approach. Every line of Arith-Matic was a single
instruction. You couldn't just write x equals y squared plus
square root of z. You had to break that down into its constituent operations. IPL wasn't unique in
its approach. One reason for this is probably simple precedence. Assembly language was written
this way. For most researchers of this era, that was their only experience.
There's also the matter of complexity. Delving into the realm of a complex language, one with
syntax and semantics and all the stuff that makes a language really usable to a modern programmer,
requires a much more sophisticated compiler. It requires logic to parse a complete language, and that's no small feat.
Also, just incidentally enough, the team at RAND wasn't setting out primarily to make a new
language. They developed IPL as a tool to help with a larger set of problems. So maybe making
a language with syntax and nice-looking code was more of a
secondary concern. Now, strangely enough, Logic Theorist was getting pretty close to this kind
of complexity. It was able to represent expressions as trees built up of primitive operations.
That's one step used during the parsing phase of most compilers.
We're seeing the building blocks stack up here just not quite to their full height yet.
Another reason we should consider comes from the stated goals of IPL itself.
One of Newell's requirements for a new language was, quote,
we should like to have operations which are at the natural level for thinking about complex processes, end quote. That's one of those obvious goals.
Of course a language should be at a level that humans are comfortable with.
But what is that level?
What is the natural unit of human thought?
It seems like Newell, Simon, and Shaw might have suggested that unit was a simple instruction. If you were writing instructions for a human-powered simulation,
you probably wouldn't write out an equation or define some fancy syntax. More likely,
you'd just jot down a list of simple instructions. Put the first card you get on this stack. Check if
the next card is positive. If it is, then drop it in another stack. That sort of thing.
So, why does IPL look like assembly language? I think it's got to be a combination of these
factors, both technological limitations and the team's way of attacking the problem.
I don't think it's just because it's such an early language,
although that does play some role here.
That said, it's this primitive approach to programming
that would leave an opening for a new contender.
Alright, that brings us to the end of this episode.
I think we've covered a lot of ground, and I think we can start to answer some of my initial questions.
Specifically, IPL's history helps us establish why lists are important for artificial intelligence.
It all comes down to the complex and dynamic type of data needed for simulating intelligence.
Or, as Newell and Simon would have put it, lists are required for complex
information processing. IPL would be the first generic tool for tackling this problem,
but it wouldn't last that long. Just two years after IPL was specified, Lisp hit the scene.
The RAND language would continue on through numerous versions, but Lisp was just the superior option. Next time,
we'll be starting with the first use of the term artificial intelligence. That will lead us nicely
into Lisp. How did the new language get its start? How did it improve on the groundwork laid by IPL?
And are the two languages even very closely related? We'll grapple with those questions and probably a lot more two weeks from now.
Until then, thanks for listening to Advent of Computing.
If you like the show, there are now a few ways you can help support it.
If you know someone else who would like the show, then why not take a minute to share it with them?
You can also rate and review on Apple Podcasts.
And if you want to be a super fan, then you can support the show directly
through Advent of Computing merch or signing up as a patron on Patreon. Patrons get early access
to episodes, polls for the direction of the show, and bonus content. You can find links to everything
on my website, adventofcomputing.com. And if you have any comments or suggestions for a future
episode, then go ahead and shoot me a tweet. I'm at Advent of Comp on Twitter. And as always,
have a great rest of your day.