Advent of Computing - Episode 74 - The Elegance of LISP
Episode Date: January 24, 2022

This is the conclusion to my exploration of why LISP is the "mother tongue of artificial intelligence". We pick up from the end of last episode and continue to cover the early days of AI. We follow the meandering path from the FORTRAN List Processing Language and IPL, up to pen-and-paper versions of LISP and into the first true implementation of the language. Along the way we will see just why LISP is called elegant, and how it was tailored for thinking machines.

Selected Sources:
https://sci-hub.se/10.1145/321021.321022 - FLPL
http://www-formal.stanford.edu/jmc/mcc59.pdf - Programs with Common Sense
https://dspace.mit.edu/bitstream/handle/1721.1/6096/AIM-008.pdf - AI Memo 8
Transcript
Lisp is a strange language. Well, I guess that might not be totally fair. Let me rephrase that
a little bit. Lisp looks strange when compared to other programming languages. If you ran into it
in the field without any prior exposure, then you'd probably have a hard time classifying it.
It doesn't have the semicolons so familiar in C and C-like languages.
Lines don't start with numbers like in BASIC. You don't get separate data and code definitions like in COBOL. And, perhaps most upsetting of all, Lisp doesn't even have the familiar mathematical expressions seen in Fortran.
That last piece is especially strange, because Fortran, for all of its faults,
was a massively influential language.
In many ways, it was THE programming language.
While an argument could be made that Fortran wasn't the first high-level language,
it was the first one that was fully implemented and saw actual use.
A huge number of later languages can draw their lineage back through Fortran.
The path may be a little bit winding, but hey, just look for a few equal signs.
Now, Lisp and its descendants do not share that distinction.
This language just plain doesn't look like anything else.
Look at a fresh line of Lisp, and the first thing you'll probably notice is going to be a sea of parentheses.
Then, you'd see that operators all seem to come before arguments, a bit of an inversion to the usual mathematical system.
Look deeper, and the differences will just continue
to pile up. I firmly believe, or at least I hope, that people don't create new programming languages
just to be different. There has to be more going on here. Difference isn't often a sound design goal. There has to be a reason why Lisp is so profoundly unique. And
there is. You see, Lisp was built differently than its contemporaries. Among other things,
Lisp was designed as a tool to make machines think.
Welcome back to Advent of Computing.
I'm your host, Sean Haas, and this is episode 74, The Elegance of Lisp.
Today, we're picking up where we left off last episode.
Just as a quick recap, last time we looked at IPL, the first language designed specifically for use in artificial intelligence. That episode, as well as this one, are my attempt
to understand why Lisp is called the mother tongue of AI, and, in a more micro kind of view,
why lists are a critical feature for that specific application. IPL, or the Information Processing Language, was built around lists.
This was especially visible in more complex structures like trees,
which are, I must always remind myself, just built out of simple lists.
To get even more specific, the real meat of IPL was all about handling linked lists. Now, I think I was able
to pretty well establish why lists matter so much last episode. It all comes down to structuring
data, building decision trees, and representing complex and dynamic problems in memory. I've been
able to hammer that part into my own head at least. The last half of my question is what we're going to be tackling today.
That is, why is Lisp specifically called the mother tongue of AI?
Why not IPL or why not some other language?
And of course, the easiest way to approach that question is to look at the development of Lisp.
One easy contributing factor that I need
to throw out before we go any further is the fact that IPL would have just sucked to use.
It was a really, really early programming language coming onto the scene before a lot
of conventions and niceties were established. IPL looks a lot more like assembly language than anything else. It had a
lot of the features that we'll be seeing reimagined in Lisp, but IPL was just a pain to work with.
It was only a matter of time before something better came along. So let's look at that something
better. We're going to be examining the roots of Lisp, the List Processing Language, and we're going to be trying to understand its connection to AI.
Why has Lisp become so tightly associated with this one field? What does it offer over other
languages? And hey, what's with all the parentheses anyway? This episode, we're going to primarily be following the early career of
John McCarthy. And my impression of McCarthy is that he was a very busy man. This seems to have
been especially true in the late 40s and early 50s. He earned a bachelor's degree in mathematics
in 1948 at Caltech. He studied for his PhD first at Caltech and then at Princeton. He became a
Dr. John McCarthy in 1951 after completing his dissertation. Between then and 1955,
he taught at Princeton, Stanford, and eventually Dartmouth College. McCarthy was also entering
mathematics at an interesting point in time. In 1945, some of the first digital computers started crunching numbers.
By the time McCarthy graduated from Princeton, the first stored program machines had come
into being.
He got the chance to go from just hearing about computers in lectures to coming face
to face with functioning machines.
It must have been fascinating to watch a new discipline,
a whole new branch of the sciences, unfold around you. And in a very profound way,
this early era of computing reshaped what kind of mathematics were possible.
So how do we get from this emerging field to artificial intelligence? What's the connection
between John McCarthy, the mathematician,
and John McCarthy, the eventual creator of Lisp? Well, we don't have any one big moment to point
to. It almost sounds like McCarthy naturally slid into computing, and from there, his curiosity and
exposure to new ideas pulled him towards artificial intelligence. In an oral history
interview with the Computer History Museum, McCarthy explains that there were a number of
influences driving him towards thinking machines. Early inspiration may have come from a lecture on
behavior in animals given at Caltech. According to McCarthy, the speakers compared the brain
to a computer, thus sending the
soon-to-be doctor down the rabbit hole.
The only real issue is that McCarthy doesn't seem 100% sure about this one.
Quote,
I went and looked at the proceedings of that symposium, and I had supposed that artificial
intelligence had been discussed there, that is, the use of computers to behave intelligently.
The actual inciting incident here may just be a mix of factors.
Whatever the case, it's safe to say that McCarthy was on the AI tip really early in his career.
His position in academia also helped drive him towards more specific goals. In that same interview, McCarthy talked about meeting John von Neumann,
a figure that looms really large in the early development of computers.
I only had one interview with von Neumann, in which I told him some idea that I had about
artificial intelligence, representing a brain and its environment by finite automaton. And he encouraged me. He said, write it up, write it up.
But afterwards, after my conversation with him, I came to the conclusion that it was, in fact,
a bad idea, because one couldn't represent knowledge in terms of the structure of finite
automaton, which was to represent the brain, so I didn't write it up.
Now, I'm bringing this encounter up early to show a crucial point. One of von Neumann's
larger areas of study was the so-called finite automaton, sometimes called the cellular automaton.
This is a well-defined system of automata, that's small entities that perform
some predetermined task based on a series of inputs. Think of it like a tiny program.
Conway's Game of Life is probably the most well-known example of finite automata at work.
In that system, cells will either turn on or off depending on the number of active neighbors.
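To make that rule concrete, here's a minimal sketch in modern Common Lisp; it's my own illustration of the standard Life rule, not anything from the period.

```lisp
;; The Game of Life rule for a single cell: alive next step if it has
;; exactly three live neighbors, or two when it is already alive.
(defun next-state (alive-p live-neighbors)
  (if alive-p
      (or (= live-neighbors 2) (= live-neighbors 3))
      (= live-neighbors 3)))
```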
In general, automata can be defined to
follow any rule set. By carefully choosing your rules and carefully generating some initial state,
you can create pretty complicated systems. Some of these systems can even behave unexpectedly,
which gives the idea of intelligence. The theory McCarthy is alluding to here is that finite automata could be used
to model living cells, more specifically the neurons that make up a brain. On the surface,
this seems reasonable enough. A cell is kind of like its own little self-contained pre-programmed
entity, at least if you reduce things down far enough. It can react to
outside stimuli. If you could simulate the operation of a single cell, then it's just a hop,
skip, and a memory jump to simulating an entire brain. Of course, it's not exactly that simple.
This assumes that we know precisely how a brain cell functions, or at least that we know every possible response
to every possible stimuli, and we can simulate that correctly. We kind of can't do that. It also
assumes that we know how a cell in a brain is connected to all of its neighboring cells. Now,
I'm no flesh scientist, but I think it's safe to assume that the human brain isn't exactly a 3D matrix of little cubic cells.
There's some ambiguity there that we don't really know how to model yet.
But that all aside, McCarthy had a very different issue with this automata model.
As he put it, one couldn't represent knowledge with automata alone.
In other words, automata are a poor substitute for memory.
Even at this early juncture, McCarthy understood that memory would be an important component for any artificial intelligence program.
The next step on the road towards LISP would take place in 1955.
That year, McCarthy and a handful of his colleagues started planning a workshop.
From their initial proposal,
We propose that a two-month, ten-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire.
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can, in principle, be so precisely described
that a machine can be made to simulate it. An attempt will be made to find how to make machines
use language, form abstractions and concepts, solve kinds of problems now reserved for humans and improve themselves.
We think that a significant advance can be made in one or more of these problems
if a carefully selected group of scientists work on it together for a summer. End quote.
The plan was to form something like a temporary think tank. Work on computer intelligence had
been occurring throughout the country at this
point. The proposal itself was penned by researchers from four different institutions.
You have McCarthy from Dartmouth, who was the primary author of the plan, but you also have
Marvin Minsky from Harvard, Nathaniel Rochester from IBM, and Claude Shannon from Bell Labs.
McCarthy wanted to pull the field together, exchange ideas, and
hopefully focus efforts on some concrete projects over the summer. To that end, McCarthy invited
nearly 50 researchers to the summer session. Of those, only 10 would show up, and even then,
most were only there for part of the summer. So maybe the grandiose plans didn't fully pan out, but the
Dartmouth Summer Workshop was far from a failure. It was the first real symposium dedicated to
artificial intelligence. In fact, this is where the field was named. McCarthy seemed to have decided
on the term early on, so when it came time to put together a conference, he stuck the AI label on it,
and it's been called artificial intelligence ever since. Now, there are two groups of attendees that
we have to talk about. The first is probably the most obviously influential on the future of Lisp.
Alan Newell and Herb Simon came down to Dartmouth to show off Logic Theorist and IPL.
The two would only be there for a few weeks, but their appearance offers a link into a larger story.
If this wasn't McCarthy's first exposure to IPL, then it was at least his first major contact with the language.
And, to be quite blunt, McCarthy did not like what he saw.
The second group that matters here is IBM.
Now, the only employee from Big Blue to actually go to the workshop was Rochester.
That said, there were more IBMers on the invite list that didn't show.
This included John Backus, the creator of Fortran.
At this point, IBM had a profound interest in AI, at least on the research side
of their operation. As McCarthy himself explained in an essay on the development of Lisp,
IBM was undertaking to develop a program for proving theorems in plane geometry,
based on an idea of Marvin Minsky's, and I was to serve as a consultant to that project.
At the time, IBM looked like a good
bet to pursue artificial intelligence research vigorously, and future projects were expected."
Once again, the problem of proofs rears its ugly head. Now, I think these two groups at this
workshop, RAND and IBM, are important because they mark the ends of a spectrum of solutions to the AI problem.
That's kind of a mouthful, but stick with me here.
On the RAND side, we have IPL, a language designed from the ground up specifically for AI.
It has fantastic support for lists, but it's also hard to work with. On the IBM side,
we have something else entirely. Big Blue was a Fortran shop. I guess you could just say it was
THE Fortran shop. At the time, IBM was the only vendor in the world to offer a truly high-level programming language, since,
you know, Fortran was really the only game in town. It offered something IPL didn't have,
expressiveness. Fortran actually has fully-fledged syntax. You can type in complete lines of code
strung together with special characters. You can even write mathematical equations and expressions in Fortran
that look really close to their handwritten equivalents. Fortran even has lists.
Well, it has arrays. The distinction here actually matters a lot. A Fortran array is just a blob of
data subdivided into elements. I know, that's boring.
It's flat.
It doesn't have any dynamic life to it.
Each element is of the same fixed size, and the array has a fixed length.
So, boring.
To expand the array, you have to shuffle around blocks of memory.
IPL used the much more flexible and dynamic linked list,
wherein each element contains the actual stored data
and a pointer off to the next element of the list.
So, not boring, alive, and dynamic.
That may seem like a small detail that I'm kind of overblowing,
but as we saw last episode, dynamic data storage is
crucial for AI. The different ways that IPL and Fortran handle list data impacts not only how you
can use those languages, but what kind of programs are viable in each. Just remember, a language's
usefulness is often determined by how it deals with memory.
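To make the contrast concrete, here's a minimal sketch in modern Lisp notation, admittedly anachronistic for 1956, and the variable name is my own. It shows why a linked list can grow without any shuffling of memory.

```lisp
;; Growing a linked list is just one new cell pointing at the old head.
;; Nothing gets copied or moved, unlike expanding a fixed-size array.
(defvar *items* (list 'b 'c))       ; the list (B C)
(setf *items* (cons 'a *items*))    ; now (A B C)
```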
Now, IBM also offered something that the small IPL team at RAND couldn't.
Hardware.
At roughly the same time McCarthy was finishing preparations for the Dartmouth workshop,
IBM started planning the New England Computing Center.
Cool name, but it's a little misleading at this point
in time. This was an initiative, together with MIT, aimed at providing New England colleges with
access to a computer. IBM installed a single 704 on MIT's campus. MIT got the bulk of the
computer's time, but the remainder was to be shared between other schools in the region.
So, as you can see, McCarthy was able to get access to computer time.
At least, in theory.
The New England Computing Center would take a little bit to get up and running,
and even then, McCarthy could only get limited time on a single computer.
The machine could handle assembly language and Fortran. Two tools
not particularly well suited to AI, I might add. But it was the best path that McCarthy could
take. At least for the time being, McCarthy would bet on IBM. The other factor here is McCarthy's
aversion to IPL. On his first run with the language at the summer workshop, McCarthy had this to say,
quote,
The people who were furthest ahead at the time were Newell and Simon, who came for just
a few days, and I got the idea of list processing from them, but did not at all like the language
that they used, end quote.
Some ideas did flow in from IPL, but ultimately McCarthy did not like the language's approach.
Like I keep saying, and will probably say a few more times, IPL had all the tools you needed for AI.
It's just that everything is hidden inside a language that's not exactly easy to learn or use.
The IBM and Fortran connection here is the more fruitful branch.
After the summer workshop, McCarthy would actually consult for IBM on one of their AI projects.
And I want to take a second here to examine this part of the story before circling back in time a little bit, since it takes us to a weird place that I think is
important. McCarthy was consulting for the Geometric Proof Project that I mentioned briefly.
This took place after his exposure to the IPL way of AI. So, quite naturally, he suggested that the
IBM team wouldn't be able to make very intelligent software
without list processing capabilities. This would lead to the birth of Fortran List Processing
Language, or FLPL. I don't know if I can string together a more frightening series of words.
FLPL is described in a little paper called a Fortran compiled list processing
language. And for whatever reason, the only scan I can find of this paper must have been slightly
damp or wrinkled when it was digitized. The top few lines of every page are a little bit wavy, which I think adds a fitting chaotic effect. FLPL wasn't an entirely
new language. It was instead an extension to Fortran's standard libraries that added support
for linked lists, or as the paper calls them, Newell-Shaw-Simon lists. You know, just in case
you need a reminder that linked lists came directly from IPL.
Some of the specific implementation details are different, but by and large, FLPL follows the
RAND prescription pretty closely. It even keeps track of as-yet-unused cells using a big free
list. FLPL came packed with most of the functionality you needed for working with
that era's take on AI. You got dynamic lists, you can easily build trees, and you have all the
functions needed to traverse and modify list data. FLPL even introduced pointers to Fortran,
which allowed list elements to effectively contain any data type. It sounds a lot like IPL, but better,
right? Well, yes and no. FLPL is a bit of a monster. It fits into this uncanny valley,
that's the best way I can put it. It comes with most of the features that made IPL an attractive
option to people like McCarthy, but also comes
with all the issues that pushed researchers away from Fortran. You also get a handful of new
idiosyncrasies just for some good measure there. McCarthy's biggest issue with FLPL was that it
didn't support recursion. Now, I've been trying to never actually get into recursion on this podcast.
It's one of those programming and mathematics concepts that can be a little bit hard to wrap
your head around. But we're quickly heading towards lisp town now, suckers, so we're gonna
have to break this specific podcast taboo. Recursion occurs when a function is defined in terms of itself, or in more
computery terms, a recursive function is a function that calls itself. The classic example
is the factorial, often notated as x! The factorial of x is defined as x times the factorial of x minus 1. That kind of circular
definition is perfectly valid in mathematics. When you program a factorial function, you usually have
to add in what's called a termination case. That is, you stop the function from calling itself once
x is equal to 1. It's a little mind-bendy if you aren't used to the concept, I will admit that.
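Here's what that looks like as a minimal sketch in modern Common Lisp, just to illustrate the termination case.

```lisp
;; The classic recursive factorial, with a termination case at 1.
(defun factorial (x)
  (if (<= x 1)
      1                           ; stop calling ourselves here
      (* x (factorial (- x 1)))))

(factorial 5) ; => 120
```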
So why is recursion such a big deal?
The easy answer is that it's just plain powerful.
It's one of the most powerful tools in a programmer's belt.
Recursion can be used to construct really elegant and,
if I can be frank, really cool solutions to complex problems. But speaking more specifically, recursion is a really natural way
to tackle trees of data. For a good example, we can look no further than IPL and Logic Theorist.
In that program, logic expressions were represented as tree structures. We can make this
a little bit easier to think about by jumping into more common mathematics, but the theory here will
still hold. So let's say you start with a simple expression, x plus y times z. You can transform
that into a tree with relative ease. The root of the tree will be the main connective. In this case,
it's the plus operator. That operator takes two arguments. One is just x, so that ends the first
branch. The other argument is the result of a multiplication, so the second branch will be the
multiplication operator. That operator has y and z as its children. So let's say you have that tree loaded up into a
computer. How would you go about evaluating it? Now, there are a number of different approaches
that are all perfectly valid, but recursion offers perhaps the cleanest solution. All you need to do
is write a function called something like, I don't know, evaluate, that
takes a tree structure as its only input.
This function will attempt to run the operation in the root node.
In our case, that's plus.
To do so, it has to pass in the two arguments that branch off the tree.
At the first level, that's x and multiplication. X is simple,
just program evaluate to grab the value of that variable. But what about the operator? What about
multiply? That's also simple. Just call evaluate on that node. Doing so will get you the result
of the multiplication, but you don't have to write any special code to handle
sub-operations beyond a simple check and call. The point here is that recursion is a natural fit
for tree data structures, and trees are just lists. So applying a little bit of our own logic here,
recursion and lists go together, well, they go together like keyboards and spilt coffee.
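To ground that, here's a toy sketch of such an evaluate function in modern Common Lisp. It's my own illustration, not Logic Theorist's actual code, and the bindings argument is an assumed mechanism for supplying variable values.

```lisp
;; A toy evaluator for arithmetic trees like (+ x (* y z)).
;; Variable values come from an association list passed in as bindings.
(defun evaluate (tree bindings)
  (cond ((numberp tree) tree)                          ; a literal number
        ((symbolp tree) (cdr (assoc tree bindings)))   ; look up a variable
        (t (let ((args (mapcar (lambda (sub) (evaluate sub bindings))
                               (cdr tree))))           ; recurse on each branch
             (case (car tree)                          ; the operator at the root
               (+ (apply #'+ args))
               (* (apply #'* args)))))))

(evaluate '(+ x (* y z)) '((x . 1) (y . 2) (z . 3))) ; => 7
```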
IPL didn't explicitly support recursion, but languages usually don't have to explicitly
support this feature.
It's more common that the language just allows you to build a recursive function.
Fortran, at least early versions of the language, wouldn't even let you compile recursive
functions.
It was able to check if you were trying to pull any tricks and stop you.
So, by extension, FLPL couldn't handle recursion.
One final factor made FLPL suboptimal, at least in McCarthy's opinion.
In this strange language, not all functions returned values.
Now, like recursion, this is a subtle point.
In the strict mathematical sense, a function has to return some value based off its inputs. That's
the definition of a function. Some of the new functions introduced by FLPL either returned
nothing of consequence or just operated on lists silently.
Imagine, if you will, you're building some complex expression.
It's composed of functions.
In FLPL, you have to just remember which functions give back reasonable results.
If you don't, then your expression won't work.
It may not even compile.
The issue here is that FLPL was introducing some special cases. In real life, it can be good to be special. But when it comes to
programming, the opposite is true. Ideally, a language should be consistent and always act
as expected. You should never be surprised by your programming language.
That's a really bad sign. You shouldn't have some set of special functions that you just
have to remember. Everything should work the same way every time. So this was the general backdrop
that McCarthy was operating within during 1956. There were tools being designed for programming AI. There were
some really smart people tackling the general problem of thinking machines. But, at least as far
as McCarthy was concerned, there was no ideal tool for modeling intelligence. There was a space
opening up where improvements could be made. Before winding back a little bit and talking
about proto-Lisp, I want to take a second to make an important observation. When looking at the
history of programming languages, it's common to place each language in some larger family tree.
It's a quick and easy way to give context about a language. If you don't know much about ALGOL,
you could look it up on one of these charts, see that it influenced C, and make some quick inferences about that older language.
These are great for quick reference, but there are some issues. I mean, most language family
trees don't include the obscure junk that I get into, but that's a separate gripe.
My big issue with these trees is that they don't always show negative influences.
The chart will show when a language was based off an earlier language, when it borrowed features,
when it took existing concepts and syntax. They don't show what's going on with McCarthy's
relationship to IPL and Fortran. Lisp looks nothing like Fortran. It doesn't act like IPL. But both of those languages
had huge impacts on how McCarthy would develop his own language. That impact was that McCarthy
disliked both Fortran and IPL, but that's still an influence. That's a kind of nuance that doesn't
fit into a family tree. Maybe it needs its own kind of chart. I, for one, would really
like to see a language tree with connections for the kind of anti-influence or negative influence
or whatever you want to call this kind of connection. The point I'm trying to piece
together here is that Lisp would come out of this sea of Fortran, FLPL, and IPL. During 1957 and 1958, McCarthy was working primarily in Fortran.
The whole time he was keeping notes, making changes, and trying to figure out a better
tool for AI. If that's not proof that Lisp was influenced by Fortran, then I don't know what is.
This period is also where I personally run into a bit of an annoying problem. The timeline
here is really tight. In the space of a few years, McCarthy goes from the Dartmouth Workshop
to the first version of Lisp. Primary sources here are mainly academic papers. So far, so good,
that's my comfort zone. But this comes with a little idiosyncrasy that I don't think I've
mentioned before. Often, it takes a while to finish a paper. You first have to, you know,
do the actual work, and then you write a paper. Then, after that, you send it for publishing.
Depending on the outlet, this can take a variable amount of time. You may need to make edits or even
changes. The result here is that
some of the papers in the lead-up to Lisp weren't published until 1958, so that date stamps them
actually past the canonical birthday of Lisp. You kind of have to line up papers with McCarthy's
recollections and try to find drafts to be sure that everything's in its right place.
This tripped me up when I first started hitting the books, so keep this in mind if you plan to
go down this rabbit hole yourself. Our first actual glimpse of Lisp comes from a strange place.
In 1958, McCarthy wrote a paper describing a theoretical program called Advice Taker.
The paper was called Programs with Common Sense. On the surface, it would seem that McCarthy is writing on artificial intelligence,
but this quickly gives way to a description of a new kind of language. As the title lays out,
AdviceTaker was a program with a limited form of quote-unquote common sense. McCarthy describes it
as an ability to make simple deductions, but there's a lot more to the program. Quoting from
the paper, the basic program will draw immediate conclusions from a list of premises. These
conclusions will be either declarative or imperative sentences. When an imperative sentence is deduced,
the program takes a corresponding action. These actions may include printing sentences,
moving sentences on lists, and reinitiating the basic deduction process on these lists.
I don't know about you, dear listener, but this is sounding a lot like an interpreter
that takes a programming language as input.
IPL and, to a different extent, FLPL were programs that ran, quote,
deductive processes on lists, at least in a very loose sense.
Now, there is a small detail here that I need to point out as early as possible.
The fact that McCarthy is using imperative sentences to initiate action is crucial for what's to come.
An imperative sentence is something like a command.
Jump, go left, give me your leather jacket.
Those are all examples of imperative sentences.
The verb here, jump, go, or give, comes first, followed by the rest of the sentence.
Or, if we switch around our language a little bit to be more logical,
we have an operator followed by arguments.
Each operator can have a varying number of arguments.
In some cases, there are no arguments, like jump, but in others, you can have a complex list of arguments. Alright, so we're starting to see something
like a programming language emerge in this paper. Ah, but this is just the beginning.
Back to the literature, where McCarthy describes what makes AdviceTaker so special.
Quote,
The main difference between it and other programs or proposed programs for manipulating formal languages is that in the previous programs, the formal system was the subject matter,
but the heuristics were all embedded in the program.
In this program, the procedures will be described as much as possible
in the language itself, and, in particular, the heuristics are also described."
This is some 3D chess kind of stuff, so let me try to unpack what's important here.
For AdviceTaker to work, McCarthy had to first define some
restricted scope for his problem. You can't just say, I'm gonna make a computer that can
understand English. You have to kind of tighten that down a little bit. He has to standardize
on some formalized language that AdviceTaker will be programmed to understand. In the case
of the earlier logic theorist, Newell, Simon, and Shaw chose formal
symbolic logic. The choice of an existing framework for expressions gave the RAND team a bit of a
jumpstart. Symbolic logic already had pre-established rules well before logic theorist was written.
McCarthy, mad lad that he was, decided that AdviceTaker would use a new linguistic framework. I'm not
just using flowery language here. McCarthy really does explain AdviceTaker's new language in terms
of linguistics. The stated reason for this undertaking is clear. AdviceTaker was intended
to make general logical conclusions from simple English-like statements.
There wasn't really a pre-existing framework that could do that.
Symbolic logic is close, but it's nowhere near a normal human language.
It's cold, inhuman, and if you spoke in symbolic logic exclusively, you might open some sort of portal.
The details of this advice-taker language are where we start to approach Lisp-ish territory. We could probably call this something
like proto-proto-Lisp. It's still a few versions out from the actual language, but I think it makes
for an interesting vantage point. McCarthy describes everything in terms of functions, and those
functions always come before their arguments. Now, that probably doesn't sound too strange.
That sounds like any function in any programming language. You'll usually say something like
function of x, y, or something similar. What's different here, or at least what strikes my ear a little differently,
is how McCarthy formulates his examples in this paper. You see, in Advice Taker, these functions
all act as imperatives, commands that, if followed, will describe some overall state.
That means that in some cases, we get lines of pseudocode that sound,
well, a little off. For instance, one example in the paper shows the at function. This function
tells AdviceTaker that something is located at some place. The syntax is given as at(I, desk).
Now, I can't speak for everyone, but I certainly don't talk like that.
I would never say at I desk.
I would probably say something like I am at the desk.
Even if you strip out the meaningless helper words, it would still be along the lines of I at desk. That's shorter,
and it retains the meaning of the phrase. Hey, maybe I should start talking like that to save some time. Anyway, even in the shortened example, we can easily parse out the operator, for lack of
a better term, and its arguments. At is what's doing the work here. I and desk are what need
to be connected somehow. McCarthy is shuffling around the expression, making it match an
imperative sentence by putting the operator first, and then adding some parentheses and a comma to
separate things out for the computer. All functions that McCarthy uses in example code in this paper
follow the same pattern,
even if translating it directly into a sentence doesn't make, you know, English sense.
So why would he do this?
What's the advantage?
If AdviceTaker is supposed to be able to deal with some English-like language,
then why are verbs thrown at the front?
According to McCarthy, this prefix notation, which he sometimes calls infix in his writing, would be crucial to Lisp's coming success.
It's a simplification that makes the language easier to parse for the computer. Now, this part
can get a little complicated. Once again, these two episodes are pretty heavy on
the CompSci 101 stuff. The premise we have to accept here is that human languages, for all of
their usefulness, tend to suck. Languages are complicated, inconsistent, and rely on a host
of annoying rules and exceptions. The same is true when it comes to the notation we use for mathematics,
although perhaps to a lesser extent. An expression like 1 plus 1 is simple enough. At a glance,
you can see that, yeah, that's addition with two arguments. Simple. But what if an argument's
negative? We might express that as minus 1 plus 1. Our meaty, very intelligent brains can easily see what that means.
To us, this is still a pretty simple expression.
But how would you programmatically describe it?
Do we have two operations?
Or is the negative part of the first argument?
If we have two operations, then which one needs to be evaluated first?
Or is the negative in front some kind of error in this notation?
Can we even evaluate this?
We start getting into a list of considerations and checks
needed to understand the expression's meaning.
Once again, this is just for a very simple expression. Imagine applying
this to a more complicated expression. Prefix notation allows us to skip over a lot of these
questions. Not all of them, but a lot just fall away. The operator or the function or the verb,
whatever you want to call it, always goes first in prefix notation.
That's always followed by a list of arguments. The arguments might be functions themselves,
but those subfunctions still follow the same convention. Applying that logic to the example,
we get something like plus one one, or plus negative one one.
That looks a little strange, but it's much easier for a computer to parse.
Your program can immediately pick out the function and its arguments.
That means that you can program a system to understand that much easier.
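In modern Lisp notation, which isn't quite what Advice Taker used but follows the same idea, those examples look like this; this is purely my own illustration.

```lisp
;; Prefix notation: operator first, then a list of arguments.
(+ 1 1)        ; 1 + 1      => 2
(+ -1 1)       ; -1 + 1     => 0, the sign is just part of the argument
(+ 1 (* 2 3))  ; 1 + 2 * 3  => 7, nesting follows the same operator-first rule
```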
As McCarthy put it when explaining the early history of Lisp, prefix notation is a play into a computer's strong suits. This also has the convenient side effect of making it easier to implement an interpreter or compiler for this
language. Is this a cynical and lazy ploy to save some time? Perhaps. That said, I'd say that this
is a really smart choice. If you're making some new formalized language to
use with computers, then why not tailor it around what a computer is good at? Does it really make
sense to go through a lot of work just to make older, more familiar notation work on a computer?
Does that save time in the long run? Maybe prefix notation is the objectively right way to go,
and other languages just get it wrong. The final pre-lisp nugget that I want to touch on
has more to do with the specific computer limitations McCarthy was working with.
I think it's safe to say we've established that McCarthy has a keen eye for what a machine is capable of.
Roughly around the same time that AdviceTaker was being devised, McCarthy was filling in specifications for a new language. It's still not quite Lisp yet. It's not quite even real yet,
but it's approaching something more impactful. Hardware limitations were an early concern,
and that would impact McCarthy's design. The only computer he had access to, or was going to
eventually have access to, was the IBM 704 at the New England Computing Center. Even once installed,
access would be limited, so McCarthy was going with the pen and paper approach to designing a new programming language. The central feature was, of course, list processing.
It follows nicely that the first hurdle, then, was figuring out how to handle linked lists
on the 704. We're still talking about the late 1950s here. This is a pretty early period
for computers. Machines really haven't hit their
stride yet, at least I wouldn't say they've hit their stride yet. The 704 is a wonderful example.
This is a different kind of machine, at least to our modern notions of how a computer should be.
It had a 15-bit address space, meaning that memory was addressed using a 15-bit long number.
Each address in memory held a 36-bit wide number.
The computer's registers, its storage for immediate use,
came in a few different sizes.
Some were 15-bit, one was 36, and one was 38.
If that sounds weird to you, then you aren't alone.
It weirds me out a little bit too.
So here's a fun question. If each location in memory is 36 bits wide, but memory addresses are only 15 bits, how do you store
a pointer? How do you keep track of a location in memory? The docs on FLPL, may its bones rot in the earth, actually give us a pretty good answer
to this conundrum. Each location in memory is further subdivided, so you can pack two
pointers into each memory cell, plus some extra space for flags and tags and such.
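Here's a rough sketch of that packing idea in modern Common Lisp. The helper names and exact bit positions are my own illustration, not the 704's documented word layout.

```lisp
;; Packing two 15-bit fields into one 36-bit word, in the spirit of the
;; 704's address and decrement parts. Bit positions are illustrative only.
(defun pack-cell (address decrement)
  (logior (ash decrement 15) address))

(defun cell-address (word)
  (ldb (byte 15 0) word))      ; low 15 bits

(defun cell-decrement (word)
  (ldb (byte 15 15) word))     ; next 15 bits

(cell-decrement (pack-cell 100 200)) ; => 200
```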
McCarthy went with a similar model for his pen and paper LISP. The 704 even provided functionality for this type of subdivision.
Plus, McCarthy was at least consulting on the project that led to FLPL,
so there's definitely some crossover going on there.
We gotta remember that LISP is influenced by FORTRAN no matter how we look at it.
So each memory cell was broken up into two segments.
One was called the decrement, and the other was called the address. The naming convention here comes from
how the particular IBM machine handled addressing data. An effective address was calculated by
subtracting the decrement from some base address. That's neat, but it's not really relevant here.
We just need to know the two names for this next part.
Lisp's fundamental unit of data is called a cons cell, C-O-N-S.
It's a single element of a linked list.
As we know from IPL already, each element has to have two parts,
a pointer to the next item in the list, and then the actual data.
On the 704, the decrement part of memory was used as the pointer to the next list element. That left the so-called address part of the memory cell
for storing data. This data could be in the form of a numeric value or a pointer off to some other
list or chunk of data in memory. That's all well and good, but you need more than
a fancy memory layout to create a useful language. You need a way to deal with that fancy memory
layout, some way to access and manipulate data. To deal with this, McCarthy planned a set of
simple instructions, sometimes called primitives. In this initial set are two that
end up sticking around and are of particular interest. These are contents of address register,
or CAR, and contents of decrement register, or CDR, sometimes pronounced could-er. I know, it's kind of rough. In practice, car returns the first element of a list, and cdr returns all subsequent elements. The final fundamental piece of Lisp that was
laid out in this period was cons, short for construct. Now, nomenclature-wise, this is a bit of a complication.
A single element of a list in Lisp is often called a cons element.
But cons is also the name of the function used to create and link elements.
It's kind of like how an orange is a fruit but also a color,
and the fruit happens to be described by the color.
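Here are those primitives in modern Lisp, just to make the mechanics concrete; a minimal sketch, not period code.

```lisp
;; The three primitives at work.
(cons 1 (cons 2 (cons 3 nil)))  ; builds the list (1 2 3) one cell at a time
(car '(1 2 3))                  ; => 1, the first element
(cdr '(1 2 3))                  ; => (2 3), the rest of the list
```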
Anyway, these three fundamental functions, car, cdr, and cons,
will go on to form a big part of the basis of Lisp. Fundamentally, this is a pretty elegant
system. By chaining these functions together, you can construct, manipulate, and access Lisp data
any way you want. It's the kind of cool and well-thought-out design that's emblematic of
McCarthy's work. You could look at it as highly abstracted. It's definitely way higher level than
contemporaries like Fortran or IPL. But by the same token, this early lisp was heavily influenced
by the hardware McCarthy had access to. Prefix notation, as discussed, is one example. That was
chosen partially because it makes for a language that computers can easily parse. We can also see
this in car and cdr. Those functions will stick as fundamental to Lisp, but we can see this
machine-dependent anachronism right in their names. In a lot of ways, McCarthy was balancing
between this high-minded abstraction and machine-tied reality. If you've read ahead, or if
you're familiar with hacker lore, then you may have been a little bit confused up to this point.
Lisp is heavily associated with MIT, specifically the AI Lab. That group was an incubator for a lot of
artificial intelligence research, hacker culture, and, of course, Lisp. So far, we've been following
McCarthy's adventures at Dartmouth, and to a lesser extent, working with IBM. Well, don't worry,
because we've hit a transition point. In the fall of 1958, McCarthy started teaching at MIT.
Shortly after, he and Marvin Minsky, another professor, started the AI Lab.
This is where Lisp would move from a strictly theoretical exercise to actual code.
At MIT, McCarthy had easier access to computer time and,
likely using the foundation of the
AI lab's proof, had more institutional backing than ever before.
Now, this wasn't some one-and-done kind of process.
McCarthy and his new MIT colleagues approached a full implementation of Lisp in really surprisingly
small steps.
At the time, it was very difficult
to write an entire compiler. That's still true today, actually. So Lisp was first implemented
as a library of functions. A handwritten Lisp program could be converted by a human into a
series of library calls. This was less of a full solution and more a way to start
experimenting with the language without too much upfront work. 1958 is usually cited as the birth
date for Lisp. I'm kind of of two minds about this. Lisp is definitely being worked on in 1958.
The first attempts at an implementation do start to occur in this year,
but this isn't really a recognizable version of the language. Lisp actually goes through a number
of permutations before it reaches some well-known form. In this period, McCarthy describes Lisp as
a mishmash of ideas. We're out of the fully pen and paper era,
so maybe we should call this the pen and machine code version. For the differences here to really
make sense, we have to take a quick detour and talk about what Lisp looks like today.
The most well-known feature of Lisp is its unique syntax. That is, Lisp and Lisp-related languages have a really particular look to them.
You can identify them right away.
Every function call in Lisp is surrounded by a set of parentheses.
Now, this is in contrast to most other languages where function names are put outside parentheses.
That might sound like a silly, maybe even unimportant, change,
but it does have a profound effect when considered alongside the rest of the language.
It gives a really unique coat of paint to this programming language.
All functions, be they simple addition, car and cdr,
or even some user-defined subroutine,
follow this pattern. Parenthesis, function name, argument, argument, argument, and on,
and then close parenthesis. And the language is designed in such a way that you need to chain
functions in order to accomplish anything major. So you end up with deeply nested function calls.
By the end, you might be four or five layers deep,
with matching closed parentheses on the other end.
This sea of open and closed parentheses is so ubiquitous in the language that some claim Lisp actually stands for Lost In a Sea of Parentheses.
One of the neat upshots here is that Lisp's syntax
is remarkably consistent and clear, at least once you get used to it. The point here is that Lisp
has a very particular flavor. It's instantly recognizable. So you'd assume this syntax was in place early on, right? Well, no, not so much.
The wild Lisp ride starts all the way back with Advice Taker.
The pseudolanguage described in that paper didn't use parentheses in anything close
to how Lisp would later use that character.
It used a style called M-notation or M-expressions.
As near as I can tell, this was a McCarthy original, at least the naming convention was.
Advice Taker sported function calls that started with a function name,
followed by arguments enclosed within parentheses.
Arguments were separated by commas.
In later papers, McCarthy would change this slightly,
replacing the open parentheses and closed parentheses with matching square brackets.
Now, as McCarthy explains, early Lisp experiments expanded and mashed up the syntax.
Quote,
The programs to be hand-compiled were written in an informal notation called
M-expressions, intended to resemble Fortran as much as possible.
Besides Fortran-like assignment statements and gotos, the language allowed conditional expressions and the basic functions
of Lisp. The M-notation also used brackets instead of parentheses to enclose arguments of functions
in order to reserve parentheses for Lisp structure constants." So, sure, we can say Lisp started in 1958,
but it looked totally different. To me, the most important difference is that these early versions
lacked syntactic consistency. This is something that shows up in a lot of languages, so
it's not really a bad thing,
it's just weird to see in early versions of Lisp.
Fortran-like assignments here just means lines like x equals 1, that's the usual fare that
any programmer is familiar with.
Functions were called using the previously explained method: a name, followed by its arguments inside square brackets. Lists, however, were defined using parentheses and commas. Do you see the pattern here, or rather,
the lack thereof? I just described three different tasks you could carry out in proto-Lisp,
and three different styles of syntax. That's just how most languages work.
At the time, the only real reference was Fortran,
and that's just how Fortran worked.
However, that doesn't mean it's the best way to go about things.
It's a level of inconsistency that could be seen as acceptable,
but could also cause its own problems.
Two of the key hallmarks of Lisp are its consistency and its simplicity. It really
is a language stripped down to its purest expression. If you were to show someone a
snippet of 1950s-era Lisp, they probably couldn't identify it with the later language. A big part of that
is Lisp's early syntactic complexity. We can see this complexity in full bloom in the first paper
fully dedicated to the language. The actual paper is published in 1960 with the riveting title
Recursive Functions of Symbolic Expressions and Their Computation
by Machine. But in full transparency, I'm not working off that specific version of the paper.
I'm going to be pulling from an earlier 1959 draft. That draft is simply called AI Memo 8.
Once again, this is a case of pub dates being later than the actual work,
so I want to be as close to the work as possible. Anyway, Memo 8 gives us our first concrete
description of this whole Lisp thing. This is also our first glimpse of a new syntax,
the S-expression. This is the full parenthesis-laden notation that Lisp will be known for,
but in this early work, S-expressions are only used to define data. That's important because
it means that in Memo 8 Lisp, code and data are defined using different syntax. Function definitions and the rest of the language still rely heavily on M-expressions.
It's just the actual representation of data that uses S expressions.
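As a rough illustration of the split, here's the same definition in M-notation and in later all-S-expression Lisp. The bracketed line is a paraphrase of the memo's style, not a verbatim quote, and the function name is my own.

```lisp
;; M-expression style, roughly as the memo writes definitions (paraphrased):
;;   second[x] = car[cdr[x]]
;; The same definition once everything became S-expressions:
(defun second-element (x)
  (car (cdr x)))

(second-element '(a b c)) ; => B
```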
The entire memo is pretty darn dense.
It reads almost like a math paper, which I guess I shouldn't be surprised at,
given that McCarthy's background is in mathematics.
And while the syntax and naming conventions may be
different, memo 8 is still describing Lisp. You just have to look a little bit deeper. Beyond the
core functions we've talked about, and the data representation, there are a few points that we
need to examine. First, this early version of Lisp supports recursion. It's even right in the paper's
full title. Now, this isn't an explicit feature.
Lisp just allows you to write recursive functions. All you have to do is define a function that
references itself by name. In general, Memo 8 fleshes out a lot of details about functions
that were missing in the earlier shreds of Lisp. Another major piece to this is the ability to treat functions
as variables. In other words, the ability to treat code and data interchangeably. Just as
with recursion, this feature works just by virtue of how Lisp is designed. The language does not
impose any limitations on what a variable can or can't be.
So, thanks to that, Lisp can just treat functions like data, no problem at all.
All that is interesting, and there is some more detail on the language described in memo 8,
but I'm going to leave that as an exercise for the reader.
I don't want to explain too much more syntax in audio form.
The real meat to this paper comes once we reach the section on computability.
This is where McCarthy shoots the moon, so to speak.
Quote,
In this section, we shall show that all functions computable by Turing machine are expressible as S-functions.
If, as we contend, S-functions are a more suitable device for
developing a theory of computability than Turing machines, the proof in this section
is out of place and should be replaced by a plausibility argument similar to what is called
Turing's thesis." This is where my eyes lit up. I had been reading references to this part since I started looking into Lisp.
For me, this was a big payoff.
Throughout his writings, McCarthy takes kind of cheap shots at Turing's model of computing.
McCarthy contended that the Turing machine,
the main theoretical construct used to describe a computer, is clunky.
There's just not a better way to put it. The main argument McCarthy makes comes down to the fact that the Turing machine
model does not play well with functions. Turing's model uses an infinite tape of cells and a head.
The head can move between cells, read the state of a cell, and change values according to some list of instructions.
It can also change what it's executing depending on the value of certain cells.
For some specific set of instructions, it can be shown that this machine can compute anything computable.
This is basically where we started last episode.
McCarthy's issue here is that to assert a proof using Turing's model, you have to work with instructions. You have to go literally bit by bit. McCarthy,
math nerd that he was, found this to be an inelegant solution. That's just a nice way to
say that McCarthy didn't like the Turing machine. Instead, he thought that computability could be better defined and expressed using functions.
The proof here is, well, it's really complicated.
It requires some wicked code, and I'd be lying if I said I fully understood it myself.
But there's something really important going on.
The actual proof consists of a Lisp program, written in the awkward old-style M-expressions, that can simulate a
Turing machine. Whatever McCarthy's misgivings, this is one of the cool parts of Turing's model.
A Turing machine is able to simulate any other machine, so if you can simulate a Turing machine,
that means your system can do the same.
It's kind of like a proof backdoor.
The cool part, the Lisp-ish part, is how McCarthy represents the Turing machine.
He uses S-expressions.
Now, maybe that shouldn't come as a surprise given the framework I've described.
The simulated machine has to be represented using some kind of data, and this early Lisp
uses S-expressions exclusively to represent data.
Maybe not simple, but McCarthy's hand is kind of forced.
Now, are you ready for the mind-bending part? The proof that McCarthy provides is written in M-expressions,
and it evaluates S-expressions.
Once again, that may sound kind of mundane and boring,
but for me, this was a big payoff,
and it took me a bit to get the implications.
I've read this paper more than once, I will say.
The Lisp proof is showing that S-expressions can be used to compute anything. The fancy phrase to use here is that S-expressions
are Turing-complete. In memo 8 style Lisp, S-expressions are purely for data, but McCarthy
is doing a sleight of hand to execute them, or evaluate them, like they're code.
There's definitely some layers here, it's getting to be a pretty thick onion, so let me try and
condense what's important. McCarthy has shown that Lisp can be self-hosting. So Lisp, even in this nascent state, is powerful enough to fully describe itself.
This is a huge deal for two big reasons.
One, this means that Lisp, in theory, is portable.
This point is more of a side benefit, but I think it's still huge.
Self-hosting is how C will eventually become the portable language of choice.
I don't want to get super deep into this point because portability is a huge topic,
but I have to at least give this a passing reference because I think it's really fascinating
that Lisp could just do this by virtue of it being Lisp.
Point two, Lisp is fully introspective.
Lisp can safely and pretty sanely talk about itself and modify itself.
Put another way, a Lisp program can be adaptive. This alone, even dropping everything else,
makes Lisp a fantastic fit for artificial intelligence. A Lisp program can write another Lisp program just because of how it's designed.
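A tiny sketch in modern Common Lisp shows the idea of code as data; this is my own illustration, not anything from McCarthy's proof.

```lisp
;; Code is just list data: build an expression at runtime, then evaluate it.
(let ((expr (list '+ 1 2)))  ; expr is the ordinary list (+ 1 2)
  (eval expr))               ; => 3
```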
In this proof, there's something else that I really like. McCarthy was getting close to putting
the finishing touches on his language, but he was still missing something. When AI Memo 8 is written,
there's still no way to actually run Lisp. All experiments were done by hand.
But this would change shortly thereafter. In either late 1959 or early 1960, I can't
find the exact date here, Steve Russell wrote the first interpreter for Lisp. This got around
the complication of a compiler. You see, a full compiler will convert source code into the bits of machine code a computer
can actually execute.
Compilers tend to be really hard to program.
An interpreter, by contrast, doesn't go quite that far.
Interpreters will read in source code and then decide how to execute those statements.
They don't actually produce machine code.
They just sit between your source
code and the actual machine. In general, this is an easier kind of program to write.
Russell worked off part of AI Memo 8. Specifically, the memo closes with a so-called universal S-function. This was a program that could evaluate S-functions, written, of course, in M-expressions.
So the theoretical work, at least most of it, was already done by McCarthy.
I mean, that's basically an interpreter right there.
Russell just had to convert that into assembly language and make some adjustments.
But Russell would go a step further.
McCarthy's formulation defined this universal function called eval in terms of M-expressions. This might not be a big reveal, but the M in M-expression
stands for meta. McCarthy saw this as a meta language that could be used to define the S
expression parts of Lisp. That's neat and all, but it's kind of gross. It's kind of
overly complicated. There is also one other big issue with M-expressions. They used square brackets.
And the IBM 704 didn't actually support square brackets. It only had a limited set of characters
that it could encode. So when Russell actually
sat down and started hammering out his interpreter, he just dropped any support for M expressions.
The first Lisp implementation only handled S expressions. That simplification led us into
a recognizable form of Lisp. From then on, Lisp was the syntactically consistent, elegant, and refined language we all know and love.
Alright, that brings us to the end of our Lisp adventure.
We haven't gone far into the adoption and actual use of Lisp,
but I'm going to leave that off for another time.
I think one month of linked lists and programming language analysis is enough for a little bit.
So, where do we stand on my initial questions?
Why are lists so important for AI?
And why is LISP the, quote, mother tongue of artificial intelligence?
After two episodes worth of research, I think these are pretty easy questions to tackle.
First off, it's linked lists specifically that are important for AI.
Their flexibility allows for dynamic data, which is what early AI was all about.
Perhaps more importantly, linked lists can be used to construct trees,
which were crucial to a lot of early AI software. As for the question of Lisp, I think I finally have the connection in my head. Lisp was designed for AI from the beginning. It was a response to
the inadequacies of Fortran, FLPL, and IPL. McCarthy built his language to
have all the list processing you could ever need, plus the flexibility required for recursion and
introspection. No other language at the time could do that. IPL, kind of, but it was difficult to use.
Really, Lisp was the best tool for the job because it was designed
for the job. And beyond that, Lisp is just a well-designed language. I definitely have a
newfound respect for its elegance. And hey, maybe it's time for me to brush up on it myself.
Thanks for listening to Advent of Computing. I'll be back in two weeks' time with another
piece of computing's past.
And hey, if you like the show,
there are now a few ways you can support it.
If you know someone else who'd be interested
in the history of computing,
then why not take a minute to share the show with them?
You can also rate and review on Apple Podcasts.
And if you want to be a super fan,
you can support the show directly
through Advent of Computing merch
or signing up as a patron on Patreon.
Patrons get early access to episodes,
polls for the direction of the show,
and bonus content.
You can find links to everything on my website,
adventofcomputing.com.
If you have any comments or suggestions for a future episode,
then go ahead and shoot me a tweet.
I'm at Advent of Comp on Twitter.
And as always, have a great rest of your day.