Advent of Computing - Episode 112 - Prolog, Part I
Episode Date: July 16, 2023

I've been told I need to do an episode about Prolog. Well, here's the start of that process. To talk about Prolog we first need to come to grips with natural language processing, its tools, and its... languages. This episode we are doing just that, going from ELIZA to Planner to SHRDLU in an attempt to figure out how AI was first taught human tongues, where the smoke and mirrors end, and where fascinating programming begins.

Selected Sources:

https://dl.acm.org/doi/pdf/10.1145/365153.365168 - ELIZA

https://stacks.stanford.edu/file/druid:cm792pj8606/cm792pj8606.pdf - Planner

https://web.archive.org/web/20200725084321/http://hci.stanford.edu/~winograd/shrdlu/AITR-235.pdf - SHRDLU
Transcript
One of the most enduring digital struggles is the user interface.
How should a human use a computer?
How should data be presented?
How should inputs be given?
These are all huge questions that, frankly, don't have a right answer.
The simple fact is that computers are alien things.
They're these weird number crunchers that have a mind of their own.
When I was younger and just getting into programming, I remember reading that computers
are machines that simulate mathematics. Now, I can't remember where I read that,
and I can't find it now. If you know where that's from, then please hit me up. I'd love to have a
citation for that phrase.
That definition has stuck with me, the idea that computers simulate mathematics,
specifically because it sounds kind of alien.
When I first read that, I stopped and I tried to roll it around in my head.
It just didn't make a whole lot of sense to me.
It's taken a while for it to have full effect.
Computers, on a very basic level, are simulating mathematics. Now, that doesn't mean they have little chalkboards and textbooks. Rather, that a computer can't really do traditional math.
It pretends to do math by flipping automatic switches, shunting around current, and making binary decisions.
The math done by a computer can, in some cases, have no relation to the pen and paper mathematics that we humans do.
These are alien machines, plain and simple.
So, back to the point.
How are we expected to use these things? Should we bend our brains to fit
into these weird digital molds? Should we have big flashy interfaces with custom-built pointing
devices? Or should we bend computers to fit into the mold of the human mind itself?
I've looked at a lot of different approaches to this problem. That ranges from
hypertext to text-based interfaces to graphical systems. One that I haven't really explored is
natural language. That's a wild third path. The idea is that with careful programming,
a computer can be made to act more like a human. If done correctly, you could interface with a computer just as you interface with a colleague.
All by simple voice, by a natural human language.
As with all solutions, this comes with its own set of challenges,
and its own set of very unique tools.
Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 112,
Prolog, Part I. We're back to another programming language episode, but this isn't just any language. I've actually received a whole lot of requests to
cover Prolog, so I'm finally taking the dive. If you've emailed me or messaged me about this,
this one's for you. This is going to be a pretty deep dive for me to take, though.
Prolog isn't really like any other language. It's a declarative language. I'll get into the
details as the episode drags on, but
just know that this makes Prolog different than almost every other language out there.
There just aren't a lot of declarative languages in common use today. On its own, that makes
discussing Prolog a little complicated. If possible, I try to at least understand how to
use a language when I'm discussing it.
And I'm not saying I'm an expert on every language I've ever covered on this show.
I'm barely an expert on any programming language.
Rather, that I just want to be able to read some code.
I want to be able to have a feel for it.
I think taking that approach as a type of initiate gives me some insight into how difficult or easy a language is to use.
It gives me a pretty good feel for the tongue.
With Prolog, however, that's a bit of a tough task.
At least, it's been proving difficult for me.
Prolog just looks and acts differently from languages I'm more familiar with.
There is another impediment that we run into, or rather, that I run into.
Prolog was developed for use in natural language processing.
Now, that's a branch of artificial intelligence that deals with understanding human language.
Right there we have two things that are pretty opaque to me.
AI and linguistics.
This puts us in a fun position where Prolog's background needs some background of its own to make any sense.
What I'm getting to is that we have a series on our hands.
In fact, I'm calling it right now.
I'm going to need some time to come to grips with natural language systems. The reason here is pretty simple,
and it comes back to the Hopperian view of languages. I'm sure you're all sick of me saying
this. Grace Hopper posited that there can't be one universal programming language. Instead, she argued,
there need to be many languages designed for specific niches and applications. I think this
is an important lens to use when discussing programming languages. It helps prevent me
from instantly recoiling from something I don't understand, or something that just looks icky.
It's kind of a way to suppress my,
oh god, oh god, oh no, what is this, please help kind of response. If I see a language and instantly gag, well, that probably doesn't mean it's a bad language. More likely, it means that language is
for some niche that I don't understand, or the language is just trying
to do something that I'm not familiar with. Each language is tailored in some way to solve a
problem. In the case of Prolog, that problem is handling human language, a field which your
intrepid host knows next to nothing about. So here we go. We're going to start from zero and
work our way up to why Prolog is so distinct
from other languages and what kind of special features it can offer. This may take a bit,
but I want to get this right. This episode will be working up to the start of Prolog itself.
We'll be discussing natural language systems, how they were programmed in more traditional languages, and the challenges of that approach. That should give us a foundation to start picking at the
history of Prolog itself. Along the way, I'm going to try and pull my favorite magic trick.
I want to go from recoiling at the mere sight of unusual syntax to being able to read it. At least,
maybe read a little bit of it. We'll see how that one
goes. The main source and jumping off point for me is a paper presented at the ACM's Second History
of Programming Languages Conference. That paper is titled The Birth of Prolog, and it was written
by Alain Colmerauer and Philippe Roussel. I am so sorry, there are a lot
of French names and I'm going to do my best. Now, these were two of the original developers
of the language, so we get really good inside detail on Prolog. It's not from the era of
Prolog, which is the 1970s, but it's close enough that it's basically primary. There's enough space
even that there's some retrospection going on. This is where I started my research, and it's
where I realized that I was in for a bit of a rough time. The paper is very, well, technical.
I can usually deal with technical programming stuff. It is my field, after all, but this one threw me for a loop.
There are just a lot of concepts here that I'm not used to. So, to kick things off, I'm going to
take you down my meandering journey of understanding. First of all, we need to discuss
natural language processing, or NLP. The short story is that NLP is a field concerned with getting computers to
understand human language. Those are some really heavy scare quotes around understand. I don't
necessarily mean a thinking machine that speaks English with all its subtleties, but think more
on the side of a way for natural human languages to be processed,
broken down, and used by a computer. There were two key uses for this technology that were being
investigated in the 1960s and 70s. The first of these is automatic translation between languages.
In this instance, the NLP program is loaded up with some type of model that describes each language.
Then you pass in the input text, the computer breaks it down somehow, does some magic,
then it outputs the same text in another language, another human language.
The second application is the human-computer interface.
This one is a little more subtle than straight-up translating text.
The idea here is to create some type of conversational computer interface.
Think of the computer on the bridge of the Starship Enterprise.
That machine isn't always given concrete instructions.
You could ask it to bring phasers online, but you could also ask it to work out a problem for you.
There's some wiggle room that allows for more naturalistic dialogue between operator and machine.
For that to work, the computer has to be able to understand human language
well enough to both break down inputs and construct outputs.
And, you know, to do something with those inputs.
So this is a bit of a two-pronged problem.
NLP software needs to be able to break down natural language and produce natural language. There's a third unseen part that
ties both of these sides together. Somehow the program needs to model and understand language.
It needs to have some internal way of making sense of what it's given.
You don't just want a computer to parrot back what you say. That actually doesn't require much
fancy programming at all. I can give you a one-liner that does that. A good NLP system will
actually do something useful or interesting. This is all pretty vague, so let me hit you with an example.
I'm going to start us off with something simple, some light NLP to whet the palate,
then work up to more rigorous examples of the art.
Our first stop on this whirlwind tour shall be Eliza.
This program was written by Joseph Weizenbaum sometime around 1964 at MIT. An interesting side note is technically this was paid for by the US government, since Weizenbaum was working under Project MAC.
Anyway, Eliza presents a user with a simulated conversation. It's technically configurable to
do all sorts of conversation styles, but it's most well known for a script that simulates a psychiatrist or a psychotherapist, depending on what nomenclature
you want to use. I'm going to borrow a word from the world of physics here. I'd call Eliza
a toy model. This means a model that's superficially correct and is backed up by logic,
but isn't as rigorous as an actual model.
Toy models are used by a lot of physicists to get to a qualitative understanding of something.
It's a way to give a concise explanation of a phenomenon without getting too far into
the details.
One example that I'm guilty of using on the show is the water analogy of
electronics. In this model, current is described as the flow of water. A wire is a pipe, a resistor
is a crimp in that pipe, and so on. This gives a very limited explanation of how electronics work.
It can even be used to conduct simple experiments, but it's not entirely
accurate. It breaks down on the edges. It's just a qualitative model that's kind of fun to mess
around with, and it may lead you to ask more important questions. ELIZA firmly fits into this
mold. The program is developed as a simplified NLP system. As you converse with Eliza, it spits back somewhat reasonable responses.
It seems a little stilted at times, a little strange, but overall it feels intelligent.
As Weizenbaum explains, however, that's not the case.
You see, it's all a trick, all smoke and mirrors. By 1966, he had already
tricked a good number of users into thinking Eliza was in fact intelligent. This wasn't on
purpose though, Weizenbaum was very upfront with his fakery. But that wasn't enough. He recounts
users claiming their terminals could feel emotions or could actually talk back to them.
Weizenbaum opens up one paper discussing ELIZA with this tract, which, I gotta say, is very generally applicable.
Quote,
It is said that to explain is to explain away.
This maxim is nowhere so well fulfilled as it is in the area of computer programming,
especially in what is called heuristic programming and artificial intelligence.
For in those realms, machines are made to behave in wondrous ways,
often sufficient to dazzle even the most experienced observer.
But once a particular program is unmasked,
once its inner workings are explained in language
sufficiently plain to induce understanding, its magic crumbles away.
It stands revealed as a mere collection of procedures, each quite comprehensible.
End quote.
Once a human understands a trick, then that trick kind of loses its shine.
The seemingly intelligent terminal is reduced to really just a cool program.
ELIZA isn't a program that actually understands or models natural language.
It's missing a key piece of the puzzle.
ELIZA doesn't internally model language in some generic or abstract way.
It doesn't have any deeper machinations.
It's nowhere near that complex.
Rather, it simply decomposes and recomposes words and sentences.
As soon as you load up Eliza, the first puffs of smoke start to appear.
It prints,
Hello, I am Eliza. I'll be your therapist today.
That's how the trick begins.
Weizenbaum explains that his first script for Eliza was in the mold of a psychotherapist. This gave him a little bit of wiggle room for weird responses.
Now, let's say you start talking to the compu-therapist. You type, I am sad, and Eliza
responds with something like, did you come to me because you are sad? Now, ELIZA doesn't know what it means
to be sad. It doesn't know that it's actually being used for faux therapy. Rather, it's looking
for keywords in its inputs, then generating a result that follows a set of rules. It's what's
known as a rule-based system. These types of systems can be very complex, but
Eliza is not. The key to all the smoke and to a few of the mirrors is smart use of string replaces
and patterning. The first layer of this is simple. When Eliza gets an input, it shifts it from the first to the second person.
I's become you's, my's become yours, and so on.
This is mainly done to fix up grammar for the next steps.
The string is then scanned for keywords.
An Eliza script, the data structure used to configure the program, has a list of these magic words.
Basically, the program just tries to find one of these keywords in the input.
Weizenbaum calls keywords the most important words,
with very heavy quotations as well.
The string I am sad does get kind of shifted around into different tints,
but the bottom line is the keyword here is am or I am. A lot of keywords are verbs, which feeds into the next step.
Once a keyword is identified, Eliza looks up a decomposition rule.
This is also configured in the script.
Think of decomposition rules as a simplified means of pattern matching.
There's something like,
match all sentences that have I am followed by any number of words.
That would match I am sad,
as well as inputs like,
I am having a bad day,
or I am in need of validation for my parking ticket.
These decomposition patterns also tell Eliza
how to, well, decompose a sentence.
In the I am blah example, the sentence would be turned into two elements,
I am and then the rest of the sentence.
That's stored as a list.
Now, there is some weirdness here if you read the paper.
In programming, zero is usually used to refer to the first element of a list.
It's kind of just how memory works, it's just convention.
Eliza's decomposition patterns use numbers to identify placeholder words.
So, I am followed by one word would be encoded as I am one.
Here's the weirdness for readers.
In these decomposition rules, zero is a special number.
It means any number of words.
So I am zero would mean the phrase I am followed by anything. The paper mixes decomposition rules
that use special zeros with rules that use list zeros, so if you look at this paper,
you gotta read kind of carefully. Anyway, back to the actual point. These decomposition rules
each come along with reassembly rules. These tell ELIZA how to
shuffle up your inputs to make an output. In the case of our I am blah example, that's told to be
reassembled into, did you come to me because you are blah. The word blah here is a very technical
term taken from Weizenbaum '66. The blah in reality will be some kind of list index.
This allows you to write some pretty neat scripts for Eliza.
You can shuffle up pretty complex sentences into seemingly intelligent responses.
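Here's a rough sketch of the whole trick in Python. This is my own toy illustration, not Weizenbaum's MAD-SLIP code; the regular expression wildcards are just standing in for ELIZA's special zero.

    import re

    # Step 1: shift first person to second person.
    PRONOUN_SWAPS = {"i": "you", "am": "are", "my": "your", "me": "you"}

    def swap_person(text):
        words = text.lower().split()
        return " ".join(PRONOUN_SWAPS.get(word, word) for word in words)

    # Steps 2 and 3: keywords, decomposition patterns, and reassembly rules.
    # "(.*)" stands in for ELIZA's special zero: "any number of words".
    RULES = [
        (re.compile(r"you are (.*)"),     # matches "I am ..." after the swap
         "Did you come to me because you are {0}?"),
        (re.compile(r"(.*)"),             # fallback when no keyword is found
         "Please, tell me more."),
    ]

    def respond(user_input):
        shifted = swap_person(user_input)
        for pattern, template in RULES:
            match = pattern.match(shifted)
            if match:
                return template.format(*match.groups())

    print(respond("I am sad"))  # -> Did you come to me because you are sad?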
But that's the thing of it, dear listener.
This isn't intelligent.
There are some more tricks at play that I'm skipping
over, but believe me, there isn't some secret part of the program I'm leaving out that suddenly
makes Eliza sentient. It reads a string, looks for an important word, then rearranges things
around that important word. It's not very sophisticated NLP. That said, it does show two of the three features I'm trying to wrap my head around.
First is decomposition.
Breaking down language into something a computer can process.
In ELIZA, this is just pattern matching.
The result is a list of sentence parts.
This isn't necessarily words, it's more like groups of connected words.
Second is reassembly, the creation of sensical natural language phrases. In ELIZA, this is just
shuffling inputs around. Some words are changed, pronouns are swapped around, and a mostly
reasonable sentence is generated. This is missing that middle piece, though.
Eliza isn't representing language in any meaningful way.
It knows keywords to look out for, but doesn't have any concept of the meaning of those words.
It doesn't understand how they interact with the rest of the sentence.
It knows that the phrase, I am, matters, but it doesn't know what it means to be.
It knows that sad has some kind of meaning, that it can be used in the sentence I am sad,
but it doesn't know what it means to be sad, or even what part of speech that word is.
Eliza doesn't even really care. That's one major limitation of this approach.
Eliza is workable while being so basic that it doesn't actually do much.
It can be configured to generate valid sentences.
That's cool, but not super useful.
It's missing any kind of generic way to model and make sense of words.
That said, we can glean something important from this.
This all comes down to implementation. First, we should look at data typing. I know.
A riveting field. ELIZA uses two main data structures: lists and trees. Lists are very simple. It's just like a real-world list: you have items
that come one after another in a specific order. When a sentence is broken down into data, ELIZA
makes it into a list. Trees are just second-order lists, lists where each element is its own list. I know, this is the exciting part. These are called trees
because if you draw them out, they're kind of shaped like a tree. Internally, Eliza uses trees
to represent keywords and their associated decomposition rules and reassembly rules.
Trees are cool, and they're also just fancy lists, so you really just need lists here.
The point here is that lists are so important to Eliza that they informed the language it's
written in. Weizenbaum wrote Eliza in a language with a wild name. Mad Slip. That's M-A-D dash S-L-I-P. Revel in that for a moment, if you will.
Okay, are we ready to continue? Mad Slip was a language developed by Weizenbaum specifically
for dealing with lists. Or, at least, it's almost a language. It's more of an extension to existing languages.
The name here kind of gives that away. If you're in the know, at least.
MAD stands for Michigan Algorithm Decoder. It's a language that was used on IBM mainframes back
in the day. SLIP stands for Symmetric List Processor. That's the extension
part that Weizenbaum developed. It sits on top of MAD and lets you deal with linked lists. That's
the basic data structure needed for trees and lists and whatnot. From what I can tell, SLIP is
almost Weizenbaum's pass at creating his own Lisp, a more well-known
list processing language.
Now, Lisp is its own language, whereas Slip sits atop existing languages.
This slip-in approach wasn't new in 1964.
There's a weird connection here that I just have to explain.
Lisp was developed in the late 1950s specifically for dealing with linked lists.
That data structure is the basic building block of any old-school artificial intelligence.
See my entire series on Lisp for further details.
It's a complicated story.
Anyway, Lisp was built in part to replace this
earlier system called FLPL. Now, this is a truly cursed thing. FLPL is short for Fortran List
Processing Language. Like SLIP, it's an extension on top of an existing language, in this case Fortran.
Stock Fortran at that time didn't really provide a reasonable way to build linked lists,
which means it couldn't handle trees.
As programmers started to need trees, better tools were needed.
Fortran would be extended to FLPL, but that wasn't good enough.
Eventually Lisp would come in to save the day.
MadSlip fits into a very similar story. It was developed as a way to add better lists to existing languages. MadSlip is actually just one version of this extension.
There was also a version for Fortran and one for ALGOL. So yeah, there was a FORTRAN slip AND FLPL. This is the first part of the
puzzle. Lists are crucial to AI, so crucial that programmers were willing to develop new languages
or greatly extend existing languages just to add the correct type of lists. NLP, as a type of AI, has this same penchant for lists.
The second takeaway from ELIZA is this. Weizenbaum had created a rule-based system.
This is one of those programming concepts that sounds more intimidating than it actually is.
A rule-based system is just a system with, say it with me, rules.
In the case of ELIZA, these rules are all about text patterns. They say that if a certain pattern
is encountered, then carry out a certain operation. If you see the keyword I am, then break the input
down in a certain way, then rebuild it in a certain way. This might sound like a very small distinction.
Every language can check for a certain condition, then execute some instructions.
We call that an if statement.
It's something that you need to be a real language.
If this, then do that.
Rule-based systems, however, are built specifically for dealing with special types of ifs in smart ways.
ELIZA is a good example of this specialization.
Its rules are stored in lists.
As the program churns through rules, it will build up a list of rules that apply to a given input.
Once ELIZA's built up this list of possible rules,
it runs some code to figure out which rule it should apply. This is called conflict resolution.
It's when a rule-based system resolves which rule it should actually follow out of a set of possible
good rules. You could do this same thing using a normal language, but it would kind of suck to write.
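To make that concrete, here's a toy version of keyword matching plus conflict resolution written in a normal language, Python. The rules, precedence numbers, and templates here are my own invention, not ELIZA's actual script format.

    # Each rule: (precedence, keyword, reassembly template). My own toy format.
    RULES = [
        (3, "i am",   "Why do you say you are {rest}?"),
        (1, "i",      "You say {rest}?"),
        (2, "mother", "Tell me more about your family."),
    ]

    def matching_rules(sentence):
        """Collect every rule whose keyword appears in the input."""
        return [rule for rule in RULES if rule[1] in sentence]

    def resolve_conflict(candidates):
        """Conflict resolution: the highest-precedence keyword wins."""
        return max(candidates, key=lambda rule: rule[0])

    sentence = "i am worried about my mother"       # kept lowercase to keep matching simple
    candidates = matching_rules(sentence)           # all three rules apply here
    precedence, keyword, template = resolve_conflict(candidates)
    rest = sentence.split(keyword, 1)[1].strip()
    print(template.format(rest=rest))
    # -> Why do you say you are worried about my mother?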
Here, Weizenbaum is getting around that issue by storing Eliza's rules as internal linked lists, which are then interpreted by Mad Slip.
In that sense, Eliza's scripts are almost their own programming language.
So, if you really wanted to galaxy brain this thing, Weizenbaum ended up creating his own
language specifically for dealing with natural language. That's a trend that I want you to keep
in your head as we move forward. There are some interesting side effects to this rule-based
approach. First of all, rules can be manipulated. Lists are, believe it or not,
just a normal data type. So if you put your rules in a list, well, in theory, you could rewrite those
rules on the fly, or write a program that generates its own rule sets. That's getting
us closer to AI, a program that can learn and adapt and tread into the realm of mortals.
Code for rule-based systems also looks radically different than code for normal programming languages.
In a rule-based system, there isn't any concept of line number, at least not in any real sense.
Rules are stored in ordered lists, oftentimes, but that doesn't
always matter. Just because one rule comes after another doesn't mean it's executed after the first.
Rules aren't even executed, at least not in a traditional way. Instead, rules are checked and
then acted upon. There are a lot more steps involved than just run one line after another.
My hope is this is helping to build up some scenery.
We're going to be dealing with languages that don't look or act like a normal language.
They seem totally alien to more commonplace imperative programming languages.
So how does this relate to natural
language? Well, simply put, pretty strong tools were needed to actually deal with human languages.
The middle part that Eliza was missing, the modeling and understanding piece,
has a prerequisite. For a computer to understand something as complex as human language,
it needs a way to reason. It needs a way to make
logical inferences about data. Computers don't really do inference. That's not a very computer
thing. It's not very imperative. Thus, you just have to have special tools. These tools, more often
than not, came in the form of a new type of language, the logical or declarative programming
language. A note on terms here. You will see both of these terms used to refer to languages.
In this episode, I'm going to be using logical and declarative somewhat interchangeably because,
well, they mean close to the same thing in this context. In a declarative language,
you describe, logically, what you want a computer to work out. Eliza's rules are an example of this.
You write a rule that says if you encounter the input I am dot dot dot, you decompose and
reassemble that input in a certain way. You aren't giving a step-by-step way to handle
processing data. Instead, you're declaring some new rule. You're describing what you want done
and when you want it done, but not how to determine if it should be done or how it should be
done. You're not giving every little detail. The computer works that out for you.
The difference between imperative and declarative
languages, at least in my head, come down to allocation of work. In an imperative language,
the programmer has to figure out exactly how to instruct a computer to accomplish a task.
You work up a step-by-step to-do list. In this paradigm, the programmer does almost all of the planning,
and the computer only executes that plan. You give the computer that to-do list, and it does it,
step-by-step-by-step. In the declarative paradigm, the programmer works up logical clauses.
They describe connections, relationships, and how different clauses interact with each other.
The declarative programmer is doing something vastly different than the imperative counterpart here.
The final workload is pushed more onto the computer.
The machine has to figure out how to interpret all of these rules and relations,
how to form a plan of execution and then execute that plan.
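Here's a very rough sketch of that split in Python. The tiny rule engine is my own toy, not Planner or Prolog, but it shows where the planning work lands.

    # Imperative: the programmer spells out every step.
    men = ["socrates"]
    mortals = []
    for person in men:            # step 1: walk the list of men
        mortals.append(person)    # step 2: every man is mortal, so copy him over
    print("socrates" in mortals)  # -> True

    # Declarative: state facts and a rule, then ask a question.
    facts = {("man", "socrates")}
    rules = [(("man", "?x"), ("mortal", "?x"))]   # "all men are mortal"

    def prove(goal):
        """The engine, not the programmer, decides how to chase the goal."""
        if goal in facts:
            return True
        for condition, conclusion in rules:
            if conclusion[0] == goal[0] and prove((condition[0], goal[1])):
                return True
        return False

    print(prove(("mortal", "socrates")))          # -> True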
That all sounds pretty theoretical, so let's go right into some praxis.
After all, the theory around declarative languages has always made my head spin.
I'm very much an imperative type of guy.
So allow me to introduce Planner, a logical language that might be simple enough for me
to understand.
Here's something we're going to have to deal with right off the bat.
Planner and its relatives are weird and confusing.
Planner itself was developed in the latter part of the 1960s by Carl Hewitt.
As such, I'm working off his 1969 paper titled Planner,
a language for proving theorems in robots.
So check this out. This comes by way of the introduction, my
favorite place to take the measure of a paper. Quote, the language will be explained by giving
an oversimplified picture and then attempting to correct any misapprehensions that the reader might have gathered from the rough outline. End quote.
I scream internally at this. Hewitt's plan in the paper describing his fancy language is to use a toy model, then give some gentle corrections. That's a sign that perhaps things
are going to get convoluted. Part of this confusion is the fact that Planner is
a little more than a normal programming language. This is something that isn't explicitly addressed
in the paper. It's some background that is kind of needed for a lot of these declarative languages.
Planner is a tool for solving logic problems. It's a system for running proofs. Its interface happens to come
in the form of a programming language. Or actually multiple languages, but we'll get
there. I'm going to start us off with something that's easier to deal with and then build
up from there.
Planner operates as a state machine. That is actually how all computers operate, so this shouldn't come as a huge shock.
What makes Planner different is how it exposes things.
If you haven't heard the term state machine before, then let me give you a quick rundown.
At any instant in time, you can model a computer by describing its state.
That's what's in memory, what's stored in registers, what lights are on and
off, which switches are flipped, that kind of thing. The state is a very static construct. It's
a snapshot in time. If nothing happens, the computer will just hold at that state forever.
Operations can occur to change that state. Writing to some location in memory is a state change
because we can consider memory part of the machine's state. So changing memory is changing
that state. The code that actually changes memory isn't necessarily part of this model,
it's something that exists kind of outside of the overall machine state. So when I say state machine, I mean we're
dealing with a pile of numbers that are operated on over time. When you get down to it, all computers
are state machines of some kind or another. Programming languages give us ways to change
states, to operate on that machine. In that sense, a language is an abstraction on top of a basic state machine.
In an imperative language like C or Fortran, that abstraction doesn't take us too far away
from the machine. You still operate in steps. If you write x equals 1, you're just telling the
compiler that you want to name some location in memory x and set its value to 1.
There's a bit of bookkeeping done here, but at its core, this is just a small change in the machine's state.
In imperative languages, you build up programs from hundreds or thousands or millions of these small tweaks to the machine's state.
Everything is done in little explicit steps.
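As a tiny illustration, here's that picture in Python. The names are made up; the point is just that every operation is one small, explicit change to the state.

    # The machine's state: a snapshot of memory at one instant.
    state = {"memory": {}}

    def write(location, value):
        """One small, explicit change to the machine's state."""
        state["memory"][location] = value

    write("x", 1)                          # the imperative "x = 1"
    write("y", state["memory"]["x"] + 1)   # another little tweak, built on the last one
    print(state["memory"])                 # -> {'x': 1, 'y': 2}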
Planner, on the other hand, doesn't always work that way. The language is much more implicit.
In other words, you aren't specifically instructing the computer to make a lot of small changes.
You can initially set up some state. What's kind of neat here is that the overall state of the machine can be saved as its own variable.
Hewitt calls this a database, which, fair enough, it kind of is.
You start off a planner program by building out this initial state.
That part is roughly similar to how an imperative language does things.
But here's where we deviate from the expected path.
The program itself is composed of a set of logical clauses and theorems. You can also look at these as logical rules. So,
as just a dumb example, you could write a program that says x is a square as its state setup,
then has the rule: all squares are rectangles. Now, of course,
an actual program would be more complex. It would be more useful. It would have all kinds of symbolic syntax, but I'm going to gloss over that. I assure you, planner syntax is complex enough that speaking
it aloud may summon something kind of nasty. Okay, so once your
program is loaded up, you can get down to the real point of Planner. That's asking it questions.
And I'm not even joking here. The workflow for Planner is a lot different than a normal language.
It isn't executing one line after another. I know I keep saying that. So here's what that practically means.
Once your program is loaded, it's actually inert.
It's just a set of rules and a state waiting to be acted upon.
Execution starts when you ask Planner to prove something,
or ask it a question about the state or rules of the program that's loaded.
So, for instance, we could tell
Planner that the goal of this run is to attempt to prove that X is a rectangle. You literally would
write something out like, goal is a X rectangle. Planner is Lisp adjacent, you see, so everything's
kind of switched around and backwards. With that, the wheels start turning.
Planner tries to answer the question by working through its rules and its state.
As it does so, the programmer kind of loses control over the machine. Planner starts updating
its internal state as it works. It will also, quote-unquote, discover new rules as it runs.
This will actually lead to new rules being deduced from existing ones.
In this way, Planner is designed to follow the usual methods used during proofs. Let's say I started off with another rule
that states no triangle is a rectangle.
As Planner was running, it might deduce that,
given its existing rule set, no triangle is a square.
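Here's a toy sketch, in Python, of that kind of rule discovery. The representation is my own paraphrase of this example, nowhere near Planner's real machinery.

    # Each rule: (if_this, then_this, negated). "negated" means "then NOT this".
    rules = [
        ("square",   "rectangle", False),   # all squares are rectangles
        ("triangle", "rectangle", True),    # no triangle is a rectangle
    ]

    def deduce_new_rules(rules):
        """If A implies B, and C implies not-B, then C implies not-A."""
        discovered = []
        for a_if, a_then, a_negated in rules:
            for c_if, c_then, c_negated in rules:
                if a_then == c_then and not a_negated and c_negated:
                    discovered.append((c_if, a_if, True))
        return discovered

    print(deduce_new_rules(rules))
    # -> [('triangle', 'square', True)], in other words: no triangle is a square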
The deduction and proof part here meets up with an interesting bit of history.
Now, for this, we're taking a tangent,
which I hope will be informative.
Hewitt makes a citation to this 1959 paper titled
Report on a General Problem-Solving Program. That paper was written at RAND. Now, if you aren't
really deep into Lisp lore, then that title and that name won't ring any bells, so let me weave
you a small tale. Lisp was developed as a modern language for linked list data,
which we've established is a core feature for artificial intelligence programs.
Now, Lisp didn't emerge from a vacuum. It was, in part, a response to existing languages.
These earlier list languages kind of sucked. FLPL was one of these, the horrifying Fortran
list processing language. Another was IPL, the information processing language,
which was developed at RAND in 1956. Now, I can say straight up that IPL is evil. At least,
it appears that way looking back at it. And I know I mentioned
how I'm trying not to be mad at languages and just call them bad, but IPL's a weird transitional
language that doesn't really have a purpose once there's a better language. IPL is just really
primitive. It looks and feels almost like assembly language. It has all these weird implicit rules around variables.
It's just kind of gross.
IPL technically has many of the features of Lisp, but Lisp is just a far better constructed
language.
That said, IPL was ground zero for some of the earliest AI research.
One of the first artificially intelligent programs,
Logic Theorist, was written in IPL in 1956. Logic Theorist was able to generate novel proofs
to logic problems. That is really cool, but this first pass was also very limited.
Logic Theorist was only able to use a handful of very simple methods,
and it was only ever used for a limited number of proofs.
Come 1957, the team at RAND created a new program,
the General Problem Solver.
This was a much more capable program,
and it's what Hewitt cites in his Planner paper.
This puts Planner onto an interesting branch of the programming family tree. I think I've kind of hit this point where
I just need to make my own genealogical trees for programming languages. Anyway, here's the hot take.
Lisp people, you can get mad at me if you want. Planner is a close cousin to Lisp, but in a convoluted way.
Lisp and Planner were both influenced by IPL.
Planner was also influenced by Lisp, or at least an early version of Lisp.
Hewitt's Planner article also cites the Lisp 1.5 Programmer's Manual.
This version of Lisp is a little strange, enough so that Lisp 1.5 should probably be
considered a different language.
The long and the short of it is that 1.5 uses both S and M expressions, whereas later versions
of Lisp only use S expressions.
You can ignore that if you want. The point is
that 1.5 doesn't look or act the same as Lisp circa, say, 1960. To simplify all of this,
Planner was developed a little bit after Lisp was created. It took inspiration from the same
original sources that Lisp did. It has goals similar to Lisp. The timeline even worked out such
that Hewitt could draw inspiration from early versions of Lisp. Thus, we get an almost alien
looking language. As someone who can read Lisp, Planner falls into this uncanny valley. Hewitt
compares Planner to Lisp quite a bit, but there are
significant differences in syntax. Plus, these comparisons are to really early versions of Lisp.
The result is that Planner feels almost alien. I think that's the best way I can describe it.
Like I said earlier, I don't really want to get into the syntax of Planner.
But I will give you a taste.
You know, as a treat.
Allow me to introduce you to Matchless.
The second language used in Planner.
That's right!
Planner!
This whole time it has been two languages in one!
It's... It's a little mind-boggling, honestly. Planner itself is the
rules and regulations side, while Matchless is a pattern-matching language. If you're familiar with
modern-ish programming, then I can give this to you in one sentence. Everyone else, close your ears for
a second. All right, is it just us? Matchless is a really bad, really primitive regular expression
engine. It's basically very, very early grep. All right, we can let everyone back in. The rest of you can start listening now.
You will get the long story.
In Matchless, a programmer writes out a pattern that can then be used to decompose text.
Technically, Matchless operates on lists,
but there's a trivial way to map between raw text and lists.
I mean, you just break everything up by the
space. So for my purposes, this is text. Conceptually, Matchless is similar to what we saw
with Eliza, just more complex. A Matchless program starts with a list of variables. These variables
will be used to hold the decomposed text once everything's
said and done. This is pretty nifty because instead of just getting a list of text chunks,
you end up with nicely named variables. Very classy stuff. Matchless is strictly typed,
so each variable has an associated type. That's not too weird for this type of system.
It's actually kind of cushy. Matchless is smart enough to take variable typing into consideration
when performing decomposition. For instance, the word for list in Matchless is segment. If you say
that one of your variables is a segment, then Matchless will only fill it
with multiple chunks of text. Basically, if you ask for a list, you get a list. So far,
this sounds reasonable, and it really is. There are very modern text processing tools that work
in the same way as Matchless. At least, conceptually. The syntax,
well, that's a totally different story. The main function used in Matchless is called
assign? That's with a question mark at the end. So, we have a function name, a built-in function name that ends in a question mark.
I shouldn't have to say that's weird, but that's weird.
There's no reason it can't be legal, though, so I guess we're fine.
The syntax for constructing patterns relies on prefixing variable names with operators.
That's, uh, not super standard.
It's more weird than anything, though.
So, as a further example,
the most basic operator is dollar sign underscore,
which just means assign.
So, if I have a program that has one variable that's a segment called x, and the
pattern is $_x, then I'm just looking for x to be a set of anything. Just anything turned into a list.
Fine. Weird, but fine. We also get the beautiful and mind-numbing dollar sign dollar sign. This operation says to match the
value of a variable. For example, let's say I have a program with the variable x, which is an atom,
aka a single value. Dollar sign underscore x, dollar sign dollar sign x would match two repeating values.
So it would match 1, 1 or 2, 2, but not 1, 2.
It gets a little bit strange here.
So if you ran that program with the input 1, 1, the value of x at the end would be 1,
which I get is just a little confusing. And if this is sounding like a mess, then you're starting to get it. Matchless is strange. And that
strangeness is indicative of the larger strangeness of Planner. This isn't even getting into more
complex operators that can be used to construct
patterns, or the fact that in this version of Planner, there are curly braces and parentheses
that are mixed. It doesn't feel good to look at. The upshot here is that Matchless provides
a very robust way to break up data. This is the first part of the NLP outline.
Decomposition, parsing, whatever you want to call it. We have a nice pipeline that's growing here.
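For the curious, here's a loose Python sketch of Matchless-style matching as I've described it. The step format and the matcher are my own paraphrase, not Hewitt's actual syntax or semantics.

    def match(pattern, data, bindings=None):
        """pattern is a list of steps: ("assign", name, kind) or ("value", name)."""
        bindings = {} if bindings is None else bindings
        if not pattern:
            return bindings if not data else None
        step, rest = pattern[0], pattern[1:]
        if step[0] == "assign":
            _, name, kind = step
            if kind == "atom" and data:              # a single value
                new = dict(bindings, **{name: data[0]})
                return match(rest, data[1:], new)
            if kind == "segment":                    # any number of values
                for split in range(len(data) + 1):
                    new = dict(bindings, **{name: data[:split]})
                    result = match(rest, data[split:], new)
                    if result is not None:
                        return result
        elif step[0] == "value":                     # like $$x: must repeat x's value
            _, name = step
            if data and bindings.get(name) == data[0]:
                return match(rest, data[1:], bindings)
        return None

    # "$_x $$x": assign the first element to x, then demand a repeat of it.
    print(match([("assign", "x", "atom"), ("value", "x")], [1, 1]))  # -> {'x': 1}
    print(match([("assign", "x", "atom"), ("value", "x")], [1, 2]))  # -> None
    # A segment variable swallows everything it is handed, as a list.
    print(match([("assign", "x", "segment")], ["any", "thing"]))     # -> {'x': ['any', 'thing']}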
Matchless can break down text into some model that a computer can work with. It needs a little
help to do the whole text-to-list thing, but we can ignore that. Planner itself can
then be used to build out logical models. These models can be manipulated and prompted by incoming
data. In other words, you can use Planner to develop a natural language processing system.
The system is almost tailor-made for that purpose. So where do we go from here? Well,
if I wanted to take a short route, we'd pick up the Prolog thread. The early history of Prolog
runs in parallel to Hewitt's work on Planner. Prolog isn't initially influenced by Planner,
the crossover comes a little later into the development story of Prolog.
That said, Planner does give us an example of how a language can be specialized for NLP.
But, uh, I'm not going to do that.
This is Advent of Computing, after all.
I tend to favor these meandering paths.
Instead, I want to shore up our understanding of natural language processing.
Prolog proper can wait until next episode.
Prolog is all about creating a better language for this type of niche.
So the question is, what were people actually doing in this niche prior to Prolog?
What kind of natural language processing was going on in this era?
Luckily, an example is close at hand, and better still, this example uses Planner.
One of the things that I really like about producing Advent of Computing is that it forces me to expand my horizons, so to speak.
I'll be the first to admit that I'm not really a computer science guy. I'm a software developer, which is actually pretty different from a scientist. It's like theory
versus practice, idea versus application. I learn best and I work best when I'm actually,
you know, working. I'm a learn-by-doing sort of dude. Running this podcast hasn't made me a computer scientist,
but it has forced me out of my comfort zone pretty regularly. I spend a lot more time reading and
dealing with theory than I ever imagined I would. It's given me a newfound respect and understanding
of computer science. I say this because it's time to return to my comfort zone. It's time to get into my
fancy pillow fort, put on my weighted blanket, and look at some actual software. So far this
episode has been pretty heavy on theory. I want to close out by bringing us back to the ground,
by reducing what we've learned to actual practice. It's time for a case study. The subject is...
I don't know how to say this. SHRDLU. Let me just say, I hate this name. Apparently,
it's a reference to some in-joke about old printing presses that, in my mind,
somehow makes it worse. It's spelled S-H-R-D-L-U,
and I'm going to try and pronounce it Sherdloo, since that's the only reasonable way I can think to deal with this. I think a flub should be just as recognizable as actual good pronunciation anyway, so we're going to be on the same page
here.
All right.
SHRDLU is a natural language system developed at MIT by Terry Winograd.
It was written in MIT's very storied AI lab, which means we're looking at a very well-connected
program.
This is the same lab that Lisp itself came out of,
as well as the incompatible timesharing system.
SHRDLU was actually written primarily in Lisp and ran on ITS.
The program is interesting for a few reasons.
While being an NLP, it's actually somewhat simple,
at least in comparison to some larger systems.
This is thanks to some smart scope restrictions.
SHRDLU wasn't meant to be some all-knowing, all-purpose digital intelligence.
Instead, Winograd focused on just one thing, a room full of blocks.
For this to make sense, I need to pull in a little bit of context from Winograd's 1971 thesis, Procedures as a Representation for Data in
Computer Programs for Understanding Natural Language. Another nice, easy-to-stumble-upon title.
As Winograd explains, a natural language system can only work if a computer actually
understands what's going on. This is very in line with the era's approach to AI.
The core idea is that artificial intelligence must be built as a model of human intelligence.
The program should be written to mimic how humans work and think, at least on a superficial level.
When we apply this train of thought to language, we reach an interesting impasse.
You cannot simply parse natural language.
You can't actually have a decomposition stage that's entirely separate from deeper digital machinations.
To illustrate this fact, Winograd pulls from earlier attempts at computerized translations
of language. To quote,
When a human reader sees a sentence, he uses knowledge to understand it. This includes not
only grammar, but also his knowledge about words, the context of the sentence, and most important, his knowledge about the subject matter.
A computer program supplied with only a grammar for manipulating the syntax of a language
could not produce a translation of reasonable quality.
End quote.
I think that this is, frankly, really cool.
It's one of the points where things started to fall into place for me.
When you read a sentence, you aren't just decomposing it.
You don't think, okay, well, I'll split this by spaces into separate tokenized strings.
Then I'll look up the part of speech for each word, check the syntax to make sure it's a valid sentence.
That's a bit of an unrealistic and robotic way of understanding language.
A reader will instead pull upon past experience to understand language.
In this way, we can deal with complex, confusing, and subtle words. We can deal with things like jokes or puns or pronouns.
That last one is an example that Winograd comes back to a couple of times in his paper.
We tend to use pronouns pretty liberally, at least in English. It's not always syntactically or grammatically clear what pronouns are referencing, but we just
kind of know. That knowing is really thanks to our experience with the language. We break down
a sentence using internally consistent logic and experience. This means that ELIZA isn't just
missing a step. We couldn't add a middle layer to ELIZA and just get an intelligent program.
Rather, it would need to be restructured from the ground up.
Language parsing relies on modeling and reasoning.
In SHRDLU, that reasoning is automated with Planner.
So, here's the rub.
To understand language at all, you have to have some kind of logic between your ears.
You have to know how things function in the real world.
This is needed not just for breaking down language or for things like pronoun references,
but even just to detect nonsense sentences.
Something can be syntactically correct, but mean nothing.
I could walk up to you on the street and say, hello, I would like to verb adjective noun, and that's a valid sentence,
but it means nothing. This means that any NLP system will have to have some baseline
understanding of the world. They'll have to know that you can't just verb adjective a noun.
But the world is a big place. There's no way you could expect to teach a computer everything.
Winograd solves this problem by restricting the computer's world.
He built up a toy model, as we've seen before.
SHRDLU only understands a very simplistic realm. Its whole world is a room full of blocks.
I don't mean this as some kind of data structure.
I mean physical blocks.
It has different colored cubes and pyramids and rectangles, a box, and a few other sundry shapes.
All have their own dimensions.
All have rules about how they can
stack and be manipulated. The manipulation is accomplished with a claw, which also has a set
of rules governing its movements. Initially, SHRDLU loads up a Planner program that defines
this limited world. This toy world, if you will. The simplified world drastically reduced the scope of
Winograd's project. Instead of trying to make an all-knowing computer program, he just had to make
a program that knew the logic of playing with blocks. It's smart enough to know that a cube
can't be stacked on top of a pyramid, or that one box is taller than another, but it couldn't
tell you how boxes are made or even what they're made out of. This type of scope restriction is
crucial when tackling huge projects. Without it, Winograd wouldn't really have gotten very far at
all. This brings us to an interesting question. How was knowledge represented in this system?
We have an answer very close at hand, and it's a really neat one.
To quote again,
We represent knowledge in the form of procedures rather than tables of rules or lists of patterns.
By developing special procedural languages for grammar, semantics, and deductive logic,
we gain the flexibility and power of programs
while retaining the regularity and understandability of simpler rule forms.
This is the second place where things kind of clicked for me. Let me try and break this down
and explain why I think it's so cool. SHRDLU is made up of a handful of different
programming languages. Lisp is used to orchestrate everything. Parsing is done using this language
called Programmar, which I'm not going to dig into. Just know it's a Lisp-looking language
that's used for handling text. Deduction is handled using Planner.
As we already saw with Eliza, data-based rules are kind of clunky.
They work, don't get me wrong.
Eliza was able to get by with simple patterns and rules that were stored as lists.
That said, they aren't dynamic.
That's a problem because natural language is a very dynamic thing.
Actual human intelligence is similarly fluid and dynamic.
Winograd argues that this list approach is way too simplistic.
Instead, he went for programmatic rules.
Let's stick to the deduction side, since that's what we know
best at this point. SHRDLU starts off with an initial set of Planner rules. These are the
block world rules that I already mentioned. In addition to rules, SHRDLU also contains
state information. Winograd explains this as a set of relations between objects and
properties of objects. For example, block number one is red would be a property of an object.
Block number one is on top of block number two would be a relation between objects. These all
form a state in Planner. So technically, yes, there is a layer where the state is defined in a pretty static form.
That said, it's actually built up from functions, from little programs.
That's how the state gets modified.
The idea here is that instead of saying something like block1.color equals red,
you'd say red block1. Of course, this is in the Lisp calling
convention. This just means that you're calling a function called red on the block1 object.
For a simple example, this is only a subtle difference. Ultimately, the red function is just setting the block's color, right? Well,
the answer is probably yes, but there is some wiggle room here. Remember, smart code is all
about this wiggle room, this play in the system. Since Winograd is using functions to set everything up, SHRDLU can call up logic. It can call up deductive code at any time.
I'll give you a better example than changing color. Let's say I want to place a pyramid on
top of a block. I'd end up writing something like stack pyramid1 block1. That's just calling
this planner function called stack that looks at pyramid one and block
one and says, okay, I'll update my state so the pyramid one is on top of the block one.
Now, let's flip it. I'll tell SHRDLU that I want to stack a block on top of a pyramid.
That'll be stack block one pyramid one, or something to that effect. SHRDLU won't comply with my
instruction. It would spit out something like, I can't. The stack function calls up deductive code
while it's executed. It doesn't just set the state. It checks to see that the request follows
all known rules.
In this case, there's a planner rule somewhere that says you can't stack anything on top of a pyramid.
The thing's just too pointy.
By encoding data, by encoding knowledge as procedures, it's easier to add in this type of intelligence.
The same could be done without this procedural approach,
but it would take a lot more code,
and it would take really repetitive code, too.
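Here's a toy Python sketch of that knowledge-as-procedures idea. The object names and the pointiness rule are stand-ins of my own, not Winograd's code.

    # The toy world's state: what each object is, and what it is resting on.
    world = {
        "pyramid1": {"shape": "pyramid", "on": "table"},
        "block1":   {"shape": "block",   "on": "table"},
    }

    def can_support(obj):
        """A bit of deductive code baked into the procedure: pyramids are too pointy."""
        return world[obj]["shape"] != "pyramid"

    def stack(top, bottom):
        """Only update the state if the request follows the known rules."""
        if not can_support(bottom):
            return "I can't."
        world[top]["on"] = bottom
        return "OK."

    print(stack("pyramid1", "block1"))   # -> OK.
    print(stack("block1", "pyramid1"))   # -> I can't.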
To put this all together, let's move closer to the user side.
A user doesn't actually type in this fancy syntax to stack blocks.
That would kind of defeat the whole purpose of natural language. It would ruin the exercise.
You converse with SHRDLU over a teletype as if you were talking to another person.
That language gets broken down using a combination of Programmar and Lisp.
Then that's rendered into Planner.
This is the full circle for me.
There is some back and forth during language parsing,
but eventually an input like, please, would you stack the first pyramid on top of the first block
gets turned into the code stack pyramid1 block1. In order to function, SHRDLU has to produce
code. It's a program that creates a new program.
There are a few words for a system like this. The earliest term was generative programming.
Grace Hopper called these types of programs compilers, a program that takes code as input
and outputs a different type of code. However, there's a twist. Well, almost a twist. Technically, everything here is Lisp.
Programmar is an extension on top of Lisp. Lisp is, well, I think you can guess. And Planner,
that's the odd one out, but only barely. Winograd was using a version of Planner called MicroPlanner. It's a subset
of the larger language spec that was implemented at MIT's AI lab. Here's the thing of it.
MicroPlanner was implemented in Lisp. Its syntax is, for all purposes, Lisp.
Some parts of Planner's early syntax were more wild and
those have been dropped or coerced a little bit. This means that SHRDLU is technically a Lisp
program that on the fly writes Lisp. This is a task that Lisp was purposely built for. It's able to deal with programs as if it was data and vice
versa. It's also a very special case of generative programming. The $5 word here is homoiconic. It's
a language that can represent and understand itself. This is a final feature that will prove
key to the future of AI and NLP.
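For a feel of that code-as-data trick, here's one last tiny Python sketch. The mini evaluator and the stack function are my own illustration, not SHRDLU's actual output.

    def stack(top, bottom):
        return f"stacked {top} on {bottom}"

    FUNCTIONS = {"stack": stack}

    # A program represented as plain data: a list we might have built on the
    # fly from a parsed English sentence.
    program = ["stack", "pyramid1", "block1"]

    def evaluate(expression):
        operator, *arguments = expression
        return FUNCTIONS[operator](*arguments)

    print(evaluate(program))   # -> stacked pyramid1 on block1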
Alright, that brings us to the end of this episode.
I want to keep things brief since, well, this isn't actually a conclusion.
This is just a brief intermission.
I hope that at the end here we have some kind of understanding of
natural language processing. I certainly have had my own breakthroughs while preparing this episode.
This is going to be pretty crucial moving forward. Proper NLP requires truly sophisticated software
and sophisticated tools. A toy program like Eliza, while impressive, just won't cut it. You need ways to break apart
language, ways to reason, and ways to deal with context. That all requires some level of
artificial intelligence. Next episode starts with Prolog proper. We'll be diving headlong
into the early development and emergence of this strange and unique language.
After all this Lisp, I think the main question on everyone's mind should be,
how can another language swoop in? Is Prolog simply more specialized for NLP than Lisp,
or is there some other force at play here?
Thanks for listening to Advent of Computing. I'll be back in two weeks' time with another piece of computing's past.
And hey, if you like the show, there are now a few ways you can support it.
If you know someone else who would like the show, then take a minute to share it with them.
You can also rate and review the podcast on Apple Podcasts and Spotify now.
If you want to be a superfan, you can support the show directly through Advent of Computing merch or signing up as a patron on Patreon.
Patrons get early access to episodes, polls for the direction of the show, and bonus content.
You can find links to everything on my website, adventofcomputing.com.
If you have any comments or suggestions for a future episode, then go ahead and shoot
me a tweet.
I'm at Advent of Comp on Twitter.
And as always, have a great rest of your day.