Advent of Computing - Episode 112 - Prolog, Part I
Episode Date: July 16, 2023

I've been told I need to do an episode about Prolog. Well, here's the start of that process. To talk about Prolog we first need to come to grips with natural language processing, its tools, and its... languages. This episode we are doing just that, going from ELIZA to Planner to SHRDLU in an attempt to figure out how AI was first taught human tongues, where the smoke and mirrors end, and where fascinating programming begins.

Selected Sources:

https://dl.acm.org/doi/pdf/10.1145/365153.365168 - ELIZA

https://stacks.stanford.edu/file/druid:cm792pj8606/cm792pj8606.pdf - Planner

https://web.archive.org/web/20200725084321/http://hci.stanford.edu/~winograd/shrdlu/AITR-235.pdf - SHRDLU
Transcript
One of the most enduring digital struggles is the user interface.
How should a human use a computer?
How should data be presented?
How should inputs be given?
These are all huge questions that, frankly, don't have a right answer.
The simple fact is that computers are alien things.
They're these weird number crunchers that have a mind of their own.
When I was younger and just getting into programming, I remember reading that computers
are machines that simulate mathematics. Now, I can't remember where I read that,
and I can't find it now. If you know where that's from, then please hit me up. I'd love to have a
citation for that phrase.
That definition has stuck with me, the idea that computers simulate mathematics,
specifically because it sounds kind of alien.
When I first read that, I stopped and I tried to roll it around in my head.
It just didn't make a whole lot of sense to me.
It's taken a while for it to have full effect.
Computers, on a very basic level, are simulating mathematics. Now, that doesn't mean they have little chalkboards and textbooks. Rather, that a computer can't really do traditional math.
It pretends to do math by flipping automatic switches, shunting around current, and making binary decisions.
The math done by a computer can, in some cases, have no relation to the pen and paper mathematics that we humans do.
These are alien machines, plain and simple.
So, back to the point.
How are we expected to use these things? Should we bend our brains to fit
into these weird digital molds? Should we have big flashy interfaces with custom-built pointing
devices? Or should we bend computers to fit into the mold of the human mind itself?
I've looked at a lot of different approaches to this problem. That ranges from
hypertext to text-based interfaces to graphical systems. One that I haven't really explored is
natural language. That's a wild third path. The idea is that with careful programming,
a computer can be made to act more like a human. If done correctly, you could interface with a computer just as you interface with a colleague.
All by simple voice, by a natural human language.
As with all solutions, this comes with its own set of challenges,
and its own set of very unique tools.
Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 112,
Prolog, Part I. We're back to another programming language episode, but this isn't just any language. I've actually received a whole lot of requests to
cover Prolog, so I'm finally taking the dive. If you've emailed me or messaged me about this,
this one's for you. This is going to be a pretty deep dive for me to take, though.
Prolog isn't really like any other language. It's a declarative language. I'll get into the
details as the episode drags on, but
just know that this makes Prolog different than almost every other language out there.
There just aren't a lot of declarative languages in common use today. On its own, that makes
discussing Prolog a little complicated. If possible, I try to at least understand how to
use a language when I'm discussing it.
And I'm not saying I'm an expert on every language I've ever covered on this show.
I'm barely an expert on any programming language.
Rather, that I just want to be able to read some code.
I want to be able to have a feel for it.
I think taking that approach as a type of initiate gives me some insight into how difficult or easy a language is to use.
It gives me a pretty good feel for the tongue.
With Prolog, however, that's a bit of a tough task.
At least, it's been proving difficult for me.
Prolog just looks and acts differently from languages I'm more familiar with.
There is another impediment that we run into, or rather, that I run into.
Prolog was developed for use in natural language processing.
Now, that's a branch of artificial intelligence that deals with understanding human language.
Right there we have two things that are pretty opaque to me.
AI and linguistics.
This puts us in a fun position where Prolog's background needs some background of its own to make any sense.
What I'm getting to is that we have a series on our hands.
In fact, I'm calling it right now.
I'm going to need some time to come to grips with natural language systems. The reason here is pretty simple,
and it comes back to the Hopperian view of languages. I'm sure you're all sick of me saying
this. Grace Hopper posited that there can't be one universal programming language. Instead, she argued,
there need to be many languages designed for specific niches and applications. I think this
is an important lens to use when discussing programming languages. It helps prevent me
from instantly recoiling from something I don't understand, or something that just looks icky.
It's kind of a way to suppress my,
oh god, oh god, oh no, what is this, please help kind of response. If I see a language and instantly gag, well, that probably doesn't mean it's a bad language. More likely, it means that language is
for some niche that I don't understand, or the language is just trying
to do something that I'm not familiar with. Each language is tailored in some way to solve a
problem. In the case of Prolog, that problem is handling human language, a field which your
intrepid host knows next to nothing about. So here we go. We're going to start from zero and
work our way up to why Prolog is so distinct
from other languages and what kind of special features it can offer. This may take a bit,
but I want to get this right. This episode will be working up to the start of Prolog itself.
We'll be discussing natural language systems, how they were programmed in more traditional languages, and the challenges of that approach. That should give us a foundation to start picking at the
history of Prolog itself. Along the way, I'm going to try and pull my favorite magic trick.
I want to go from recoiling at the mere sight of unusual syntax to being able to read it. At least,
maybe read a little bit of it. We'll see how that one
goes. The main source and jumping off point for me is a paper presented at the ACM's Second History
of Programming Languages Conference. That paper is titled The Birth of Prolog, and it was written
by Alain Colmerauer and Philippe Roussel. I am so sorry, there are a lot
of French names and I'm going to do my best. Now, these were two of the original developers
of the language, so we get really good inside detail on Prolog. It's not from the era of
Prolog, which is the 1970s, but it's close enough that it's basically primary. There's enough space
even that there's some retrospection going on. This is where I started my research, and it's
where I realized that I was in for a bit of a rough time. The paper is very, well, technical.
I can usually deal with technical programming stuff. It is my field, after all, but this one threw me for a loop.
There are just a lot of concepts here that I'm not used to. So, to kick things off, I'm going to
take you down my meandering journey of understanding. First of all, we need to discuss
natural language processing, or NLP. The short story is that NLP is a field concerned with getting computers to
understand human language. Those are some really heavy scare quotes around understand. I don't
necessarily mean a thinking machine that speaks English with all its subtleties, but think more
on the side of a way for natural human languages to be processed,
broken down, and used by a computer. There were two key uses for this technology that were being
investigated in the 1960s and 70s. The first of these is automatic translation between languages.
In this instance, the NLP program is loaded up with some type of model that describes each language.
Then you pass in the input text, the computer breaks it down somehow, does some magic,
then it outputs the same text in another language, another human language.
The second application is the human-computer interface.
This one is a little more subtle than straight-up translating text.
The idea here is to create some type of conversational computer interface.
Think of the computer on the bridge of the Starship Enterprise.
That machine isn't always given concrete instructions.
You could ask it to bring phasers online, but you could also ask it to work out a problem for you.
There's some wiggle room that allows for more naturalistic dialogue between operator and machine.
For that to work, the computer has to be able to understand human language
well enough to both break down inputs and construct outputs.
And, you know, to do something with those inputs.
So this is a bit of a two-pronged problem.
NLP software needs to be able to break down natural language and produce natural language. There's a third unseen part that
ties both of these sides together. Somehow the program needs to model and understand language.
It needs to have some internal way of making sense of what it's given.
You don't just want a computer to parrot back what you say. That actually doesn't require much
fancy programming at all. I can give you a one-liner that does that. A good NLP system will
actually do something useful or interesting. This is all pretty vague, so let me hit you with an example.
I'm going to start us off with something simple, some light NLP to whet the palate,
then work up to more rigorous examples of the art.
Our first stop on this whirlwind tour shall be Eliza.
This program was written by Joseph Weizenbaum sometime around 1964 at MIT. An interesting side note is technically this was paid for by the US government, since Weizenbaum was working under Project MAC.
Anyway, Eliza presents a user with a simulated conversation. It's technically configurable to
do all sorts of conversation styles, but it's most well known for a script that simulates a psychiatrist or a psychotherapist, depending on what nomenclature
you want to use. I'm going to borrow a word from the world of physics here. I'd call Eliza
a toy model. This means a model that's superficially correct and is backed up by logic,
but isn't as rigorous as an actual model.
Toy models are used by a lot of physicists to get to a qualitative understanding of something.
It's a way to give a concise explanation of a phenomenon without getting too far into
the details.
One example that I'm guilty of using on the show is the water analogy of
electronics. In this model, current is described as the flow of water. A wire is a pipe, a resistor
is a crimp in that pipe, and so on. This gives a very limited explanation of how electronics work.
It can even be used to conduct simple experiments, but it's not entirely
accurate. It breaks down on the edges. It's just a qualitative model that's kind of fun to mess
around with, and it may lead you to ask more important questions. ELIZA firmly fits into this
mold. The program is developed as a simplified NLP system. As you converse with Eliza, it spits back somewhat reasonable responses.
It seems a little stilted at times, a little strange, but overall it feels intelligent.
As Weizenbaum explains, however, that's not the case.
You see, it's all a trick, all smoke and mirrors. By 1966, he had already
tricked a good number of users into thinking Eliza was in fact intelligent. This wasn't on
purpose though, Weizenbaum was very upfront with his fakery. But that wasn't enough. He recounts
users claiming their terminals could feel emotions or could actually talk back to them.
Weizenbaum opens up one paper discussing ELIZA with this tract, which, I gotta say, is very generally applicable.
Quote,
It is said that to explain is to explain away.
This maxim is nowhere so well fulfilled as it is in the area of computer programming,
especially in what is called heuristic programming and artificial intelligence.
For in those realms, machines are made to behave in wondrous ways,
often sufficient to dazzle even the most experienced observer.
But once a particular program is unmasked,
once its inner workings are explained in language
sufficiently plain to induce understanding, its magic crumbles away.
It stands revealed as a mere collection of procedures, each quite comprehensible.
End quote.
Once a human understands a trick, then that trick kind of loses its shine.
The seemingly intelligent terminal is reduced to really just a cool program.
ELIZA isn't a program that actually understands or models natural language.
It's missing a key piece of the puzzle.
ELIZA doesn't internally model language in some generic or abstract way.
It doesn't have any deeper machinations.
It's nowhere near that complex.
Rather, it simply decomposes and recomposes words and sentences.
As soon as you load up Eliza, the first puffs of smoke start to appear.
It prints,
Hello, I am Eliza. I'll be your therapist today.
That's how the trick begins.
Weizenbaum explains that his first script for Eliza was in the mold of a psychotherapist. This gave him a little bit of wiggle room for weird responses.
Now, let's say you start talking to the compu-therapist. You type, I am sad, and Eliza
responds with something like, did you come to me because you are sad? Now, ELIZA doesn't know what it means
to be sad. It doesn't know that it's actually being used for faux therapy. Rather, it's looking
for keywords in its inputs, then generating a result that follows a set of rules. It's what's
known as a rule-based system. These types of systems can be very complex, but
Eliza is not. The key to all the smoke and to a few of the mirrors is smart use of string replaces
and patterning. The first layer of this is simple. When Eliza gets an input, it shifts it from the first to the second person.
I's become you's, my's become yours, and so on.
This is mainly done to fix up grammar for the next steps.
The string is then scanned for keywords.
An Eliza script, the data structure used to configure the program, has a list of these magic words.
Basically, the program just tries to find one of these keywords in the input.
Weizenbaum calls keywords the most important words,
with very heavy quotations as well.
The string I am sad does get kind of shifted around into different tints,
but the bottom line is the keyword here is am or I am. A lot of keywords are verbs, which feeds into the next step.
Once a keyword is identified, Eliza looks up a decomposition rule.
This is also configured in the script.
Think of decomposition rules as a simplified means of pattern matching.
There's something like,
match all sentences that have I am followed by any number of words.
That would match I am sad,
as well as inputs like,
I am having a bad day,
or I am in need of validation for my parking ticket.
These decomposition patterns also tell Eliza
how to, well, decompose a sentence.
In the I am blah example, the sentence would be turned into two elements,
I am and then the rest of the sentence.
That's stored as a list.
Now, there is some weirdness here if you read the paper.
In programming, zero is usually used to refer to the first element of a list.
It's kind of just how memory works, it's just convention.
Eliza's decomposition patterns use numbers to identify placeholder words.
So, I am followed by one word would be encoded as I am one.
Here's the weirdness for readers.
In these decomposition rules, zero is a special number.
It means any number of words.
So I am zero would mean the phrase I am followed by anything. The paper mixes decomposition rules
that use special zeros with rules that use list zeros, so if you look at this paper,
you gotta read kind of carefully. Anyway, back to the actual point. These decomposition rules
each come along with reassembly rules. These tell ELIZA how to
shuffle up your inputs to make an output. In the case of our I am blah example, that's told to be
reassembled into, did you come to me because you are blah. The word blah here is a very technical
term taken from Weizenbaum '66. The blah in reality will be some kind of list index.
This allows you to write some pretty neat scripts for Eliza.
You can shuffle up pretty complex sentences into seemingly intelligent responses.
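Here's a rough sketch of the whole trick in Python. This is my own toy illustration, not Weizenbaum's MAD-SLIP code; the regular expression wildcards are just standing in for ELIZA's special zero.

    import re

    # Step 1: shift first person to second person.
    PRONOUN_SWAPS = {"i": "you", "am": "are", "my": "your", "me": "you"}

    def swap_person(text):
        words = text.lower().split()
        return " ".join(PRONOUN_SWAPS.get(word, word) for word in words)

    # Steps 2 and 3: keywords, decomposition patterns, and reassembly rules.
    # "(.*)" stands in for ELIZA's special zero: "any number of words".
    RULES = [
        (re.compile(r"you are (.*)"),     # matches "I am ..." after the swap
         "Did you come to me because you are {0}?"),
        (re.compile(r"(.*)"),             # fallback when no keyword is found
         "Please, tell me more."),
    ]

    def respond(user_input):
        shifted = swap_person(user_input)
        for pattern, template in RULES:
            match = pattern.match(shifted)
            if match:
                return template.format(*match.groups())

    print(respond("I am sad"))  # -> Did you come to me because you are sad?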
But that's the thing of it, dear listener.
This isn't intelligent.
There are some more tricks at play that I'm skipping
over, but believe me, there isn't some secret part of the program I'm leaving out that suddenly
makes Eliza sentient. It reads a string, looks for an important word, then rearranges things
around that important word. It's not very sophisticated NLP. That said, it does show two of the three features I'm trying to wrap my head around.
First is decomposition.
Breaking down language into something a computer can process.
In ELIZA, this is just pattern matching.
The result is a list of sentence parts.
This isn't necessarily words, it's more like groups of connected words.
Second is reassembly, the creation of sensical natural language phrases. In ELIZA, this is just
shuffling inputs around. Some words are changed, pronouns are swapped around, and a mostly
reasonable sentence is generated. This is missing that middle piece, though.
Eliza isn't representing language in any meaningful way.
It knows keywords to look out for, but doesn't have any concept of the meaning of those words.
It doesn't understand how they interact with the rest of the sentence.
It knows that the phrase, I am, matters, but it doesn't know what it means to be.
It knows that sad has some kind of meaning, that it can be used in the sentence I am sad,
but it doesn't know what it means to be sad, or even what part of speech that word is.
Eliza doesn't even really care. That's one major limitation of this approach.
Eliza is workable while being so basic that it doesn't actually do much.
It can be configured to generate valid sentences.
That's cool, but not super useful.
It's missing any kind of generic way to model and make sense of words.
That said, we can glean something important from this.
This all comes down to implementation. First, we should look at data typing. I know.
A riveting field. ELIZA uses two main data structures: lists and trees. Lists are very simple. It's just like a real-world list: you have items
that come one after another in a specific order. When a sentence is broken down into data, ELIZA
makes it into a list. Trees are just second-order lists, lists where each element is its own list. I know, this is the exciting part. These are called trees
because if you draw them out, they're kind of shaped like a tree. Internally, Eliza uses trees
to represent keywords and their associated decomposition rules and reassembly rules.
Trees are cool, and they're also just fancy lists, so you really just need lists here.
The point here is that lists are so important to Eliza that they informed the language it's
written in. Weizenbaum wrote Eliza in a language with a wild name. Mad Slip. That's M-A-D dash S-L-I-P. Revel in that for a moment, if you will.
Okay, are we ready to continue? Mad Slip was a language developed by Weizenbaum specifically
for dealing with lists. Or, at least, it's almost a language. It's more of an extension to existing languages.
The name here kind of gives that away. If you're in the know, at least.
MAD stands for Michigan Algorithm Decoder. It's a language that was used on IBM mainframes back
in the day. SLIP stands for Symmetric List Processor. That's the extension
part that Weizenbaum developed. It sits on top of MAD and lets you deal with linked lists. That's
the basic data structure needed for trees and lists and whatnot. From what I can tell, SLIP is
almost Weizenbaum's pass at creating his own Lisp, a more well-known
list processing language.
Now, Lisp is its own language, whereas Slip sits atop existing languages.
This slip-in approach wasn't new in 1964.
There's a weird connection here that I just have to explain.
Lisp was developed in the late 1950s specifically for dealing with linked lists.
That data structure is the basic building block of any old-school artificial intelligence.
See my entire series on Lisp for further details.
It's a complicated story.
Anyway, Lisp was built in part to replace this
earlier system called FLPL. Now, this is a truly cursed thing. FLPL is short for Fortran List
Processing Language. Like SLIP, it's an extension on top of an existing language, in this case Fortran.
Stock Fortran at that time didn't really provide a reasonable way to build linked lists,
which means it couldn't handle trees.
As programmers started to need trees, better tools were needed.
Fortran would be extended to FLPL, but that wasn't good enough.
Eventually Lisp would come in to save the day.
MadSlip fits into a very similar story. It was developed as a way to add better lists to existing languages. MadSlip is actually just one version of this extension.
There was also a version for Fortran and one for ALGOL. So yeah, there was a FORTRAN slip AND FLPL. This is the first part of the
puzzle. Lists are crucial to AI, so crucial that programmers were willing to develop new languages
or greatly extend existing languages just to add the correct type of lists. NLP, as a type of AI, has this same penchant for lists.
The second takeaway from ELIZA is this. Weizenbaum had created a rule-based system.
This is one of those programming concepts that sounds more intimidating than it actually is.
A rule-based system is just a system with, say it with me, rules.
In the case of ELIZA, these rules are all about text patterns. They say that if a certain pattern
is encountered, then carry out a certain operation. If you see the keyword I am, then break the input
down in a certain way, then rebuild it in a certain way. This might sound like a very small distinction.
Every language can check for a certain condition, then execute some instructions.
We call that an if statement.
It's something that you need to be a real language.
If this, then do that.
Rule-based systems, however, are built specifically for dealing with special types of ifs in smart ways.
ELIZA is a good example of this specialization.
Its rules are stored in lists.
As the program churns through rules, it will build up a list of rules that apply to a given input.
Once ELIZA's built up this list of possible rules,
it runs some code to figure out which rule it should apply. This is called conflict resolution.
It's when a rule-based system resolves which rule it should actually follow out of a set of possible
good rules. You could do this same thing using a normal language, but it would kind of suck to write.
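To make that concrete, here's a toy version of keyword matching plus conflict resolution written in a normal language, Python. The rules, precedence numbers, and templates here are my own invention, not ELIZA's actual script format.

    # Each rule: (precedence, keyword, reassembly template). My own toy format.
    RULES = [
        (3, "i am",   "Why do you say you are {rest}?"),
        (1, "i",      "You say {rest}?"),
        (2, "mother", "Tell me more about your family."),
    ]

    def matching_rules(sentence):
        """Collect every rule whose keyword appears in the input."""
        return [rule for rule in RULES if rule[1] in sentence]

    def resolve_conflict(candidates):
        """Conflict resolution: the highest-precedence keyword wins."""
        return max(candidates, key=lambda rule: rule[0])

    sentence = "i am worried about my mother"       # kept lowercase to keep matching simple
    candidates = matching_rules(sentence)           # all three rules apply here
    precedence, keyword, template = resolve_conflict(candidates)
    rest = sentence.split(keyword, 1)[1].strip()
    print(template.format(rest=rest))
    # -> Why do you say you are worried about my mother?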
Here, Weizenbaum is getting around that issue by storing Eliza's rules as internal linked lists, which are then interpreted by Mad Slip.
In that sense, Eliza's scripts are almost their own programming language.
So, if you really wanted to galaxy brain this thing, Weizenbaum ended up creating his own
language specifically for dealing with natural language. That's a trend that I want you to keep
in your head as we move forward. There are some interesting side effects to this rule-based
approach. First of all, rules can be manipulated. Lists are, believe it or not,
just a normal data type. So if you put your rules in a list, well, in theory, you could rewrite those
rules on the fly, or write a program that generates its own rule sets. That's getting
us closer to AI, a program that can learn and adapt and tread into the realm of mortals.
Code for rule-based systems also looks radically different than code for normal programming languages.
In a rule-based system, there isn't any concept of line number, at least not in any real sense.
Rules are stored in ordered lists, oftentimes, but that doesn't
always matter. Just because one rule comes after another doesn't mean it's executed after the first.
Rules aren't even executed, at least not in a traditional way. Instead, rules are checked and
then acted upon. There are a lot more steps involved than just run one line after another.
My hope is this is helping to build up some scenery.
We're going to be dealing with languages that don't look or act like a normal language.
They seem totally alien to more commonplace imperative programming languages.
So how does this relate to natural
language? Well, simply put, pretty strong tools were needed to actually deal with human languages.
The middle part that Eliza was missing, the modeling and understanding piece,
has a prerequisite. For a computer to understand something as complex as human language,
it needs a way to reason. It needs a way to make
logical inferences about data. Computers don't really do inference. That's not a very computer
thing. It's not very imperative. Thus, you just have to have special tools. These tools, more often
than not, came in the form of a new type of language, the logical or declarative programming
language. A note on terms here. You will see both of these terms used to refer to languages.
In this episode, I'm going to be using logical and declarative somewhat interchangeably because,
well, they mean close to the same thing in this context. In a declarative language,
you describe, logically, what you want a computer to work out. Eliza's rules are an example of this.
You write a rule that says if you encounter the input I am dot dot dot, you decompose and
reassemble that input in a certain way. You aren't giving a step-by-step way to handle
processing data. Instead, you're declaring some new rule. You're describing what you want done
and when you want it done, but not how to determine if it should be done or how it should be
done. You're not giving every little detail. The computer works that out for you.
The difference between imperative and declarative
languages, at least in my head, come down to allocation of work. In an imperative language,
the programmer has to figure out exactly how to instruct a computer to accomplish a task.
You work up a step-by-step to-do list. In this paradigm, the programmer does almost all of the planning,
and the computer only executes that plan. You give the computer that to-do list, and it does it,
step-by-step-by-step. In the declarative paradigm, the programmer works up logical clauses.
They describe connections, relationships, and how different clauses interact with each other.
The declarative programmer is doing something vastly different than the imperative counterpart here.
The final workload is pushed more onto the computer.
The machine has to figure out how to interpret all of these rules and relations,
how to form a plan of execution and then execute that plan.
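Here's a very rough sketch of that split in Python. The tiny rule engine is my own toy, not Planner or Prolog, but it shows where the planning work lands.

    # Imperative: the programmer spells out every step.
    men = ["socrates"]
    mortals = []
    for person in men:            # step 1: walk the list of men
        mortals.append(person)    # step 2: every man is mortal, so copy him over
    print("socrates" in mortals)  # -> True

    # Declarative: state facts and a rule, then ask a question.
    facts = {("man", "socrates")}
    rules = [(("man", "?x"), ("mortal", "?x"))]   # "all men are mortal"

    def prove(goal):
        """The engine, not the programmer, decides how to chase the goal."""
        if goal in facts:
            return True
        for condition, conclusion in rules:
            if conclusion[0] == goal[0] and prove((condition[0], goal[1])):
                return True
        return False

    print(prove(("mortal", "socrates")))          # -> True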
That all sounds pretty theoretical, so let's go right into some praxis.
After all, the theory around declarative languages has always made my head spin.
I'm very much an imperative type of guy.
So allow me to introduce Planner, a logical language that might be simple enough for me
to understand.
Here's something we're going to have to deal with right off the bat.
Planner and its relatives are weird and confusing.
Planner itself was developed in the latter part of the 1960s by Carl Hewitt.
As such, I'm working off his 1969 paper titled Planner,
a language for proving theorems in robots.
So check this out. This comes by way of the introduction, my
favorite place to take the measure of a paper. Quote, the language will be explained by giving
an oversimplified picture and then attempting to correct any misapprehensions that the reader might have gathered from the rough outline. End quote.
I scream internally at this. Hewitt's plan in the paper describing his fancy language is to use a toy model, then give some gentle corrections. That's a sign that perhaps things
are going to get convoluted. Part of this confusion is the fact that Planner is
a little more than a normal programming language. This is something that isn't explicitly addressed
in the paper. It's some background that is kind of needed for a lot of these declarative languages.
Planner is a tool for solving logic problems. It's a system for running proofs. Its interface happens to come
in the form of a programming language. Or actually multiple languages, but we'll get
there. I'm going to start us off with something that's easier to deal with and then build
up from there.
Planner operates as a state machine. That is actually how all computers operate, so this shouldn't come as a huge shock.
What makes Planner different is how it exposes things.
If you haven't heard the term state machine before, then let me give you a quick rundown.
At any instant in time, you can model a computer by describing its state.
That's what's in memory, what's stored in registers, what lights are on and
off, which switches are flipped, that kind of thing. The state is a very static construct. It's
a snapshot in time. If nothing happens, the computer will just hold at that state forever.
Operations can occur to change that state. Writing to some location in memory is a state change
because we can consider memory part of the machine's state. So changing memory is changing
that state. The code that actually changes memory isn't necessarily part of this model,
it's something that exists kind of outside of the overall machine state. So when I say state machine, I mean we're
dealing with a pile of numbers that are operated on over time. When you get down to it, all computers
are state machines of some kind or another. Programming languages give us ways to change
states, to operate on that machine. In that sense, a language is an abstraction on top of a basic state machine.
In an imperative language like C or Fortran, that abstraction doesn't take us too far away
from the machine. You still operate in steps. If you write x equals 1, you're just telling the
compiler that you want to name some location in memory x and set its value to 1.
There's a bit of bookkeeping done here, but at its core, this is just a small change in the machine's state.
In imperative languages, you build up programs from hundreds or thousands or millions of these small tweaks to the machine's state.
Everything is done in little explicit steps.
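As a tiny illustration, here's that picture in Python. The names are made up; the point is just that every operation is one small, explicit change to the state.

    # The machine's state: a snapshot of memory at one instant.
    state = {"memory": {}}

    def write(location, value):
        """One small, explicit change to the machine's state."""
        state["memory"][location] = value

    write("x", 1)                          # the imperative "x = 1"
    write("y", state["memory"]["x"] + 1)   # another little tweak, built on the last one
    print(state["memory"])                 # -> {'x': 1, 'y': 2}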
Planner, on the other hand, doesn't always work that way. The language is much more implicit.
In other words, you aren't specifically instructing the computer to make a lot of small changes.
You can initially set up some state. What's kind of neat here is that the overall state of the machine can be saved as its own variable.
Hewitt calls this a database, which, fair enough, it kind of is.
You start off a planner program by building out this initial state.
That part is roughly similar to how an imperative language does things.
But here's where we deviate from the expected path.
The program itself is composed of a set of logical clauses and theorems. You can also look at these as logical rules. So,
as just a dumb example, you could write a program that says x is a square as its state setup,
then has the rule: all squares are rectangles. Now, of course,
an actual program would be more complex. It would be more useful. It would have all kinds of symbolic syntax, but I'm going to gloss over that. I assure you, planner syntax is complex enough that speaking
it aloud may summon something kind of nasty. Okay, so once your
program is loaded up, you can get down to the real point of Planner. That's asking it questions.
And I'm not even joking here. The workflow for Planner is a lot different than a normal language.
It isn't executing one line after another. I know I keep saying that. So here's what that practically means.
Once your program is loaded, it's actually inert.
It's just a set of rules and a state waiting to be acted upon.
Execution starts when you ask Planner to prove something,
or ask it a question about the state or rules of the program that's loaded.
So, for instance, we could tell
Planner that the goal of this run is to attempt to prove that X is a rectangle. You literally would
write something out like, goal is a X rectangle. Planner is Lisp adjacent, you see, so everything's
kind of switched around and backwards. With that, the wheels start turning.
Planner tries to answer the question by working through its rules and its state.
As it does so, the programmer kind of loses control over the machine. Planner starts updating
its internal state as it works. It will also, quote-unquote, discover new rules as it runs.
This will actually lead to new rules being deduced from existing ones.
In this way, Planner is designed to follow the usual methods used during proofs. Let's say I started off with another rule
that states no triangle is a rectangle.
As Planner was running, it might deduce that,
given its existing rule set, no triangle is a square.
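Here's a toy sketch, in Python, of that kind of rule discovery. The representation is my own paraphrase of this example, nowhere near Planner's real machinery.

    # Each rule: (if_this, then_this, negated). "negated" means "then NOT this".
    rules = [
        ("square",   "rectangle", False),   # all squares are rectangles
        ("triangle", "rectangle", True),    # no triangle is a rectangle
    ]

    def deduce_new_rules(rules):
        """If A implies B, and C implies not-B, then C implies not-A."""
        discovered = []
        for a_if, a_then, a_negated in rules:
            for c_if, c_then, c_negated in rules:
                if a_then == c_then and not a_negated and c_negated:
                    discovered.append((c_if, a_if, True))
        return discovered

    print(deduce_new_rules(rules))
    # -> [('triangle', 'square', True)], in other words: no triangle is a square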
The deduction and proof part here meets up with an interesting bit of history.
Now, for this, we're taking a tangent,
which I hope will be informative.
Hewitt makes a citation to this 1959 paper titled
Report on a General Problem-Solving Program. That paper was written at RAND. Now, if you aren't
really deep into Lisp lore, then that title and that name won't ring any bells, so let me weave
you a small tale. Lisp was developed as a modern language for linked list data,
which we've established is a core feature for artificial intelligence programs.
Now, Lisp didn't emerge from a vacuum. It was, in part, a response to existing languages.
These earlier list languages kind of sucked. FLPL was one of these, the horrifying Fortran
list processing language. Another was IPL, the information processing language,
which was developed at RAND in 1956. Now, I can say straight up that IPL is evil. At least,
it appears that way looking back at it. And I know I mentioned
how I'm trying not to be mad at languages and just call them bad, but IPL's a weird transitional
language that doesn't really have a purpose once there's a better language. IPL is just really
primitive. It looks and feels almost like assembly language. It has all these weird implicit rules around variables.
It's just kind of gross.
IPL technically has many of the features of Lisp, but Lisp is just a far better constructed
language.
That said, IPL was ground zero for some of the earliest AI research.
One of the first artificially intelligent programs,
Logic Theorist, was written in IPL in 1956. Logic Theorist was able to generate novel proofs
to logic problems. That is really cool, but this first pass was also very limited.
Logic Theorist was only able to use a handful of very simple methods,
and it was only ever used for a limited number of proofs.
Come 1957, the team at RAND created a new program,
the General Problem Solver.
This was a much more capable program,
and it's what Hewitt cites in his Planner paper.
This puts Planner onto an interesting branch of the programming family tree. I think I've kind of hit this point where
I just need to make my own genealogical trees for programming languages. Anyway, here's the hot take.
Lisp people, you can get mad at me if you want. Planner is a close cousin to Lisp, but in a convoluted way.
Lisp and Planner were both influenced by IPL.
Planner was also influenced by Lisp, or at least an early version of Lisp.
Hewitt's Planner article also cites the Lisp 1.5 Programmer's Manual.
This version of Lisp is a little strange, enough so that Lisp 1.5 should probably be
considered a different language.
The long and the short of it is that 1.5 uses both S and M expressions, whereas later versions
of Lisp only use S expressions.
You can ignore that if you want. The point is
that 1.5 doesn't look or act the same as Lisp circa, say, 1960. To simplify all of this,
Planner was developed a little bit after Lisp was created. It took inspiration from the same
original sources that Lisp did. It has goals similar to Lisp. The timeline even worked out such
that Hewitt could draw inspiration from early versions of Lisp. Thus, we get an almost alien
looking language. As someone who can read Lisp, Planner falls into this uncanny valley. Hewitt
compares Planner to Lisp quite a bit, but there are
significant differences in syntax. Plus, these comparisons are to really early versions of Lisp.
The result is that Planner feels almost alien. I think that's the best way I can describe it.
Like I said earlier, I don't really want to get into the syntax of Planner.
But I will give you a taste.
You know, as a treat.
Allow me to introduce you to Matchless.
The second language used in Planner.
That's right!
Planner!
This whole time it has been two languages in one!
It's... It's a little mind-boggling, honestly. Planner itself is the
rules and regulations side, while Matchless is a pattern-matching language. If you're familiar with
modern-ish programming, then I can give this to you in one sentence. Everyone else, close your ears for
a second. All right, is it just us? Matchless is a really bad, really primitive regular expression
engine. It's basically very, very early grep. All right, we can let everyone back in. The rest of you can start listening now.
You will get the long story.
In Matchless, a programmer writes out a pattern that can then be used to decompose text.
Technically, Matchless operates on lists,
but there's a trivial way to map between raw text and lists.
I mean, you just break everything up by the
space. So for my purposes, this is text. Conceptually, Matchless is similar to what we saw
with Eliza, just more complex. A Matchless program starts with a list of variables. These variables
will be used to hold the decomposed text once everything's
said and done. This is pretty nifty because instead of just getting a list of text chunks,
you end up with nicely named variables. Very classy stuff. Matchless is strictly typed,
so each variable has an associated type. That's not too weird for this type of system.
It's actually kind of cushy. Matchless is smart enough to take variable typing into consideration
when performing decomposition. For instance, the word for list in Matchless is segment. If you say
that one of your variables is a segment, then Matchless will only fill it
with multiple chunks of text. Basically, if you ask for a list, you get a list. So far,
this sounds reasonable, and it really is. There are very modern text processing tools that work
in the same way as Matchless. At least, conceptually. The syntax,
well, that's a totally different story. The main function used in Matchless is called
assign? That's with a question mark at the end. So, we have a function name, a built-in function name that ends in a question mark.
I shouldn't have to say that's weird, but that's weird.
There's no reason it can't be legal, though, so I guess we're fine.
The syntax for constructing patterns relies on prefixing variable names with operators.
That's, uh, not super standard.
It's more weird than anything, though.
So, as a further example,
the most basic operator is dollar sign underscore,
which just means assign.
So, if I have a program that has one variable that's a segment called x, and the
pattern is $_x, then I'm just looking for x to be a set of anything. Just anything turned into a list.
Fine. Weird, but fine. We also get the beautiful and mind-numbing dollar sign dollar sign. This operation says to match the
value of a variable. For example, let's say I have a program with the variable x, which is an atom,
aka a single value. Dollar sign underscore x, dollar sign dollar sign x would match two repeating values.
So it would match 1, 1 or 2, 2, but not 1, 2.
It gets a little bit strange here.
So if you ran that program with the input 1, 1, the value of x at the end would be 1,
which I get is just a little confusing. And if this is sounding like a mess, then you're starting to get it. Matchless is strange. And that
strangeness is indicative of the larger strangeness of Planner. This isn't even getting into more
complex operators that can be used to construct
patterns, or the fact that in this version of Planner, there are curly braces and parentheses
that are mixed. It doesn't feel good to look at. The upshot here is that Matchless provides
a very robust way to break up data. This is the first part of the NLP outline.
Decomposition, parsing, whatever you want to call it. We have a nice pipeline that's growing here.
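For the curious, here's a loose Python sketch of Matchless-style matching as I've described it. The step format and the matcher are my own paraphrase, not Hewitt's actual syntax or semantics.

    def match(pattern, data, bindings=None):
        """pattern is a list of steps: ("assign", name, kind) or ("value", name)."""
        bindings = {} if bindings is None else bindings
        if not pattern:
            return bindings if not data else None
        step, rest = pattern[0], pattern[1:]
        if step[0] == "assign":
            _, name, kind = step
            if kind == "atom" and data:              # a single value
                new = dict(bindings, **{name: data[0]})
                return match(rest, data[1:], new)
            if kind == "segment":                    # any number of values
                for split in range(len(data) + 1):
                    new = dict(bindings, **{name: data[:split]})
                    result = match(rest, data[split:], new)
                    if result is not None:
                        return result
        elif step[0] == "value":                     # like $$x: must repeat x's value
            _, name = step
            if data and bindings.get(name) == data[0]:
                return match(rest, data[1:], bindings)
        return None

    # "$_x $$x": assign the first element to x, then demand a repeat of it.
    print(match([("assign", "x", "atom"), ("value", "x")], [1, 1]))  # -> {'x': 1}
    print(match([("assign", "x", "atom"), ("value", "x")], [1, 2]))  # -> None
    # A segment variable swallows everything it is handed, as a list.
    print(match([("assign", "x", "segment")], ["any", "thing"]))     # -> {'x': ['any', 'thing']}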
Matchless can break down text into some model that a computer can work with. It needs a little
help to do the whole text-to-list thing, but we can ignore that. Planner itself can
then be used to build out logical models. These models can be manipulated and prompted by incoming
data. In other words, you can use Planner to develop a natural language processing system.
The system is almost tailor-made for that purpose. So where do we go from here? Well,
if I wanted to take a short route, we'd pick up the Prolog thread. The early history of Prolog
runs in parallel to Hewitt's work on Planner. Prolog isn't initially influenced by Planner,
the crossover comes a little later into the development story of Prolog.
That said, Planner does give us an example of how a language can be specialized for NLP.
But, uh, I'm not going to do that.
This is Advent of Computing, after all.
I tend to favor these meandering paths.
Instead, I want to shore up our understanding of natural language processing.
Prolog proper can wait until next episode.
Prolog is all about creating a better language for this type of niche.
So the question is, what were people actually doing in this niche prior to Prolog?
What kind of natural language processing was going on in this era?
Luckily, an example is close at hand, and better still, this example uses Planner.
One of the things that I really like about producing Advent of Computing is that it forces me to expand my horizons, so to speak.
I'll be the first to admit that I'm not really a computer science guy. I'm a software developer, which is actually pretty different from a scientist. It's like theory
versus practice, idea versus application. I learn best and I work best when I'm actually,
you know, working. I'm a learn-by-doing sort of dude. Running this podcast hasn't made me a computer scientist,
but it has forced me out of my comfort zone pretty regularly. I spend a lot more time reading and
dealing with theory than I ever imagined I would. It's given me a newfound respect and understanding
of computer science. I say this because it's time to return to my comfort zone. It's time to get into my
fancy pillow fort, put on my weighted blanket, and look at some actual software. So far this
episode has been pretty heavy on theory. I want to close out by bringing us back to the ground,
by reducing what we've learned to actual practice. It's time for a case study. The subject is...
I don't know how to say this. SHRDLU. Let me just say, I hate this name. Apparently,
it's a reference to some in-joke about old printing presses that, in my mind,
somehow makes it worse. It's spelled S-H-R-D-L-U,
and I'm going to try and pronounce it Sherdloo, since that's the only reasonable way I can think to deal with this. I think a flub should be just as recognizable as actual good pronunciation anyway, so we're going to be on the same page
here.
All right.
SHRDLU is a natural language system developed at MIT by Terry Winograd.
It was written in MIT's very storied AI lab, which means we're looking at a very well-connected
program.
This is the same lab that Lisp itself came out of,
as well as the incompatible timesharing system.
SHRDLU was actually written primarily in Lisp and ran on ITS.
The program is interesting for a few reasons.
While being an NLP, it's actually somewhat simple,
at least in comparison to some larger systems.
This is thanks to some smart scope restrictions.
SHRDLU wasn't meant to be some all-knowing, all-purpose digital intelligence.
Instead, Winograd focused on just one thing, a room full of blocks.
For this to make sense, I need to pull in a little bit of context from Winograd's 1971 thesis, Procedures as a Representation for Data in
Computer Programs for Understanding Natural Language. Another nice, easy-to-stumble-upon title.
As Winograd explains, a natural language system can only work if a computer actually
understands what's going on. This is very in line with the era's approach to AI.
The core idea is that artificial intelligence must be built as a model of human intelligence.
The program should be written to mimic how humans work and think, at least on a superficial level.
When we apply this train of thought to language, we reach an interesting impasse.
You cannot simply parse natural language.
You can't actually have a decomposition stage that's entirely separate from deeper digital machinations.
To illustrate this fact, Winograd pulls from earlier attempts at computerized translations
of language. To quote,
When a human reader sees a sentence, he uses knowledge to understand it. This includes not
only grammar, but also his knowledge about words, the context of the sentence, and most important, his knowledge about the subject matter.
A computer program supplied with only a grammar for manipulating the syntax of a language
could not produce a translation of reasonable quality.
End quote.
I think that this is, frankly, really cool.
It's one of the points where things started to fall into place for me.
When you read a sentence, you aren't just decomposing it.
You don't think, okay, well, I'll split this by spaces into separate tokenized strings.
Then I'll look up the part of speech for each word, check the syntax to make sure it's a valid sentence.
That's a bit of an unrealistic and robotic way of understanding language.
A reader will instead pull upon past experience to understand language.
In this way, we can deal with complex, confusing, and subtle words. We can deal with things like jokes or puns or pronouns.
That last one is an example that Winograd comes back to a couple of times in his paper.
We tend to use pronouns pretty liberally, at least in English. It's not always syntactically or grammatically clear what pronouns are referencing, but we just
kind of know. That knowing is really thanks to our experience with the language. We break down
a sentence using internally consistent logic and experience. This means that ELIZA isn't just
missing a step. We couldn't add a middle layer to ELIZA and just get an intelligent program.
Rather, it would need to be restructured from the ground up.
Language parsing relies on modeling and reasoning.
In SHRDLU, that reasoning is automated with Planner.
So, here's the rub.
To understand language at all, you have to have some kind of logic between your ears.
You have to know how things function in the real world.
This is needed not just for breaking down language or for things like pronoun references,
but even just to detect nonsense sentences.
Something can be syntactically correct, but mean nothing.
I could walk up to you on the street and say, hello, I would like to verb adjective noun, and that's a valid sentence,
but it means nothing. This means that any NLP system will have to have some baseline
understanding of the world. They'll have to know that you can't just verb adjective a noun.
But the world is a big place. There's no way you could expect to teach a computer everything.
Winograd solves this problem by restricting the computer's world.
He built up a toy model, as we've seen before.
SHRDLU only understands a very simplistic realm. Its whole world is a room full of blocks.
I don't mean this as some kind of data structure.
I mean physical blocks.
It has different colored cubes and pyramids and rectangles, a box, and a few other sundry shapes.
All have their own dimensions.
All have rules about how they can
stack and be manipulated. The manipulation is accomplished with a claw, which also has a set
of rules governing its movements. Initially, SHRDLU loads up a Planner program that defines
this limited world. This toy world, if you will. The simplified world drastically reduced the scope of
Winograd's project. Instead of trying to make an all-knowing computer program, he just had to make
a program that knew the logic of playing with blocks. It's smart enough to know that a cube
can't be stacked on top of a pyramid, or that one box is taller than another, but it couldn't
tell you how boxes are made or even what they're made out of. This type of scope restriction is
crucial when tackling huge projects. Without it, Winograd wouldn't really have gotten very far at
all. This brings us to an interesting question. How was knowledge represented in this system?
We have an answer very close at hand, and it's a really neat one.
To quote again,
We represent knowledge in the form of procedures rather than tables of rules or lists of patterns.
By developing special procedural languages for grammar, semantics, and deductive logic,
we gain the flexibility and power of programs
while retaining the regularity and understandability of simpler rule forms.
This is the second place where things kind of clicked for me. Let me try and break this down
and explain why I think it's so cool. SHRDLU is made up of a handful of different
programming languages. Lisp is used to orchestrate everything. Parsing is done using this language
called Programmar, which I'm not going to dig into. Just know it's a Lisp-looking language
that's used for handling text. Deduction is handled using Planner.
As we already saw with Eliza, data-based rules are kind of clunky.
They work, don't get me wrong.
Eliza was able to get by with simple patterns and rules that were stored as lists.
That said, they aren't dynamic.
That's a problem because natural language is a very dynamic thing.
Actual human intelligence is similarly fluid and dynamic.
Winograd argues that this list approach is way too simplistic.
Instead, he went for programmatic rules.
Let's stick to the deduction side, since that's what we know
best at this point. SHRDLU starts off with an initial set of Planner rules. These are the
block world rules that I already mentioned. In addition to rules, SHRDLU also contains
state information. Winograd explains this as a set of relations between objects and
properties of objects. For example, block number one is red would be a property of an object.
Block number one is on top of block number two would be a relation between objects. These all
form a state in Planner. So technically, yes, there is a layer where the state is defined in a pretty static form.
That said, it's actually built up from functions, from little programs.
That's how the state gets modified.
The idea here is that instead of saying something like block1.color equals red,
you'd say red block1. Of course, this is in the Lisp calling
convention. This just means that you're calling a function called red on the block1 object.
For a simple example, this is only a subtle difference. Ultimately, the red function is just setting the block's color, right? Well,
the answer is probably yes, but there is some wiggle room here. Remember, smart code is all
about this wiggle room, this play in the system. Since Winograd is using functions to set everything up, SHRDLU can call up logic. It can call up deductive code at any time.
I'll give you a better example than changing color. Let's say I want to place a pyramid on
top of a block. I'd end up writing something like stack pyramid1 block1. That's just calling
this planner function called stack that looks at pyramid one and block
one and says, okay, I'll update my state so the pyramid one is on top of the block one.
Now, let's flip it. I'll tell SHRDLU that I want to stack a block on top of a pyramid.
That'll be stack block one pyramid one, or something to that effect. SHRDLU won't comply with my
instruction. It would spit out something like, I can't. The stack function calls up deductive code
while it's executed. It doesn't just set the state. It checks to see that the request follows
all known rules.
In this case, there's a planner rule somewhere that says you can't stack anything on top of a pyramid.
The thing's just too pointy.
By encoding data, by encoding knowledge as procedures, it's easier to add in this type of intelligence.
The same could be done without this procedural approach,
but it would take a lot more code,
and it would take really repetitive code, too.
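Here's a toy Python sketch of that knowledge-as-procedures idea. The object names and the pointiness rule are stand-ins of my own, not Winograd's code.

    # The toy world's state: what each object is, and what it is resting on.
    world = {
        "pyramid1": {"shape": "pyramid", "on": "table"},
        "block1":   {"shape": "block",   "on": "table"},
    }

    def can_support(obj):
        """A bit of deductive code baked into the procedure: pyramids are too pointy."""
        return world[obj]["shape"] != "pyramid"

    def stack(top, bottom):
        """Only update the state if the request follows the known rules."""
        if not can_support(bottom):
            return "I can't."
        world[top]["on"] = bottom
        return "OK."

    print(stack("pyramid1", "block1"))   # -> OK.
    print(stack("block1", "pyramid1"))   # -> I can't.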
To put this all together, let's move closer to the user side.
A user doesn't actually type in this fancy syntax to stack blocks.
That would kind of defeat the whole purpose of natural language. It would ruin the exercise.
You converse with SHRDLU over a teletype as if you were talking to another person.
That language gets broken down using a combination of Programmar and Lisp.
Then that's rendered into Planner.
This is the full circle for me.
There is some back and forth during language parsing,
but eventually an input like, please, would you stack the first pyramid on top of the first block
gets turned into the code stack pyramid1 block1. In order to function, SHRDLU has to produce
code. It's a program that creates a new program.
There are a few words for a system like this. The earliest term was generative programming.
Grace Hopper called these types of programs compilers, a program that takes code as input
and outputs a different type of code. However, there's a twist. Well, almost a twist. Technically, everything here is Lisp.
Programmar is an extension on top of Lisp. Lisp is, well, I think you can guess. And Planner,
that's the odd one out, but only barely. Winograd was using a version of Planner called MicroPlanner. It's a subset
of the larger language spec that was implemented at MIT's AI lab. Here's the thing of it.
MicroPlanner was implemented in Lisp. Its syntax is, for all purposes, Lisp.
Some parts of Planner's early syntax were more wild and
those have been dropped or coerced a little bit. This means that SHRDLU is technically a Lisp
program that on the fly writes Lisp. This is a task that Lisp was purposely built for. It's able to deal with programs as if it was data and vice
versa. It's also a very special case of generative programming. The $5 word here is homoiconic. It's
a language that can represent and understand itself. This is a final feature that will prove
key to the future of AI and NLP.
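For a feel of that code-as-data trick, here's one last tiny Python sketch. The mini evaluator and the stack function are my own illustration, not SHRDLU's actual output.

    def stack(top, bottom):
        return f"stacked {top} on {bottom}"

    FUNCTIONS = {"stack": stack}

    # A program represented as plain data: a list we might have built on the
    # fly from a parsed English sentence.
    program = ["stack", "pyramid1", "block1"]

    def evaluate(expression):
        operator, *arguments = expression
        return FUNCTIONS[operator](*arguments)

    print(evaluate(program))   # -> stacked pyramid1 on block1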
Alright, that brings us to the end of this episode.
I want to keep things brief since, well, this isn't actually a conclusion.
This is just a brief intermission.
I hope that at the end here we have some kind of understanding of
natural language processing. I certainly have had my own breakthroughs while preparing this episode.
This is going to be pretty crucial moving forward. Proper NLP requires truly sophisticated software
and sophisticated tools. A toy program like Eliza, while impressive, just won't cut it. You need ways to break apart
language, ways to reason, and ways to deal with context. That all requires some level of
artificial intelligence. Next episode starts with Prolog proper. We'll be diving headlong
into the early development and emergence of this strange and unique language.
After all this Lisp, I think the main question on everyone's mind should be,
how can another language swoop in? Is Prolog simply more specialized for NLP than Lisp,
or is there some other force at play here?
Thanks for listening to Advent of Computing. I'll be back in two weeks' time with another piece of computing's past.
And hey, if you like the show, there are now a few ways you can support it.
If you know someone else who would like the show, then take a minute to share it with them.
You can also rate and review the podcast on Apple Podcasts and Spotify now.
If you want to be a superfan, you can support the show directly through Advent of Computing merch or signing up as a patron on Patreon.
Patrons get early access to episodes, polls for the direction of the show, and bonus content.
You can find links to everything on my website, adventofcomputing.com.
If you have any comments or suggestions for a future episode, then go ahead and shoot
me a tweet.
I'm at Advent of Comp on Twitter.
And as always, have a great rest of your day.