Advent of Computing - Episode 113 - Prolog, Part II
Episode Date: July 30, 2023
I'm wrapping up my dive into Prolog with... Prolog itself! This episode I'm actually covering the development of Prolog, using all the natural language processing lore we covered last time. Along the way we will see how Prolog developed from a set of tools, and how those tools were generalized into a useful language. Selected Sources: http://alain.colmerauer.free.fr/alcol/ArchivesPublications/PrologHistory/19november92.pdf - The Birth of Prolog https://archive.org/details/introductiontoma0000hutc/mode/1up?q=%22q-systems%22&view=theater - An Introduction to Machine Translation
Transcript
Why would someone want to create a new programming language?
Well, the answer to that should be pretty straightforward.
We even have a nice theoretical framework to use.
We know there are a multitude of programming languages in use, each with their own niche.
So, whenever a new niche is found or created, someone just sits down and says,
Ah, geez, there isn't a language for that yet. I'd better
make one. The creator picks features for that language that would make it suited for that niche.
So by the end of the day, or maybe the week, we get a new language perfectly tailor-made for that
one application. Uh, if only life were that easy. Producing this show would be a lot quicker.
Heck, I could even reduce whole languages down to baseball cards.
Just need the language name, what it's used for, a year,
photo of the solitary creator, and you're done.
But that's just not how the world works.
I can throw a pile of counterexamples at you,
so let me just take
Forth off the stack to start with. That language was initially developed as a handy little tool
for personal use. Its first creator, Charles Moore, then started working at a lab with telescopes in
it. The language was adopted for the task and started running scopes and data analysis.
Others started to use Forth, the language was further adapted, and contributors brought in their own ideas. Fast forward years
later, and Forth is being used for small embedded systems, particularly in aerospace. That language
stumbled into a niche thanks to its creator getting a new job.
Forth underwent a slow process that turned it into a truly specialized language
for a truly special application.
That's the short version of the story, at least.
Another good example is Prolog.
That language started out as...
Well, I guess I'll just have to give you the long version for this one.
Welcome back to Advent of Computing.
I'm your host, Sean Haas, and this is episode 113, Prolog, part 2.
This is the conclusion to my Prolog series, as
the name should clue you in on. If you haven't listened to part 1, then I greatly suggest you
do it. Last episode didn't really get into the history of Prolog proper. In fact, I even saw
someone commenting online that it should have been called the Prologue to Prolog.
That said, it does discuss the history and concepts behind natural language processing.
That's important because, well, Prolog was developed for use in natural language processing software.
For those who didn't listen, or those who have forgotten, I'll give a quick rundown of part one.
Natural Language Processing, or NLP, is a discipline that's concerned with making software
that can understand human language.
The natural here just means some tongue spoken by the flesh, and not a programming language.
NLP is a branch of artificial intelligence.
The connection here is a kind of interesting one.
You wind up needing some level of intelligence to understand human language.
So even if you have a system that doesn't seem very intelligent,
you actually need some reasoning capability in the first place just to break down sentences.
The two big uses for NLP in this early period, the 60s and 70s,
were automatic translations of text and intelligent user interfaces. These are pretty
different applications, but they share the same tooling. You need a way to break apart inputs,
model and reason about the contents of those inputs, then generate outputs. By the late 1960s, it was becoming clear that this was actually a huge task.
In 1969, Terry Winograd published a thesis on his program called SHRDLU, spelled S-H-R-D-L-U.
It's an awful name for some very interesting code.
Now, there's a lot to say about the software and the paper.
See last episode for that.
The short story is that Winograd argued parsing language requires the ability to reason.
We humans use our past experience and understanding of how the world works to help process language.
So too must a computer.
At least, that's Winograd's theory.
In practice, this means that any NLP program needs a way to carry out deductive reasoning
at all steps of the process. This comes in the form of rule-based systems, sometimes called a
deductive program or a solver. These types of programs are strange, kind of non-linear
beasts. In the case of SHRDLU, a system called Planner was employed. Now, Planner, like many of
its siblings and relations, is a declarative language. We'll be dealing with that paradigm
a lot more today, so don't worry if you aren't familiar. The big thing to
know going into this is that declarative languages aren't super popular today, so any language that
adheres to this paradigm feels a little strange to the modern programmer. Okay, that should get
us all up to speed. So what's this episode going to cover? Simply put, it's the rest of the story.
Today we're actually looking at the history of Prolog proper.
I'm returning to the scrolls of arcane knowledge that sent me down that NLP path last episode.
This time I'm better armed and somewhat able to decode these scrolls' meanings.
So let's begin the tale of Prolog.
We'll be examining how early experiments carried out
by a team of researchers led to a language. It's a tale of the push and pull of specialization on
one hand and generalization on the other, how a specific idea was turned into a tool and then
reshaped into something far more useful. So I'm going to just start the episode with Sean's complaint corner. The history of
Prolog mainly takes place in three places: Marseille, Edinburgh, and Canada. The primary
locale is, however, France. As such, most of the older papers that lead up to Prolog are
in French. This means I'm surviving off later papers, accounts, and
some translations where I can get them. Our story begins with Alain Colmerauer,
a computer scientist and professor. He earned his PhD in the mid-1960s in Grenoble, France.
Shortly after, in 1967, Alain became an assistant professor at the University of Montreal
in Canada. During his college days, he was steeped in the big languages of the day, namely
Algol and COBOL. One of his first research projects, as recounted by Jacques Cohen in his
tribute to Alain Colmerauer, was a syntax and error checking program for COBOL.
So we're dealing with a dude who knew a thing or two about programming and, specifically,
the linguistic qualities of code. Colmerauer soon became involved in an interesting endeavor
over in Montreal. The project was called METEO, but you'll see a number of names circulating around this one project.
The actual team that Alain was part of was known as TAUM.
That's an acronym for something in French, I'm not going to try and pronounce it,
but it roughly translates to Automatic Translation at University of Montreal.
This is where Alain would cut his teeth on machine translation. Now, the history of machine
translation in this period is a little bit fraught. It actually led to a downturn in the larger field
of AI research in the United States. While Colmerauer was finishing up his education,
there was this little group known as the Automatic Language Processing Advisory Committee. They were meeting in Washington DC in this period. The goal
of this committee, which can be shortened to ALPAC, was to determine if computerized
translations of human language were possible at all. The finalized report was filed in 1966, and the conclusion was kind of bleak. Here's a taste
of the final report. Quote, the contention that there has been no machine translation of general
scientific text is supported by the fact that when, after eight years of work, the Georgetown
University MT project tried to produce useful outputs in 1962, they had to resort to post-editing. The rest of the text is filled with more bad news.
Researchers just weren't really able to crack the machine translation nut,
at least not yet. The state of machine translation in the 60s was just bad. It was a new field,
and there wasn't much progress that could impress bureaucrats. This report ends up jeopardizing
all natural language research in America. Winograd's work on SHRDLU is in direct response to the failures outlined in
the ALPAC report, which is one of the reasons that Winograd took a really fresh and drastic
approach to natural language processing. To their credit, the ALPAC report does recommend that
funding be kept in place. MT, or machine translation, will take time to bear fruits.
But the US government didn't really seem to get the message. Funding for machine translation and
AI research in general took quite a hit after the ALPAC report was published. This period is
sometimes called the first AI winter, so suffice to say it was traumatic for quite a few people.
But Montreal, you see,
that's not actually in the United States of America. The University of Montreal isn't funded by US tax dollars either. Whether he knew it or not, Colmerauer was in a very unique position
to carry out unique research for the time period. This is where TAUM comes in. During the same time period
that the US was cutting funding for AI, the Canadian government was moving in a different
direction. At least, indirectly. As early as 1963, the Canadian government had been moving towards
establishing two official national languages, English and French. Part of that push involved making all
communications from government agencies available in both languages. That would have a weird domino
effect. There was now more call for translation than ever before in Canada. Those translations
were crucial for government business and were now legally mandated. Or at least they would soon be legally mandated.
There's a bit of a timeline here that we have to think about.
The thing is, humans can only translate so much text at any one time.
It's a personal failing of our species.
And only so many people are trained as translators.
To try and fill the gap, the Canadian National Research Council turned to technology.
So pretty soon, there was plenty of funding for machine translation research.
All it took was a pressing need.
The details are a little hazy as to the exact chain of events.
TAUM was either already operating at U of M,
or it was founded as a result of this new
funding. Either way, it was up and running by the time Colmerauer joined the university.
The purpose of TAUM was pretty simple. Investigate automatic translations from English to French.
Their tool of choice in this period was a truly mind-warping language called Q-System. Now,
even the name gets a little confusing here. I've seen it referred to in the plural and singular.
Some sources call it the Q-Systems, others just call it Q-System. Some sources don't even call
it a language, so we're entering into a bit of a mess already.
Let's start with the basics.
Q-System was developed by Colmerauer starting in 1967.
It's another one of these logical or declarative languages.
We've already encountered similar systems in this series. Take Planner as a canonical example.
That system was designed to carry out logical proofs and do deductive
reasoning. Q-system is similar, at least in broad strokes. Instead of running proofs and deductions,
however, Q-system is designed to transform trees. A tree in this context is a higher order list.
It's a list where each element can be a list itself. Why would the
structure matter for linguistics? Well, that comes down to parsing. You can break down a sentence in
a few ways. One is to just split every word and make a nice little list. That's easy, but it doesn't
actually provide much more information than the sentence itself. The computer just knows where spaces are.
Not too useful.
Trees give you a more sophisticated approach to this task.
It allows you to group related words while adding in context about what those words are
doing in the sentence.
Luckily, I can skirt around my lackluster understanding of linguistics here.
Trees are also used for processing normal programming languages, which is much more
in my wheelhouse.
Say you have the expression 1 plus 2.
That's a very normal-looking expression that could show up in any number of languages.
That can be turned into a tree pretty easily.
Each part of that expression
becomes a node. That's the basic element of a tree. The root of the tree here is the plus operator,
so you get a big node at the top of your sheet of paper with a plus in it. Its arguments,
1 and 2, become children nodes under the plus node. So we get two branches coming down from the plus,
that point to 1 and 2.
We end up with a very small, very basic tree.
It should look like a little triangle with an open bottom.
These trees provide both added meaning to the expression,
while also making it easier for computers to process.
A program can start at the root,
see that it's an operation,
and see that it has two arguments. That makes creating an interpreter or compiler much easier,
since programs just need to process these well-formed and standardized structures.
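Just to make that concrete, here's the 1 plus 2 tree written down as a couple lines of modern Prolog. This is purely my own illustration, not anything from the sources, but it shows how a tree is just a nested term that a program can walk from the root down.

    % The expression 1 + 2 as a tree: a plus node with two leaf children.
    % In Prolog that is just the compound term plus(num(1), num(2)).

    eval(num(N), N).             % a leaf node holds a plain number
    eval(plus(A, B), V) :-       % a plus node has two child trees
        eval(A, VA),             % evaluate the left child
        eval(B, VB),             % evaluate the right child
        V is VA + VB.            % combine the results at the root

    % At the prompt: ?- eval(plus(num(1), num(2)), V).  gives V = 3.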
There's also a whole lot of work that's been done on processing trees, so you can
kind of loop into an existing body of work. The trick is going between text and trees.
During language parsing, whatever kind of language we're talking about, you'll have
code that breaks up your text and somehow shuffles it into a tree.
That somehow is the hard part.
There are a pile of different approaches that are each dependent on the programming language
in question.
For a language like Lisp, this processing is very straightforward since its syntax is
very uniform.
For something like JavaScript, well, you have a less uniform time.
If we're dealing with a language like English or French or another tongue that we humans
like to speak, well, things get even less uniform.
All these same principles can be applied to natural language. You have rules that determine
how a sentence is structured, how parts of speech can be used, and what words modify or pertain to
other words. Of course, this parsing will be connected to some kind of reasoning about the world
and about how language functions.
You can't just understand a sentence without understanding its subject.
Q-System was designed as a very generic tool,
but it will end up being used in a system that translated weather reports.
As such, most examples of this system in action are talking about the weather. Here,
I'm going to pull specifically from a book called An Introduction to Machine Translation by
William Hutchins. He gives one of these tree examples for the phrase,
cloudy with a chance of showers. I'm going to do my best to go over this in hopes that
perhaps it provides a little more clarity and understanding of a language tree.
The root of the tree, the big node that everything else is under, is a C for condition.
At that point, the phrase has been identified as talking about some weather conditions.
The first word, cloudy, is identified as an adjective.
The first word, cloudy, is identified as an adjective.
So a node under condition is added for the adjective,
and then a node under that for cloudy itself is added on.
The rest of the sentence, with a chance of showers, is a condition modifier.
It's saying that there is something beyond just clouds in the forecast.
So that's put under a condition modifier node. With a is a preposition,
so it gets its own group. Then chance of showers itself is a condition, so it gets a tiny condition tree of its own. The final result is a tree that groups together parts of the sentence by their
meaning and use in the sentence. This all sounds like an eye-glazing
exercise, but it has an important purpose. We can, at a glance, see the meaning of cloudy with a
chance of showers. If we knew French, we could probably translate that same sentence. But the
computer needs a more mechanistic way of doing things. From the tree, the computer can see that we have
an adjective and a subcondition, which are linked with a preposition. There's also another trick to
these trees. Each node is chosen deliberately to be a single translatable phrase. Using a slightly
smart dictionary, each phrase can be translated from English to French. This makes
each node the smallest unit of work possible in the system. It's down to just a lookup in a
dictionary and a couple rule checks to make sure everything lines up with endings and tenses.
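If it helps to see that tree written down, here's a rough sketch of it as a nested Prolog term. The labels are my own shorthand for Hutchins' example, not actual Q-System notation, so treat it as an illustration only.

    % "cloudy with a chance of showers" as a nested tree:
    % c = condition, adj = adjective, cmod = condition modifier, prep = preposition
    weather_tree(
        c(                            % root: a weather condition
            adj(cloudy),              % the adjective "cloudy"
            cmod(                     % the condition modifier group
                prep(with_a),         % the preposition "with a"
                c(chance_of_showers)  % a tiny condition tree of its own
            )
        )
    ).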
So this gets us back to the whole matter of tree transformation. Each English tree has an equivalent French tree. Sometimes,
these have the same shape, but in most cases, you need to switch things around.
Verbs don't always go in the same places in English and French sentences. This makes the
core task, the big, complex computational challenge, transforming from an English tree to an equivalent French tree.
This is the whole point of Q-System. It provides a language for describing these tree transformations.
That language is rule-based, which means the Q-System is solidly in the form of other
rule-based declarative languages that we've been looking at. And the syntax is... kinda gross.
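To show the flavor of the idea, if not the actual syntax, here's a toy transformation rule in the same Prolog-style notation I've been sketching with. This is my own example with made-up helpers, not a real Q-System rule: it just flips an adjective-noun pair, since French usually puts the adjective after the noun, and looks each word up in a tiny toy dictionary.

    % flip adjective and noun while translating each word
    transform(np(adj(A), noun(N)), np(noun(NF), adj(AF))) :-
        translate_word(A, AF),
        translate_word(N, NF).

    % a couple of toy dictionary entries
    translate_word(heavy, forte).
    translate_word(rain,  pluie).

    % ?- transform(np(adj(heavy), noun(rain)), F).
    % F = np(noun(pluie), adj(forte)).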
The overall system is pretty complicated, but I want to focus us on just one aspect of Q-System
that I think will prove useful down the line. That's the idea of conflict resolution.
This is a topic that comes up in many rule-based systems. That is, if you have multiple rules that could each
be applied to a given state, how do you pick which rule should be applied? Colmerauer's solution was
to just apply every single rule. At least, that's step one. If you read anything about Q-System,
you will undoubtedly run into these wild-looking flowcharts that are supposed
to explain how these languages work. This was one of the reasons that I had to take a step back from
the Birth of Prolog article that I was reading, the one that started this whole deep dive. There's a
section on Q-System in there that says, oh, Q-System transformations are explained by this diagram,
and then it has what appears to be an incomprehensible flowchart. They're a little off-putting, to say the least.
These graphs use dots to represent choices, or divergence points. From each dot springs a series
of arcs that represent possible actions. These arcs eventually converge onto a final dot.
So these trees aren't really trees so much.
They look like weird closed off formations.
And following traditional plant naming schemes, I'd kind of like to call them bush graphs
because they almost look like topiary bushes. Anyway, the idea is that
each arc represents the outcome of one valid rule. In the TAUM example, these arcs are each one
possible translation of a phrase or word. Here, Colmerauer is drawing from earlier work done on graph-based analysis.
At least, that's what the sources will tell you.
I think it's a pretty opaque way of explaining these bush graphs.
So, here's a better one.
Each of these graphs is a state diagram.
And I know, I know, more ways to make your eyes glazeth over.
So let me explain.
We should all be familiar with a state machine from last episode.
It's just a machine that has some data, aka a state, and that data can change.
A state diagram represents how you can change that data.
Traditionally, each node in these graphs represents a state. These
nodes are then connected with arrows that show the different ways a state change can occur.
If you only have one value and all you can do is increment it, then your diagram would actually
just look like a line of dots. Each dot would be an increasing value starting from 0 going to 1 to 2 to 3 and on up.
Each dot would be connected by an arrow pointing forward.
State diagrams are one of those things that look intimidating, but they're actually pretty simple.
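Just to show how little is going on in that increment-only diagram, here's the same thing as a couple of Prolog clauses. Again, this is my own sketch, not anything from the sources.

    % each arrow in the diagram is one way the state can change
    step(S0, S1) :- S1 is S0 + 1.        % the only transition: add one

    % you can reach a state by following zero or more arrows
    reachable(S, S).
    reachable(S0, S) :- step(S0, S1), reachable(S1, S).

    % ?- reachable(0, 3).   succeeds by following three arrows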
Q-system graphs follow a roughly analogous idea.
They just don't always show arrows.
Each arc represents a change in the
state of the Q system, while each dot is just a shorthand for something similar to a state.
In practice, a Q system state is a pile of trees, so you can't really draw that on a diagram in a
way that's concise. There are some differences since the dots aren't strictly states, but
we're dealing with something that's really close to a state diagram.
So here's how it works. Q-system is given a tree. It then goes through its rules and sees which may
apply to different sections of that tree. Each of these rules carries with it a transformation. That's to say,
an action to be applied to the tree. Those are the arcs. As Q-System finds rules that may apply,
it places these associated actions onto the graph. Each of the arcs in this graph isn't so much a
translation in this example, but rather an action that somehow changes the tree.
Some of these rules are simple, just replace one type of preposition with another, move
verb after noun in some cases. Some are more complex, like fully shuffling a tree into a
totally new shape. But remember, more than one rule might apply.
The example used in an introduction to machine translation is the phrase heavy.
That one word can be translated into multiple different French words depending on the meaning and its placement in a sentence.
So, Q-System has to put in multiple arcs, multiple possible outcomes.
The next step is to just brute force the whole thing. This is truly my favorite approach. Q-System just applies all of these
possible branches to the tree. The program will actually take multiple branches down these graphs.
It will try and transform the tree one way and then check if it reduces to a valid tree.
There's already rules in place that say what constitutes a valid French tree,
so it's pretty quick to check that sort of thing.
In the TAUM case, these are just rules for French grammar and sentence construction.
That's all stuff that you can pull from a dictionary or just a little knowledge of linguistics. Any set of arcs that
leads to an invalid French tree is marked as bad. Through this process of elimination,
Q-System can build a valid sentence. From there, it's just a matter of turning the tree back into
text and boom. You can now talk about the weather in English or French. That's the intelligent part
about the whole system. It's smart enough to figure out what works with its given rule set.
As it goes down the graph reducing arcs and applying transformations, it keeps track of
where it's been and where it's about to go. This is all possible because Q-System models how language works. It models the rules of language
itself, so it knows what's a viable sentence and what's nonsense. However, unlike SHRDLU and
other deductive systems, Q-System is much more abstract. Colmerauer's tool would make its way
into production, which is a mark of really useful software.
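If you squint, that whole apply-every-rule-then-filter strategy boils down to something like this little Prolog-flavored sketch. It's mine, not Q-System's actual machinery, and rule, applies, and valid_french are hypothetical stand-ins for the real rule set and grammar checks.

    % every applicable rule proposes one candidate rewrite of the tree
    candidate(Tree, NewTree) :-
        rule(Rule),                     % pick any rule on the books...
        applies(Rule, Tree, NewTree).   % ...and see what it does to the tree

    % brute force: backtracking tries every possible candidate in turn,
    % and only the ones that form a valid French tree survive
    translate(Tree, FrenchTree) :-
        candidate(Tree, FrenchTree),
        valid_french(FrenchTree).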
Meteo is the name of the resultant project, which ends up being used to translate weather
forecasts for years. This is one of the first big wins for practical machine translation.
But we aren't really concerned with the weather here. We're more interested in how Q-System paved the way for Prologue.
TAUM's translator was really cool, but it was pretty simple. The fact is that it only had a
very limited scope. It really only translated simple sentences that described the weather.
That's a really, really tight scope, which helped set it up for success. That's nice for a single project,
but not nice in the abstract. So where do we go from here? Well, for Alain, it was back to France.
In 1970, he left the University of Montreal and moved back to Marseille. This is where a team
forms. As Colmerauer explains in The Birth of Prolog, this group consisted of four programmers.
Each of these programmers is pretty enigmatic, save for Colmerauer himself. Jean Trudel had
actually followed Colmerauer from Montreal. He was involved with more of the deductive programming,
along with Philippe Roussel. We don't have a whole lot of information about these two.
They focused on automatic theorem proving.
The fourth man, Robert Pasero, was on the natural language side of things along with
Colmerauer.
The first project the team took on in 1971 was to create a program that could, quote,
make deductions based on texts written in French.
The finer details of this program are, well, written in French.
What I can gather is this. It was actually written in two different languages. Q-System
was used for parsing input text and generating outputs in French. So this part of the system
is exactly what we ran into with TAUM. I'd be willing to bet that it actually shared some code
with TAUM's forecast translators, since that's often how projects go. The middle part, the
deductive part, was written in ALGOL W. This isn't all too strange. At this point, the team at Marseille
knew nothing about SHRDLU, but they were following a similar path. SHRDLU used separate
tools for separate components. This story-deducing program did the same thing. The approach has its
pros and cons. Sometimes you just end up needing specialized tools, simple as that. In some cases,
isolating different parts of a larger program can make things more safe. By this I mean that
isolation prevents a
certain class of bugs from recurring. It doesn't protect you from the anger of the computer itself.
The downsides, though, are pretty annoying. For one, interfacing multiple programs together can
get messy. This really depends a lot on how you do things. I don't want to get deep into the inter-process communication
strategy kind of talk. Instead, let's just think of this as added complexity. You have to have some
way for programs to talk to each other. If you don't get this right, or if one of your programs
changes unexpectedly, then communication will break down. The result here is you get a whole
new class of bugs to deal with. So just
keep this in mind as we move forward. These multi-part systems have some added complexity and
perhaps danger. We've already talked about the queue system part at length, so let's look at
the middle part, this automatic theorem prover, or the deductive system. This was written in ALGOL W, but it was just the implementation language.
That's the engine that runs the deductive show. Logic was actually programmed in its own language.
We're basically looking at an interpreter that sits between two Q system programs.
The birth of Prolog has a snippet of this deductive language.
This is probably the earliest ancestor of Prolog itself.
In other words, we've finally made it.
We're on the bus.
This doesn't, however, mean we're looking at a full-on big fancy language.
We aren't even looking at a very nice language.
Now, this is one of the spots where I hit speculation, so be warned. The birth of Prolog,
which is the main article that I've been using to put together this whole series,
is a little vague about this random chunk of code. It doesn't say if it's pseudocode,
actual source code, or just a fun little example. I wasn't really able to find evidence one way or the other, so I'm just going to assume that this is source code. If that's the case, then we're looking at something very interesting. This code
doesn't use a normal character set. It has wild characters like the upside down capital A and
the lower caret and the implies arrow. These characters are normally used in formal logic.
There's this whole set of symbols that people with PhDs in logic use to write out theorems.
These wacky characters show up right next to normal calling conventions. We have
square brackets holding lists and functions invoked with parentheses. I'm probably reading
too much into this, but to me, this feels
like a transitionary language, a transitionary technology. This isn't all too strange to see
at an early stage. Even Lisp, the granddaddy of strange languages, had this weird transitionary
period in its evolution. The earliest papers on Lisp sport syntax that's taken directly from
formal mathematics. So this kind of transitionary syntax, partway between formal theorems and
programming norms, isn't all that strange. I'd like to point out something else. This
proto-proto-Prolog looks nothing like Lisp. Now, you may be thinking to yourself, wow, Sean, that's a bit
of a non-sequitur. I mean, of course it doesn't look like Lisp. It's a different language, after
all. The programmers at Marseille weren't Lisp hackers. They used Algol, which is just a different
language. This may seem like a silly tangent at first,
but this is actually a really important distinction
because it puts Prolog on a different part of the programming family tree.
Planner, which we discussed at length last episode,
is firmly part of the Lisp lineage.
The developer of Planner, Carl Hewitt, was a Lisp programmer.
Prolog, when we finally get there, is not related to Lisp.
This is, in part, because it was developed outside the existing Lisp ecosystem. I'm harping on this
because it's really interesting to me. This is a really neat example of how languages form into
lineages. It's not always some firm and fast connection. In a lot
of cases, it comes down to what programmers are using and the crowds they run in. That puts a very
human face to things. Prolog isn't so distinct from Lisp because someone sat down and said,
I don't want any of that Lisp stuff in my language. No, it's different from Lisp because the people who
developed Prolog didn't use Lisp. Alright, to digress a little bit, let's talk about the
deductive system, that weird logical code. That was the heart of this story understanding program.
From what I've read, it seems like a pretty basic program. More an experiment than a useful product.
The program would be given a simple story.
Then the user could ask it questions pertaining to that story.
All interactions were carried out in normal French, or at least pretty normal French.
It wasn't all caps, but that's no fault of the program.
We're looking at something really, really cool here.
In our earlier examples,
SHRDLU and the TAUM forecast translator, the programs were pre-supplied with a certain level
of logic and information. SHRDLU knew about a room full of boxes and how they could be
manipulated. TAUM's system knew a thing or two when it came to talking about the weather.
In this new program, the user supplies all knowledge and relations in the form of natural language text.
You tell the program a story.
Q-System breaks that down into something the computer can handle, into a set of trees.
The prover gets passed state and relational data from Q-System.
There is a more technical level at play, but,
at least from the outside, the knowledge here all starts off as human language.
The program only needs to know how language can be mapped into logical statements.
That, sure, is specific, but it's far more abstract than a program that knows how to talk
about the weather. That means that we're
looking at a more powerful and more general approach to language understanding. It's during
this proto-proto-Prolog period that Scotland enters the picture. From The Birth of Prolog,
quote, while continuing his research on automated theorem proving, Jean Trudel came across a very interesting method, SL resolution.
He persuaded us to invite one of its inventors, Robert Kowalski, who came to visit us for a week
in June 1971. Later that year, Trudel would run into Kowalski again at a conference in Edinburgh,
where Kowalski worked at the time as a computer science researcher.
Now, this is one of those times where I get a little out of my depth. Prior to this meeting,
the programmers at Marseille were kind of out of circulation, so to speak. They weren't privy
to what other researchers in the field were actually doing. The Birth of Prolog even confirms
that they didn't know what Lisp was until the Edinburgh trip.
Likewise, the team wasn't really in the loop about other automatic provers.
They didn't talk to other people in the field.
That is, until 71.
It sounds like their introduction to Kowalski was a bit of a revelation.
His big contribution moving forward is so-called SL resolution,
selective linear resolution. This is a strategy used for resolving logical proofs.
I've been trying to understand the ins and outs of SL, but I haven't really gotten super adept
at logic in general. As such, you're going to get the Sean filtered view. The important part
of the method, at least as near as I can tell, is that it's linear. That means that it goes one step
at a time. You start off with a set of clauses, the equivalent of a statement in a programming
language. Then you have goals. These are the questions that the programmer asks of the computer.
You're basically just saying, given these clauses, given this set of rules,
I want to find a way to reach this goal.
You pass in a pile of facts and relations, then ask the computer what it all means.
If the computer doesn't have enough information,
it then tries to derive new clauses until it can find an answer for you.
This is the deductive part.
Given a set of clauses, you can, in theory, make new clauses that are equally as valid.
In SL resolution, this is done in a series of discrete steps.
At each step, there may be multiple ways forward.
The computer could be faced with multiple choices.
In other words, SL resolution forms a tree of possible solutions.
This lends itself well to the digital method of choice, brute force.
You can construct a digital SL prover somewhat easily
while still following the method's formal definition.
Just take your logical clauses and goals,
make a tree, and then start down one of the branches.
If you run into a branch that isn't correct, then backtrack to the last decision point.
With the judicious use of computer time, this is a pretty nice approach. It also jibes with earlier tree-inspired work that Colmerauer was familiar with. SL resolution ends up being
integral to the development of Prolog itself.
The final language ends up using something pretty similar to Kowalski's formulation.
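If you want a feel for what linear, one-step-at-a-time resolution with backtracking looks like, here's the classic textbook Prolog meta-interpreter. It's a standard teaching sketch that works in roughly this spirit, not Kowalski's actual SL formulation, and depending on your system the clauses it inspects may need to be declared dynamic.

    solve(true).                  % the empty goal is already proven
    solve((A, B)) :-              % a conjunction of goals:
        solve(A),                 % prove the left goal first,
        solve(B).                 % then the right one
    solve(Goal) :-                % a single goal:
        clause(Goal, Body),       % find a clause whose head matches,
        solve(Body).              % prove its body; if that fails,
                                  % backtrack and try the next clause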
Now, we can keep traveling along the timeline from here.
In 1972, the next step towards Prolog occurs.
This happens during the development of
Un système de communication homme-machine en français.
Sounds intimidating, no?
And hopefully I didn't destroy the language too bad.
So here's the deal.
This is the first paper that describes Prolog.
And it's in French.
Normally this is where I get stuck,
but I actually have a way out of this bind. I was able to talk one of my friends who does know French into translating parts of the
paper for me. So here's the story, constructed from The Birth of Prolog and A Man-Machine
Communication System in French. In 1972, the team at Marseille receives a new round of funding.
This was said to be used in the development of a smarter human-computer interface.
It would be something similar to the previous story deducer, but more complex.
Ultimately, this research could lead to a new style of text interface, something smarter and more user-friendly than the usual blinking cursor.
Here's where it gets interesting. For this new project, it was decided that a new tool was
needed. The earlier story deducer used multiple components in multiple languages, which was a bit
of a hassle. It would be much better to just have one unified interface. This is what the 1972 paper
has to say on the matter. This is how Prolog is first described publicly. From the translation,
Prolog was conceived as a programming language to bridge first-order logic and languages like
Lisp or Planner, for example. It is, at its core, an automated deducer that
allows the easy programming of formal data such as algebraic formulas, trees, lists,
natural language, etc. The paper continues. Most automated deducers that exist are often,
because they are so far from the programming language, hardly effective outside of mathematical problems. The user, in effect, has only little means to
control the execution of their program, and moreover, the syntax of the languages accepted
by these deducers, in general composed of formulas written in functional form,
is too rudimentary to allow for a synthetic vision of the objects
being processed. These are the pitfalls that we wanted to avoid in creating Prolog.
There are a couple things to note here. First is that the paper name-checks Lisp and Planner.
That's important because it confirms the timeline given to us in The Birth of Prolog.
By 72, Colmerauer and his colleagues had gotten a look at other logical languages.
They didn't see these languages as a good fit for their specific project.
It's implied here that Lisp and Planner are a little too far removed from formal logic,
a little too refined, you could say.
I think this is especially
the case with Planner. That language is just a deducer. It's programmed in such a hands-off way
that the programmer basically loses control when you get down to execution time. It's not really
a full-on language. Prolog is described as something to fill the gap.
It's going to be a programming language that can express formal logic,
but it will also be a programming language that can be controlled.
That's a key distinction here.
From everything I've read about Planner, the language feels very, well, implicit is the word.
It just kind of does its own thing after a certain point.
It's less of a language and more of an interface for a deductive system. Prolog, on the other
hand, is going to be more of a language for deduction, if that makes any sense. It's going
to have all the deductive jazz plus the usual tools that programmers love and crave.
There's also something important here that, at least in my opinion, is inspired by SHRDLU,
or maybe Lisp. From The Birth of Prolog:
The objective we had set for ourselves was extremely ambitious.
To have a tool for the syntactic and semantic analysis of natural language. End quote.
Let me rephrase that for you.
Early on, it was decided that in Prolog, code and data needed to be represented and treated the same way.
They needed to be interchangeable. Specifically, the same syntax and tooling used for defining
logic would also be used for defining data. The same is true in the inverse.
This, dear listener, is what the Hacker's Dictionary refers to as Kung Fu.
The SHRDLU paper that I covered last episode talked about doing the same thing.
Terry Winograd described SHRDLU as storing data in the form of code.
Data as functions.
Now, either the boys in Marseille were pulling from Winograd,
they were using a lot more Lisp than I first assumed,
or they came up with the idea independently.
Whichever the case, this is some cool stuff.
The word that you use for a language like this is homoiconic.
It means that code can be treated as data and data can be treated as code.
In other words, a homoiconic language can modify itself in a reasonable way.
It can produce new code in its own language, alter old code, and analyze itself.
This is similar to what Winograd was going for with SHRDLU.
By storing data and code in the same way, by tracking knowledge as executable code,
Winograd gained a wild amount of flexibility.
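To give a tiny taste of what treating code as data buys you, here's a sketch in modern Prolog. It's my own example, not anything from the Marseille papers: the program adds new facts to itself at run time and even builds a goal out of plain data before running it.

    :- dynamic(square/1).

    square(a).                        % an ordinary fact...

    learn_square(X) :-                % ...but clauses are also data:
        \+ square(X),                 % if X isn't already known to be a square,
        assertz(square(X)).           % add a brand new fact to the program

    ask(Name, Arg) :-
        Goal =.. [Name, Arg],         % build a term like square(b) from raw data
        call(Goal).                   % and run it as code

    % ?- learn_square(b), ask(square, b).   succeeds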
Prolog as a language was intended to have a similar type of flexibility. But before we continue, I do want to clear something up.
We're going to be getting into Prolog itself, into the nitty-gritty. I want to put forward a
question before we go any further.
That is, should we consider Prolog a programming language?
This is another one of those kind of obvious questions that I like to ask.
The gut reaction here is an emphatic yes.
But if you think for a second, things start to get slippery.
Fundamentally, a programming language is just a way to tell a computer what to do.
It's a rigid notation for controlling a machine. Now, don't get me wrong, Prolog has that. It has syntax, semantics, the
whole nine yards. But it also does more than a traditional language. Prolog, much like other
logic languages, is still relatively implicit. It can be set to run and just do
its own thing, creating deductions and working up new logic along the way.
Colmerauer and Roussel refer to Prolog, at least in this earlier version, as a tool.
They say that Prolog was first developed more as a tool than a language. What do they mean by that? Well,
this is where I have to reveal a secret about programmers. Actually useful code only makes up
part of your output. If you're serious about programming, then you spend at least some time
developing tools that help you program. This can range from software that automatically tests your program's outputs
all the way up to interfaces that let you control your software. Prolog, at least at first, was a
tool for controlling a larger NLP system. It was intended as a way to configure the team's natural
language interface project. This was new code put together for deduction as well as string processing. So,
naturally, some way to configure everything was needed. Something that could tie input to logic
to output. Prolog was developed as the solution to that. This, in my mind, puts Prolog in an
interesting case. It's a programming language that was intended for a very specific task,
and it works with a very specific implementation.
That last part is a little tricky, so I'm going to quickly elaborate.
Most languages are defined as a standard.
That standard will describe things like syntax, how the language looks,
and semantics, basically what different parts of the
language do. These standards usually don't describe how the language should be implemented.
They don't say that memory allocation should use a certain algorithm, or that vector addition must
be implemented using special processor extensions. That kind of stuff is left as an exercise to the reader, or the programmer, rather.
Prolog, on the other hand, specifies how certain operations should be carried out. Deduction is
part of the spec, which is in itself a very specific algorithm. So Prolog is a little more
than a language. It's kind of like a language plus a very specific technology.
With that all said, I think it's time we actually get to the language itself.
Tool, language, interface, or whatever you want to call it. In 1972, Prolog is developed for
this one specific human-computer interface project. In 1973, things were looking up for the team in Marseille, now officially named the
Man-Machine Dialogue in Natural Language Lab. This was bolstered by the success of their program and
a tour of colleges in the UK and America. This is where the final jump occurred, and Prolog
shifted to becoming a more capable language. From The Birth of Prolog again.
Quote,
Between February and April 1973, at the invitation of Robert Kowalski,
Philippe visited the School of Artificial Intelligence at the University of Edinburgh.
The paper continues,
The result of this visit and the laboratory's need to acquire a true programming language
prompted our decision to lay the foundation for a second Prolog. End quote. Lisp and Planner, despite being perfectly fine languages,
just weren't the right language for the jobs dreamt up in Marseille and Edinburgh. Lisp,
despite being the lingua franca of AI, wasn't specialized enough for logic. It was a general-purpose language, after all.
Planner, on the other hand, was a little too pigeonholed. It couldn't work as a full-on
language. This new Prolog thing was looking pretty slick, if it could be given the right push.
These are the nuts and bolts of how language specialization occurs.
So, what did Prolog become?
What is this second Prolog?
The initial Prolog, the first Prolog, well, it's a bit of a mess.
I don't think we can blame anyone for that. It was a first draft, and it was still meant as a tool at that point.
The bones are all there, but certain details are a little bit funky.
The syntax is especially, frankly, just messy.
For instance, a line of proto-Prolog could end in any of four ways.
Dot dot, dot semicolon, semicolon dot, or semicolon semicolon. It's a bit of a tongue
twister. Each of these had an effect on how that line was interpreted. It shakes out to changing
how deduction occurred and paths were searched. The Birth of Prolog even admits that these suffixes
were useless, confusing, and needed to be removed. The syntax just, in general, isn't very tight yet.
Blocks of code end in the word amen, for example.
None of this is awful, per se, but to me it feels like something that would show up in an internal tool, which makes sense.
That's just kind of my two cents about the syntax of early Prolog as it shifted into the modern, second
Prolog.
Now, onto a larger part of the language.
The core unit of code in Prolog is a clause.
This is also the core unit of data, since data and code are interchangeable.
I won't be pointing that out every time, you just gotta remember that equivalence.
This isn't to say that a clause is the smallest possible thing in Prolog. There are parts to a clause, and there are variables and whatnot that are smaller. But in general,
each line of Prolog that you write will be some type of clause. The entire language is geared
towards handling clauses one way or another. In a way, there's almost this ideological
similarity here with Lisp, the list processing language. In that language, the core unit of
everything is the list, and the language is built towards working with that construct.
Does that imply a connection? Well, perhaps. I keep bringing up the fact that Prolog is unique, in part because it's
not a relative of Lisp. So then why is there this similarity?
It could be that the crew at Marseille was partly inspired by Lisp itself, that's the simple answer,
but it's also kind of boring. The more interesting possibility is that both languages share this ideology because they're both homo-iconic.
Both Lisp and Prolog can treat code and data interchangeably. Each language can produce and
run code in its own tongue. In order to do that, both Prolog and Lisp do the whole code-data boogie,
and are each built around some core data structure. That seems to just be how you make a homoiconic
language. I'm sure there's other ways to do it, but I'm racking my brain to think of any.
Now, these clauses come in a few forms. Facts and rules. Facts are the more simple case. It's
essentially like setting a variable in a normal language, just Prolog style. You're essentially telling
Prolog that something is true about the world. The usual example is to describe that something
belongs to a set. You could say square x, which is written out as square and then in parentheses x.
That tells Prolog that x is a square. The convention here looks almost identical to a function call,
with parentheses and everything. Rules are full-on logical clauses. This is where you actually build
up the logic and rules that Prolog will later make deductions about. You're setting up your
digital world. Rules are written a little backwards from what you'd expect.
The first part of the rule, the head, is the outcome you want. The second part, the body,
is the condition. So if you wanted to write a rule that meant all squares are rectangles,
you'd write something like rectangle y is implied by square y. The middle part, the implied by, is actually a colon dash. It's the Prolog
version of an implication, sort of. You can also read that clause as y is a rectangle if y is a
square. Once again, just to point out, the syntax here looks a lot different than a more commonly used language.
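Spelled out in modern Prolog syntax, which isn't quite what the 1972 version looked like, the whole square-and-rectangle example is just this. The names here are my own; capitalized names are variables, lowercase ones are constants.

    square(s1).              % a fact: the thing called s1 is a square

    rectangle(Y) :-          % a rule: Y is a rectangle...
        square(Y).           % ...if Y is a square (read ":-" as "is implied by")

    % At the prompt:
    % ?- rectangle(s1).
    % true.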
Now, here's the neat part.
That's pretty much all of Prolog's syntax, right there.
It's a fully unified syntax, which means that everything is a clause, and all clauses are written in the same way.
This is the part where you should be a little confused, perhaps.
I went through the same feelings, I do swear. I just said there are two types of clauses,
which are written in a slightly different way. That doesn't sound very unified or uniform to me.
You see, there isn't actually such a thing as a fact in Prolog. I've tricked
you. I've pulled the wool over your eyes. A fact is actually a shorthand for a rule that's always
true. Internally, Prolog sees a fact as a head, the outcome you want, with a body that's
just set to true. So if you get down to it, there's actually only one type of
clause. When I realized that, I felt like I hit the next level of programming. I hit the next
turtle as I was going all the way down. This puts Prolog into a very small club of languages.
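If you want to see that trick written down, these two lines mean exactly the same thing to Prolog, at least in the modern syntax I've been sketching with.

    square(s1).              % a "fact"...

    square(s1) :- true.      % ...is just shorthand for a rule whose body is true

That one little equivalence is what puts Prolog in such rarefied company.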
I like to think of these as first principles languages since they're essentially derived from scratch.
You got Lisp in the club, of course. That's the progenitor of all of this mindset. You have
Forth, which is for stacks. And you have Prolog, the clause champion. For me, this realization
really puts Prolog into a different light. I said last episode that the first time I saw some Prolog code, I kind of recoiled.
I couldn't make heads or tails of it.
That's one of the reasons that I've been so slow and methodical in these episodes.
The papers describing Prolog read, in large part, like formal logical proofs.
That's intimidating, but it's not unique.
Early Lisp papers read in much the same way.
The same goes for Forth. These kinds of languages are just so refined, so distilled down to their essence, that they almost look like, for lack of a better term, an alien tongue.
This core logic is what Prolog is constructed out of. The clause is the building block of the
entire language. But here's the crucial part that makes the second Prolog special.
Not all clauses are fully logical constructs. Some are non-logical or extra-logical. Instead
of creating a rule, these extra-logical clauses have side effects.
At least, that's how Colmerauer and company word it. These are, by and large, input-output
functions and parsing tools. Prior to Prolog, the team had been using Q-System to handle these tasks.
They had to get raw text into some type of data structure that could be useful.
Prolog essentially subsumed that responsibility. This had two impacts on the language. First,
it made Prolog fully useful. You only needed to know one language to do NLP. You only had to deal
with one system if you were using Prolog. Second, it means that Prolog isn't
entirely declarative. Some parts of the program actually run in the order they're written in.
That's a normal thing. That's a trait of imperative programming languages like
Fortran or C or BASIC or anything you've been exposed to in the past.
To get into that, we need to go back to clauses. There is a
smaller unit of code than a clause, a predicate. In Prolog, a predicate is that thing that looks
like a function call. It's a name followed by an argument that's enclosed in parentheses.
But looks can be deceiving, at least sometimes. A predicate isn't necessarily the same as a function.
When you call a function in an imperative language, you are, and here's the fancy word for it,
invoking a section of your program. You're saying, hey computer, I have this chunk of
code that has a special name. I want you to execute it right now. Prolog predicates look like functions, but they don't
act as functions. That's another thing that tripped me up about this language. A predicate
is actually just telling Prolog something about the predicate's arguments. You also, in a lot of
cases, just define predicates willy-nilly on the fly. So square of x is telling Prolog that you have something called
x and it is a square. It's almost like an assignment, kind of. Like I was saying, most of
the time you're creating new predicates on the fly. You don't have to go and define anything to
use square on x. You just do it and Prolog works that out on its own. These are purely logical predicates,
as in, their only effect is to create a new rule. These types of predicates have an impact on
Prolog's internal logic. Sometimes, however, you run into a predicate that's already defined by
Prolog. These built-in predicates are called primitives, since they are, well,
the most primitive tools you can start a program with. Each primitive has a predefined behavior.
Some are still logical. Diff, for instance, will tell you whether its two arguments are different.
That is pretty logical, and it can be chained with other predicates to make useful rules.
Crucially, primitives like diff have no side effects. They don't do anything to the overall
state of your program besides saying, oh yeah, A and B, those are different. Some primitives, however,
come with side effects. These are non-logical or, as I've seen it written in some
papers, extralogical predicates. I just love that name. It's very ominous. Behold, I have
brought you a gift of extralogical predicates. An extralogical predicate simply has some kind of side effect. It changes the state of your program, and it does so directly.
This is, dear listener, a bit of imperative programming dropped right into Prolog.
And thus, the purely declarative dominoes all topple.
By adding in these state-based predicates, Prolog has to support imperative programming. It has to be able
to run one instruction after another, since some of those instructions care about the current state
of the program. As an example, let me tell you about the delete primitive. This will remove a
rule. This can also remove data from a list, but that's the same as a rule anyway.
For delete to make any sense, it has to be run on a list or a program that's in a known state.
It wouldn't be very useful if you just have delete run whenever Prolog gets around to it.
That would lead to unpredictable and useless code. Thus, delete is this special case where,
when Prolog sees it, it runs it right away. You
are actually telling Prolog, hey, I got some of this code, it's called delete, run it right now!
Whereas normally, Prolog just stores rules and saves them up for later.
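As a rough modern analogue (today's Prologs use names like write and retract rather than the original Marseille primitives, so treat this as a sketch of the idea, not their code), here's deduction and side effects living in the same program.

    :- dynamic(square/1).

    square(s1).                         % a fact we can later remove
    rectangle(Y) :- square(Y).          % the purely logical part

    demo :-                             % the extralogical part runs in order
        ( rectangle(s1) -> write('s1 is a rectangle') ; write('no') ), nl,
        retract(square(s1)),            % the modern cousin of that delete primitive
        ( rectangle(s1) -> write('still a rectangle') ; write('not anymore') ), nl.

    % ?- demo.   prints "s1 is a rectangle" and then "not anymore"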
Far from compromising Prolog's logical vision, this splash of imperativeness actually makes for a better language. This is how Prolog
was able to replace multi-part NLP systems. You can use purely deductive programming to do deduction,
and you can also drop in some imperative code to handle tricky situations. That could even be
something as simple as running a deduction, then asking for some inputs from the keyboard,
and then running a deduction. That's imperative, but it's also doing deductive work. It's a
beautiful combination of two different types of programming. This is exactly what made Prolog
such an exciting new language. As we've seen, it's possible to go pretty deep into a niche.
Q-System and Planner did just that. Q-System only worked for parsing and translating text, and even then only for
really doing tree-based translations. That was its whole world. Planner, despite theoretically
having text parsing capabilities, was only ever used for just logic. It could only handle deduction.
When it came down to actual NLP, these types of systems had to be chained together to do
anything useful. In other words, you had to have discrete steps that came one after the other.
So despite using fancy declarative languages with all kinds of cool, implicit, automatic things,
you still had to do things one step at a time.
You had to parse, you had to deduce, you had to spit out.
There might be more intervening steps, but there's still an imperativeness to it.
You still need a little bit of that state machine going on.
We see this exact DNA in Prolog.
It's an interesting case of despecialization.
The team at Marseille went from specialized languages to a more general-purpose one.
To do so, they had to make compromises, like adding extra-logical predicates.
Prolog is not purely declarative, but that doesn't really matter.
Hard distinctions like that only have actual meaning
if you're talking theory. In the real world, saying a language is declarative is more like
a shorthand. It gives you an idea of what you're kind of dealing with, but it doesn't explain a
whole lot. By including some imperativeness, Prolog became practical as a one-stop tool,
a full programming language.
For me, this is getting close to an elephant in the room.
This is never addressed explicitly anywhere in the sources I've read, but I think it's something we should discuss.
The team working on Prolog were old hands at another language called Algol.
That language, initially called the International Algebraic Language, was the first attempt at a universal programming standard.
This was an international effort, but ALGOL didn't get much traction in America, and ended up being much more popular in Europe.
But even then, adoption was slow.
ALGOL was first formalized in 1958.
It took a number of years for implementations to hit the scene.
One of the big problems with ALGOL was that it was too abstract.
It was designed as a language that could run on any computer.
The specification was totally separate from any implementation details.
At the time, that was a pretty new approach.
This is good if you're up in theory world, but at a certain point, you kind of have
to deal with the real world. Early ALGOL specifications were so theoretically pure
that they didn't include anything about handling inputs or outputs. You know, two of the most
important things a computer can do. That was all hardware dependent, after all, so the committee
designing ALGOL just left it out of the spec.
That rendered per-spec ALGOL basically useless. Its early implementations varied quite a bit from the standard, because when it came down to it, the standard wasn't for a real language.
It was for a theoretical language. To me, the mixed nature of Prolog, the fact that it's
mostly deductive except where
things would be impractical, that invokes this ghost of ALGOL past. The team that developed
Prolog were familiar with ALGOL. In fact, proto-Prolog was implemented using a version
of ALGOL. In my head, it feels like they might have learned something from this older language.
Maybe it taught them that to create a truly useful language,
you had to be a little bit pragmatic.
You couldn't lean totally into theory.
Alright, that brings us to the end of the month of Prolog.
And I gotta say, I'm pretty beat after all of this.
These episodes took a lot more work than usual to prepare for. Next time is going to be something a little easier, I hope. To close out,
I want to come full circle. The first time I saw Prolog code, I kinda gagged. I actually remember
telling one of my friends that I thought it looked like an esoteric language,
because I just couldn't understand it at all. That's a weird feeling for me. I can usually suss out most programming languages. I can at least get an idea of what the code is doing.
Prolog threw me for a total loop. It's not really like other languages. The syntax is different,
and the actual purpose of the code is totally
different. The reason for this is simple. Prolog is a very unique language. It didn't evolve from
C or Lisp or Fortran. Rather, it formed as a tool for deductive reasoning and natural language
processing. That tool was expanded and adapted into a full language. There was inspiration from
other languages along the way, but Prolog largely remained true to its roots. This was,
in part, due to the relative isolation that Prolog formed in. Its creators even admit
that they didn't even know about Lisp until they were well on the way to creating Prolog.
Another factor is the niche that Prolog
falls into. The language was developed to be a one-stop shop for natural language processing.
It was designed to handle deduction and string processing. That's something that hadn't really
been done in a single language before. At least, not well. As Prolog was forming, there wasn't any one language to look at as a blueprint.
It took some good ideas from a number of languages and tools, and it added that to the distinctive
base that was Prolog. We can look at this story in a number of ways, but I have my personal
takeaway. I think the history of Prolog is a fascinating example of how languages develop.
In this, we can see,
in pretty good detail, how a language becomes specialized to a niche. We can see how choices are driven by what people know, what they're working with, and who they're in contact with.
People don't just sit down and decide to develop a new language for no reason.
There are human factors behind everything.
Thanks for listening to Advent of Computing. I'll be back in two weeks' time with another
piece of computing's past. If you like the show, there are a few ways you can support it.
If you know someone else who'd be interested in the history of computing, then please take a
minute to share the show with them. You can also rate and review on Apple Podcasts. If you want
to support the show
directly, you can do that through buying Advent of Computing merch or signing up as a patron on
Patreon. Patrons get early access to episodes, polls for the direction of the show, and bonus
content. I'm working on a new bonus episode on the oldest computer in production right now,
so keep your eyes peeled for that if you're interested. You can find links
to that and everything else on my website, adventofcomputing.com. If you have any comments
or suggestions for a future episode, then go ahead and shoot me a tweet. I'm at adventofcomp
on Twitter. And as always, have a great rest of your day.