Advent of Computing - Episode 184 - What Is A Programming Language?

Starting point is 00:00:00 In 1945, Vannevar Bush publishes an article in The Atlantic called As We May Think. It's best known for its description of the memex, this theoretical machine that goes on to inspire the development of hypertext. But that's only part of the ink. The article is full of wild predictions for the near future. One that's always struck me as, well, a little funny, is the idea that human language will develop to be closer to some future machine language. In discussing the possibility of

Starting point is 00:00:34 automated speech-to-text machines, Bush wrote this. Quote, our present languages are not especially adapted to this sort of mechanization, it is true. It is strange that the inventors of universal languages have not seized upon the idea of producing one which better fitted the technique for transmitting and recording speech. Mechanization may yet force the issue, especially in the scientific fields, whereupon scientific jargon would become less intelligible to the layman. End quote.

Starting point is 00:01:07 I think you can see why I find that funny, right? What's going to happen? The kids are going to start speaking computer at you, beep, boo-bo-bo-bop-bo-bop-bub-bop style? But there is something actually prescient here. Take out the recording aspect and consider it this way. Would it be possible to develop a language that both a human and a computer could understand? If that happened, then you'd run into cases where two humans could end up conversing in that language. You could also have cases where two computers

Starting point is 00:01:42 would converse in the same tongue. Ring any bells? That, dear listener, is what I call a programming language. We usually talk about programming as using some tool to translate a human-readable language into something a computer can directly understand. But there's a flip side to that. To use a programming language, a human has to wire themselves up to understand said language. I've found myself shouting snippets of code across the room at coworkers before. Some languages, like BASIC, were even designed to be read aloud.

Starting point is 00:02:19 There's a give and take here. When we find a digital middle ground, it's bound to be influenced by its surroundings. That means some of these new languages are heavily warped by the closest computer, and others are made in the form of still older technologies. Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 184, What Is A Programming Language?

Starting point is 00:02:54 Before we get started, I have my usual plugs here, but I'll keep it short. If you like the show, you should go listen to Adjunct of Computing. It's the video after show for Advent of Computing. It's where me and my co-host Joe answer questions that you might have had about the show, fill in some of the gaps, and just discuss the topic in a less formalized atmosphere. Also, first week in August, I'm going back to VCF West at the Computer History Museum in Mountain View, California. Show up. I'd love to meet any listeners that arrive. That's the end of the plugs.

Starting point is 00:03:33 So, let's get down to the episode. I've finally escaped the 1940s, and that hasn't helped me much. I've blundered right into another super confusing topic. Today we'll be looking at APL, and yes, that is short for A Programming Language. You may know of APL as that weird programming language that uses a custom character set, or that initially APL was only really usable if you had an IBM Selectric typewriter with a special font ball installed in it. In fact, you might know of APL because keyboard nerds love to get special keycaps that have APL's symbols emblazoned on their surface. That used to be all I knew about APL,

Starting point is 00:04:22 but, dear listener, I've been hitting the books. I've been trying to understand APL itself, how to program it, and why it is the way it is. What I found is a mix of baffling, fascinating, and frustrating. Let me start off with the character thing. APL doesn't look like any other language, and I don't just mean it's, you know, a little quirky. I really, really mean it looks unique. Everything in APL is a single symbol,

Starting point is 00:04:55 and very few of those symbols are from the normal ASCII character set. An APL keyboard has one or two special symbols for every single key, and some of those symbols are composed of multiple characters smashed together. We get, for instance, the Christmas tree, as well as the upside-down Christmas tree. I'd really like to know why APL does that. If you encounter some of this code in the wild, it's impossible to read, at least to the uninitiated. It straight up does not look like software. It looks more like secret math formulas.

Starting point is 00:05:36 Or I guess a secret formula would really be something like an alchemical formula or a mystical incantation. Even once you get past the whole symbol thing, assuming you can, the mystery of APL unfolds almost like a fractal. So get this. APL is a functional programming language. You're expected to make programs by combining functions and operating on functions. Its core data type is the array, you know, a list. That sounds a little familiar, doesn't it?

Starting point is 00:06:13 Lisp, the smash hit from 1960, is also a functional programming language that's written to process lists. I mean, it's in the name. Lisp is short for list processor. One would assume that since LISP is one of the first programming languages and it's functional and list-based, that any later language that has those features must have some relationship to LISP, right?

Starting point is 00:06:44 Now, that assumption is, in fact, incorrect in this case. APL is very much its own thing. It doesn't have one ounce of Lisp in it. That on its own is interesting, right? How can two languages have the same top-line description but be totally different? That's something I really want to explore here. I think it speaks to something deep about context, doesn't it? How intent can shape decisions or something like that.

Starting point is 00:07:14 APL is also highly standards-based. In fact, it took some seven or eight years from the first drafts of the language to its first implementation. Along the way, a committee of sorts formed to work on the specification for APL. That also sounds familiar, almost like ALGOL, the classic language from around 1960 that was based around standards formed by a committee. But again, this was approached in a totally different way with a totally different outcome. And then we get to timing.

Starting point is 00:07:49 When was APL created? Well, would you believe me if I said 1957? That's when the first inklings of the language appear. If we take 57 as a birth date, then that makes APL the same age as Fortran. That age alone should make APL pretty foundational, but I'm willing to bet few of you listeners have ever used it, let alone read it. So let me dive in, and I will say that APL is truly unlike any other language I've ever encountered.

Starting point is 00:08:28 Have you ever felt like you're on the outside of something trying to look in? Like you're just seeing the edge of a whole culture that you don't quite understand? It's like hearing a joke, seeing someone laugh, but not understanding it yourself. It's almost like a miniature version of culture shock. That's how I've begun to feel about APL. The language is like a closed circle that I'm outside of. In trying to learn APL and its history, I've constantly run into the edge of jokes,

Starting point is 00:09:05 or found myself just catching the tail end of some conversation that's been going on for years. In retrospect, I probably started from the wrong place. ACM's History of Programming Languages, Volume 1, has a session on the history of APL, written and delivered by some of the language's creators. History of Programming Languages, when available, is usually my starting point for programming language history. It's good history, and it's in the creators' own words. The issue, and also the benefit, is that last part. You can tell a lot about someone by how they write,

Starting point is 00:09:44 what they choose to say and how they say it. You can also catch a lot of information by how information is delivered. The APL session is an odd one for these proceedings, and that speaks volumes. This session doesn't spend a lot of time on the early history of APL. Rather, it concerns itself more with how APL evolved once it was implemented. It pays special attention to how the language was simplified and generalized, or, to put it another way, how the language became more terse and more sophisticated. It also makes aggressive, nearly violent use of citations.

Starting point is 00:10:26 Instead of repeating or rephrasing accounts of APL's development, or actually talking about APL's development and then citing a source, it just says you can learn more in citation 12. As a result, the session is just a very, very well-annotated bibliography. It's the tail end of a conversation that's been had in other places. That, on its own, should give you a taste of what we're in for. The history of APL is very insular, but it is very well documented. That is, if you can find the documents.

Starting point is 00:11:05 To properly tell the tale, we have to start in 1950. In that year, Ken Iverson started graduate school at Harvard. His thesis advisor would end up being one Howard Aiken, a name some of you may recognize. Aiken was the force behind the Harvard Mark I, one of the first digital computers to come online. That is important, you know, it's one of the first computers. I realize I don't talk enough about the Harvard Mark I on the podcast, and at this point, I kind of just want to start the joke that I don't like the Harvard Mark I. That's not true, but let's just run with it. Advent of Computing doesn't talk about the Harvard Mark I. Instead,

Starting point is 00:11:49 we're going to talk about something actually important that Howard Aiken did. Oh, I don't actually mean that. Aiken had a more pressing role at Harvard that I think is often overlooked. As part of Harvard's faculty, Aiken was thesis advisor for many students earning their PhDs. He would force Iverson and many others to learn to program. And in his role as advisor, he pushed Iverson to take on a project that involved programming heavily. The result is that Ken was transformed from a math nerd working on a degree to a computer programmer. The transformation was total. In 1954, Iverson graduated and Aiken was able to secure him a teaching position at Harvard. This is where the earliest appearance of APL happens.

Starting point is 00:12:45 So, what exactly was Iverson teaching? Well, it wasn't quite computer science, but it was very close. Technically speaking, the first degree in computer science was issued in this period, around 53 or so. But the name computer science wasn't really in vogue. One of the first degree programs, this one at Cambridge, used the name Numerical Analysis and Automatic Computing. At Harvard, the name was automatic data processing.

Starting point is 00:13:16 Sadly, I don't have the syllabus for this class. What we do know is the new position would be both frustrating for Iverson and serve as a source of inspiration. To quote from an unfinished autobiography by Iverson himself,

Starting point is 00:13:31 although Aiken had mapped out a broad program that included economics, business applications, switching theory, operations research, numerical analysis and computer programming, it was largely left up to us green graduate students to flesh out the courses. I was appalled to find that the mathematical notation on which I had been raised failed to fill the needs of the courses I was assigned, and I began work on extensions to notation that might serve. In particular, I adopted the matrix algebra used in my thesis work.

Starting point is 00:14:05 The systematic use of matrices and higher dimension arrays, almost learned in a course in tensor analysis, rashly taken in my third year at Queen's, and eventually the notion of operators in the sense introduced by Heaviside in his treatment of Maxwell's equations, end quote. That sounds a little crazy, right? Iverson, freshly minted as a doctor, decides the best way to teach about computers is to invent a new form of mathematical notation.

Starting point is 00:14:38 He even uses half-remembered things he learned in classes that he only kind of took. If I were his student, I'd have some very harsh words for him. This sounds kind of like a college nightmare story. But there was actually good reason to do this. Iverson isn't just going crazy. He was teaching one of the first courses in computer science. We didn't really have the tools for talking about machines with good precision. At least, we didn't have a unified mathematical way to talk about machines.

Starting point is 00:15:16 You could get by using logic predicates. Some people did, but that has its own issues, not least of which is how painful it would be to describe a computer operation using logic predicates. It would basically be description from first principles. If you're teaching biology, you probably don't start by talking about how quarks bind together to form atoms. If a compsci professor ever did try this on for size, try to explain how, say, a loop works from logic predicates, I'd also have some strong words for them. There's also something hidden in here that you may not catch.

Starting point is 00:16:02 What Iverson is actually talking about is a programming language, or at least it's a cousin to a programming language. Think about that for a second. What is a language, if not a specialized form of mathematical notation? Take Fortran as an example. That language is, at its core, a notation to describe a series of operations. It's pretty abstracted from pure mathematics, but it's filling the same role. You could even do mathematical proofs in Fortran if you were really hard-pressed. Assembly language is, all things considered, another form of mathematical notation.

Starting point is 00:16:44 It's not expressed in any traditional way, but it has a grammar and a series of rules. It's used to describe operations. You can even describe expressions and equations in assembly language. I think if you were smart enough, you could even do a proof in assembly. At least, I can see how you could prove that one program is equivalent to another given some very tight constraints, but I'm getting ahead of myself. Fortran and assembly languages developed in very practical ways. You have to have some way to tell a computer what to do, so we developed systems to make that easier. There's also this kind of artifact

Starting point is 00:17:25 that comes along for the ride. Because computers are technically machines that simulate mathematics, there are a lot of the little fancies of traditional mathematics that get shoved into programming languages and into concepts about programming. But that, in many cases, is incidental. Iverson is going about things in a very different direction. He's teaching students how these new computers work. Part of that includes a pile of mathematics. Those mathematics need to be very precise and general purpose. Crucially, traditional math is actually missing a lot of operations that a computer can do. So Iverson decides to make some mathematics of his own. He decides to work up notation for expressing things a computer can do. This is done separately from programming.

Starting point is 00:18:19 This is done as a way to teach about the capabilities and the design of a computer. Now, I said we don't have a syllabus for this course, but we have something similar. We have a textbook that Iverson wrote with his teaching assistant that's based off the course. The book comes later, though, so it's not quite a primary source. Again, the syllabus would be nice, but this is good enough. The book is called Automatic Data Processing, and the teaching assistant is one Fred Brooks. Yes, the same one that later authors The Mythical Man-Month. The notations used in Automatic Data Processing are, well, mixed.

Starting point is 00:19:06 The book uses very standard mathematical notation in some parts, logical notation in others, and then in the back half it uses something new. One interesting note is that the book deals pretty heavily in linear algebra. You know, your favorite type of algebra, right? You must be familiar. Or maybe not. You might be a Luddite. I think most people aren't actually used to linear algebra,

Starting point is 00:19:33 which I'd say is a shame. Computers are basically machines that do linear algebra very quickly. In linear, you deal with data structures. For us, that's going to be the most relatable definition, I think. You have normal numbers, which are called scalars. You have lists of numbers, which are called vectors. Then you have lists of vectors, which are called matrices. You can basically think of a matrix as a grid full of numbers. Linear algebra is a set of rules and operations for manipulating those three kinds of data structures.

Starting point is 00:20:13 These rules can be simple, like addition. If you add two vectors, then you just line them up and add each element across. If you add, say, two five-element vectors, then you end up with a new five-element vector where each element is the sum of the corresponding elements in the other two vectors. That just follows normal convention from lesser mathematics. And by lesser, I don't mean less cool. I mean less complicated. There are rules that are also more specific to linear. Things like the dot product, aka the inner product, are only defined for vectors. That one is the sum of the products of the corresponding elements of two vectors.
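To make the vector rules just described concrete, here is a minimal sketch in plain Python. The function names and sample values are assumptions chosen for illustration. Nothing here comes from Iverson's notation or from the textbook.

```python
# Illustrative sketch only: names and sample data are hypothetical.

scalar = 3                      # a normal number
vector = [1, 2, 3, 4, 5]        # a list of numbers
matrix = [[1, 2], [3, 4]]       # a list of vectors, i.e. a grid of numbers


def vector_add(a, b):
    """Line two vectors up and add each pair of elements across."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def dot_product(a, b):
    """Sum of the products of corresponding elements of two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


u = [1, 2, 3, 4, 5]
v = [10, 20, 30, 40, 50]
print(vector_add(u, v))   # [11, 22, 33, 44, 55]
print(dot_product(u, v))  # 550
```

Both operations boil down to a single loop over paired elements, which is the sense in which this kind of math maps so directly onto a computer.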

Starting point is 00:20:56 It straight up has no use in normal mathematics, but is used very often in linear because it's a way for you to operate on these larger data structures. Computing tends to be pretty closely related to linear algebra for a number of reasons. One is just the data aspect. Computers are really good at handling lists of numbers. In fact, that's kind of just what memory is.

Starting point is 00:21:24 Linear algebra describes how to work with lists of numbers. In general, it's handy to have some kind of math or theory to back up what you're doing. Hence, the importance of linear. If you're in college right now, go take a course. It's worth it. If you know how to program, it's even pretty easy. Just take this advice. Linear algebra is basically all loops. Iverson's new notation actually shows up when he starts talking about a specific computer.

Starting point is 00:21:55 The book uses the IBM 650 as its victim of choice, since, you know, specifics do help when teaching, it turns out. Iverson introduces this notation to describe programs and algorithms that could run on the 650. Now, this doesn't exactly look like a programming language. It looks like something between math notation and pseudocode. These programs are presented in a box, almost like an inset figure. The box has numbered lines of code. That's the earliest example of proto-APL, and, well, it's rough.

Starting point is 00:22:36 So what exactly does it look like? Well, first of all, it doesn't use equals, like at all. For assignment, it uses an arrow. That's not too crazy. There are other languages that do a similar thing, and there was definitely early programming notation that used arrows. What's odd is the direction. An assignment arrow will point to the left,

Starting point is 00:23:01 so you'd write a line like X, arrow pointing to X, 1. This points to one of the more crucial idiosyncrasies that shows up in APL. The language is read right to left, not left to right. It's not super clear in this early notation, but the arrow implies it. At least, to me, it implies the reading direction. It's not x equals one. It's more like assign the value one to the variable x, or, I guess to be entirely correct, one functional assignment to x. What about comparison? Well, Iverson doesn't use equals, that's for sure.

Starting point is 00:23:48 In this early form, we get a comparison operator, the colon. It's used to compare two values and then branch based off the result of that comparison. The branching, well, that's where things get pretty rough. To notate a branch, Iverson uses a line pointing from the comparison to the target. The line then has the actual comparator written next to it. So you get things like a line that says equals right on the side. It ends up looking a bit like a spider web drawn across a textbook. Now this isn't horrible, just like assignment arrows.

Starting point is 00:24:28 This is something that shows up in other places. It's just very simplistic. Call it, well, you can call it unrefined, if you wish. It will get the point across, but not in a very subtle way. It also leads to a practical issue. This early version of APL could not be used on a computer. The conditional arrows are one issue. A computer can't read that, or put more practically,

Starting point is 00:24:55 you can't encode that information on a punch card. It's not in the character set that a computer would accept or recognize. Automatic Data Processing's notation also uses subscripts and superscripts. You know, the little numbers that appear above normal text or below it. That's fine on paper because you can draw anything you want. But it can't be encoded on a punch card, at least not easily. The character sets of the 50s just didn't allow for that. You'd have to have some special way of telling the computer the next character is super or subscript.

Starting point is 00:25:30 At that point, you might as well just not use those kinds of scripts at all. That brings us back to assignment arrows. At the time the notation was developed, the mid to late 50s, character sets were very restrictive. Most machines would just support numbers and uppercase letters, plus maybe a few punctuation characters. These limitations end up shaping how many early languages look. What these character sets did not include, however, was a left-facing arrow. That just wasn't a thing. These factors combined mean that Iverson's early notation couldn't be used on a computer.

Starting point is 00:26:12 But that was fine because Iverson didn't want to. He wanted the notation for teaching and communications between people. He didn't view it as a programming language. It was a notation. The fact that it couldn't be encoded for a computer didn't matter at all. That 1954 class would be the first time the notation was ever used, and it would only be seen publicly once Automatic Data Processing was published in the early 60s. But between those two events, Iverson's code was used for one other thing. In 1960, Iverson took

Starting point is 00:26:50 a sabbatical from teaching to do some real-world work. This would lead to an ill-fated project and the first real-world use of proto-APL. Iverson's job, really just a summer gig, was at McKinsey and Company. He would be dropped directly in the middle of a project. A representative of Hawaiian Sugar had been trying to get a programmer to write software for them. The software was supposed to run on an IBM 650. The issue, however, was one of negotiation. The sugar man knew what he wanted out of the program and how the business worked. The programmer knew how to make a 650 sing, but they didn't speak the same language. Metaphorically, I mean, they all spoke English. Anyway, Iverson stepped in as an interpreter of sorts. He quickly realized that he had an

Starting point is 00:27:44 easy-to-use solution, proto-APL. Over the course of a few months, he taught both the suit and the programmer how to use this new mathematical notation. Then the two were able to solve their problems together. Business acumen was reduced to formalized mathematics, and that math could be converted into a 650 program. There was still no compiler or interpreter or any kind of automation. This is all done by hand. The notation is just a means for communication, almost like a more precise form of a flowchart. But it was enough. To quote Iverson, in a few months, the 650 programs were nearly complete and we eagerly anticipated their use, whereupon the project was abruptly terminated, end quote. Disappointing in the near term, I guess, but encouraging in the

Starting point is 00:28:42 abstract. This was proof positive that Iverson was onto something. The idea of mathematical program notation showed promise, but where would it go from here? This is the part where I really need a spooky sound drop. Ever onward, ever onward, every episode has to include IBM. In 1956, Fred Brooks left Harvard and put on his blue suit. This was just after he helped Iverson teach that class on automatic data processing. It was only a matter of time before Iverson would also leave Harvard. In 1960, he'd moved over to IBM Research, the same part of the org that Brooks had joined. This move worked out well for him.

Starting point is 00:29:31 During his last years at Harvard, he had been working on a book about his new notation. While at IBM, he had time to finish and publish that text. That book was simply titled A Programming Language. It's where APL gets its name from, but the language in the book isn't APL, not quite. This is also how we end up with Automatic Data Processing the book, not the class. Brooks and Iverson collaborated on that while at IBM, so in a twist, the move from academia to corporate, well, it gave us better sourcing. These two books are actually tied at the hip. From A Programming Language, quote,

Starting point is 00:30:15 The unusually large contribution by Dr. Brooks arose as follows. Several chapters of the present work were originally prepared for inclusion in a joint work which eventually passed the bounds of a single book and evolved into our joint automatic data processing and the present volume. Before the split, several drafts of these chapters had received careful review at the hands of Dr. Brooks, reviews which contributed many valuable ideas on organization, presentation, and direction of investigation, as well as numerous specific suggestions, end quote. This really makes tracking lineage odd.

Starting point is 00:30:55 I also think this is why there's confusion over what year should be the birthday for APL. The first hints of APL we really get are from the APL book and the ADP book, but those are created with input from earlier college courses, work at McKinsey, and some work at IBM. So we don't have a perfect lens into the earliest days of proto-APL. Everything's been kind of polished through this collaborative editing process. This is also where we see the closed circle start to form. At IBM Research, Iverson is able to surround himself with co-conspirators. Brooks is in on the conspiracy since day one.

Starting point is 00:31:41 Adin Falkoff is another early collaborator. Many later papers on APL are co-authored by Falkoff. In fact, he would give the language its name. Falkoff suggested using the title of the first book as the name for the language. Thus, the notation used in A Programming Language became entangled with the later APL. This early IBM era is crucial because it's where APL transforms from a notation into something much closer to a language.

Starting point is 00:32:17 If you're looking for the roots of what makes APL so strange, then this is the place. I also want to point out just how odd this route to a language is. We're in the very early 60s, so let's take a comparison point, shall we? Let's say LISP is close enough for first approximation. That language starts out as a notation all the way back in 1958. It's not implemented on a computer until 1960. In that two-year gap, the notation shifts around and is adjusted. But, and this is the crucial thing, when LISP was first drafted, it was intended to be a programming language. The drafts even come with internal data representation ideas.

Starting point is 00:33:05 It was always supposed to run on a computer, at least one day. APL, on the other hand, didn't start with that intention. It was meant as a communications tool and a teaching tool. There is discussion of implementing APL as a programming language on a computer as early as 1962, but the real work doesn't get started until 64 or 65. So how did we get a true blue language? And how did the language transform during this process? This is the point where I've read enough of the early sources

Starting point is 00:33:43 that I can finally start using other sources. I've done the background reading so I can actually untangle citations now. In 1974, Iverson and Falkoff publish a paper called The Evolution of APL. This gives us a good outline of how the language changed. They chose to call the era we've just discussed the period of academic use. That era of APL history ends when the forming cabal at IBM Research does something odd. They start using APL notation to describe computers. This is something I did not really expect to run into, but it does make a lot of sense.

Starting point is 00:34:27 Up to this point, APL has been described as a notation to formally describe mathematics. Specifically, the kind of mathematics done on computers. Linear, discrete, that sort of thing. It's being used mainly to describe programs and algorithms, but I ask you, what is a computer but a box full of algorithms that are run on demand? There is a certain pureness to this new project, too, as explained in Evolution, quote, Falkoff agreed to undertake a formal description of the machine language, largely as a vehicle for demonstrating how parallel processes could be rigorously represented. End quote.

Starting point is 00:35:10 APL had been used to describe some earlier IBM machines. In fact, Iverson himself had used it when discussing the workings of the 650. But Falkoff's project was more ambitious. His target was the new System 360 family of computers. There's something hidden in that choice that made the idea of a formal description very exciting. System 360 was a turning point for IBM and for hardware at large. IBM would spearhead a number of new ideas with this line of computers, the central theme being compatibility.

Starting point is 00:35:47 There wasn't one System 360 computer that came in different sizes. It was a true family of machines. Each machine was compatible with every other at the instruction set level. A program for one model of 360 would run about the same on any other model of 360, the only difference, in theory, being speed. The underlying hardware, however, could vary wildly, and in fact it did. Different models used totally different internal designs, totally different technologies,

Starting point is 00:36:20 but they all exposed the same instruction set architecture. This made that instruction set supremely important. For compatibility to work, there had to be a rigorous definition of that instruction set. APL was uniquely positioned just for this task. It was designed for describing algorithms in formal mathematical detail. This also leads to a funny issue. You see, APL itself doesn't yet have a stable or well-known specification. There's the notation used in A Programming Language, and that's about it. That's not a standard. And since the publication of the APL book, the rough specification, if we can even call it that, had already changed quite a bit. So Falkoff's formal description of the System 360 starts with a

Starting point is 00:37:14 primer on APL, as it stood in 1964. That's pretty funny to see. The The first few pages of the paper are basically a quick reference guide, and the rest is the actual formal description. This strikes me as funny because even the evolution of APL says to follow the changes in the language, you have to look at this primer that's stapled to the start of this one other paper. I think it's clear to see why this is needed, right? APL only barely exists at this point. It's a notation, but there's no.

Starting point is 00:37:53 formal spec or definition, no implementation. You can't actually run an APL program in 64. That said, it's become pretty well defined as a language despite the fuzzy situation. This primer is, I think, the most clear explanation of APL we get up to this point. The APL book is far more complete, but its length makes it, well, let's say it makes it pretty ponderous. Also, I tend to be more of an applications guy than a theory guy, so what can I say? A concise explanation of how to apply APL, that really helps me. Let me give you an example. Falkoff opens with this overarching description of the syntax, quote, statements are of two major types called specification and branch. A specification statement incorporates a left-pointing arrow that implies that its execution

Starting point is 00:38:54 re-specifies the value of the variable to the left of the arrow by the value of the expression to the right of the arrow, end quote. Branch lines use a right-facing arrow to specify where to go next. Now, I want to be super careful here. I just outed myself as an applications guy, and this is some pretty subtle theory. APL is called a functional programming language, but what does that actually mean? The easy way to think about this is to go back to mathematics. So, a function, strictly speaking, is a rule that applies to some arguments and results in some output.

Starting point is 00:39:38 This is one of those things that sounds very general, but is actually very specific. A true function will always return some value. It also won't go and do something else. In other words, it won't have side effects. Saying F of X doesn't mean turn off the lights. It means give me a number back. This is in contrast to imperative programming languages. These are the more familiar tongues.

Starting point is 00:40:05 Fortran, C, even assembly language. In these languages, you give a list of steps to accomplish some task. Those steps can do anything. Those steps, crucially, do not have to return values. They can also have side effects. They're just steps for the computer to carry out. To use fancier terms, they mutate state. You're setting up a certain state in the computer, a certain set of data, and then telling

Starting point is 00:40:35 it to make some changes to that state. Functional languages just work differently. You write code by chaining the output of one function to the input of another function. You build up something closer to a mathematical formula than a list of steps. I guess to really simplify it, it's like the difference between the instructions for baking a cake and the chemical formula for a cake. But there are some oddities when you try to be purely functional on a computer. Computers are, at their heart, imperative.
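
To make that contrast concrete, here's a minimal Python sketch. It isn't APL, and the function names are made up for illustration. The first version is a list of steps that mutates a running total. The second chains the output of one function into the input of another, closer to a formula.

```python
from functools import reduce


def sum_of_squares_imperative(values):
    """A list of steps: set up some state, then keep changing it."""
    total = 0
    for v in values:
        total = total + v * v  # mutate state on every step
    return total


def sum_of_squares_functional(values):
    """A chain of functions: square each value, then fold with addition."""
    squares = map(lambda v: v * v, values)
    return reduce(lambda a, b: a + b, squares, 0)
```

Both give the same answer for the same input. The difference is only in how the computation is written down.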

Starting point is 00:41:10 Machine code is just a list of instructions to carry out. This is where we get back to our primer. Consider specification, or, to use the more common term, assignment. X equals 1, or X left arrow 1, that mutates state, and it's pretty necessary when you're dealing with a computer. In an imperative language, assignment is special. It's a type of statement that mutates the computer's state. It sets some region of memory to some value, or juggles some pointers or references.

Starting point is 00:41:46 APL treats this differently. Specification is a function. It does have the side effect of, well, setting the value of a location in memory, but it also returns a value. That means it's an honest-to-goodness function. This also means you can chain specification and other functions. This, I think, points us towards a few of the weirder points of APL. It's perfectly valid to say, using equals here for clarity, X equals Y equals 1. Both variables will be set to 1, or 1 plus x equals 1. That will return the value 2.
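
Here's a rough Python sketch of that idea. The `specify` helper and the dictionary standing in for memory are my own inventions, not anything from the APL papers. The point is just that specification has a side effect and also hands a value back, so it can sit inside a larger expression.

```python
def specify(env, name, value):
    """Set a variable (the side effect) and return the value that was set."""
    env[name] = value
    return value


env = {}

# "X equals Y equals 1": the inner specification returns 1,
# which then feeds the outer specification.
specify(env, "x", specify(env, "y", 1))

# "1 plus X equals 1": specification returns 1, so the sum is 2.
result = 1 + specify(env, "x", 1)
```

Python's own `:=` operator behaves in a similar spirit, since `1 + (x := 1)` is a legal expression there too.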

Starting point is 00:42:29 That kind of statement makes no sense mathematically. But because everything in APL is a function, that's perfectly valid code. You got to remember that APL is its own notation that sits outside of traditional mathematics. Branching is where this becomes most apparent. I also think this is where early APL is at its least functional. Math doesn't have a real concept of do this equation, then do this other equation. Line numbers don't appear in trad math.

Starting point is 00:43:05 And I think I'm just going to start calling it that. Trad math. That implies a pile of funny things that aren't true, so I like it. APL has a concept of a branch. Now, there is an argument to be made that you can have a Turing-complete language without explicit branching. You can create a purely functional language. But in practice, you're kind of required to have a conditional branch. That means you have to have a way to identify parts of your program, which means labels or line numbers. That, I think, is actually one of the more familiar things in APL. So let's put this together.

Starting point is 00:43:46 APL is composed of specifications and branches. That means that each line of code will either set a variable or jump somewhere. That sounds pretty imperative until you consider the functionality here. Specification and branching are just two endpoints for chains of functions. So a line will always say run this pile of functions, execute this abstract arcane formula, then either throw the result into a variable or jump to another part of the program. There's another trick that becomes, I think, a lot more clear in this version of APL. That's how it hides functions in plain sight.
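
The specification-or-branch structure described above can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Falkoff's notation: lines are numbered from 1, each line is either a `"set"` or a `"branch"`, a branch that yields `None` falls through to the next line, and a branch to a line number outside the program (0 here) ends the run.

```python
def run(program, env):
    """Run numbered lines; each one is a specification or a branch."""
    line = 1
    while 1 <= line <= len(program):
        kind, *rest = program[line - 1]
        if kind == "set":
            name, expr = rest
            env[name] = expr(env)      # evaluate the formula, store the result
            line += 1
        else:                          # "branch"
            (expr,) = rest
            target = expr(env)         # evaluate the formula, jump to the result
            line = line + 1 if target is None else target
    return env


# Sum the integers from 1 to n.
program = [
    ("set", "total", lambda e: 0),                          # 1: total <- 0
    ("set", "i", lambda e: 1),                              # 2: i <- 1
    ("branch", lambda e: 0 if e["i"] > e["n"] else None),   # 3: leave when i > n
    ("set", "total", lambda e: e["total"] + e["i"]),        # 4: total <- total + i
    ("set", "i", lambda e: e["i"] + 1),                     # 5: i <- i + 1
    ("branch", lambda e: 3),                                # 6: -> 3
]

result = run(program, {"n": 4})
```

Every line is "evaluate an expression, then either store it or jump to it", which is the shape the primer describes.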

Starting point is 00:44:32 APL functions don't look like other functions. In most languages, a function is a name followed by a list of arguments, usually inside parentheses. That also happens to be used in Trad Math. Think of something like F of X or even print Hello World. If you see that in print, it will have parentheses around its arguments. That syntax is pretty wasteful. You have to name the function and then use some special symbols to say, all right, time to call it. Not very concise. And as we know, APL's main design goal is to be concise.

Starting point is 00:45:13 I'm still getting to the point of figuring out why, but it is emphatically repeated as a design goal. A good language should be concise. There's an ancillary issue here. A language like Fortran supports two different things that do work. Functions are just one of them. The other are operators. Things like plus and minus are operators. They work differently than functions. But they're still things that do work on data.

Starting point is 00:45:45 This split operation design is carried over from math. Trad Math has operators and functions that exist as two different classes of entities. That can get pretty confusing. It's a special case. And in general, special cases lead to a whole host of issues. They lead to whole classes of bugs. Wouldn't it be nice if there was only one type of thing that worked on values and returned values?

Starting point is 00:46:14 The trick that APL pulls is making everything a function. Well, everything that's not a literal value or a variable. There are no operators in the traditional sense. Even functions, well, functions in APL aren't very traditional. So let me let the cat out of the bag. In APL, a function is represented by a single character. Functions either have one or two arguments. In APL terms, these are called monadic or dyadic functions, respectively.

Starting point is 00:46:49 Note how that's another layer of insulation around the language, that there are these terms that a normal programmer would just never recognize without a lexicon or a dictionary or something. For a monadic function, a function with a single argument, you write the symbol followed by its argument. An example of this would be the minus sign. If you write minus five, then that will return the value negative five. For dyadic functions, ones with two arguments, you use infix notation. That is, you write it like a math operator. Think of something like 1 plus 1. In APL, that's calling the function plus with the arguments 1 and 1.

Starting point is 00:47:36 Now, almost all APL functions can be used both monadically and dyadically. Again, note how specific and special this language is. As an example for this dual use, take the minus function. You can use that monadically to return the negative of a value, or you can use it dyadically to subtract two values. Functions also tend to operate on whatever you give them. An APL function will be defined for scalars, single values, or arrays. So a single function can do a lot of different things. That's pretty economical design.
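
A minimal Python sketch of that two-for-one idea, assuming plain lists stand in for APL arrays. The `minus` name and the helper functions are hypothetical. One argument means negate, two arguments mean subtract, and either form accepts scalars or lists.

```python
def _each(fn, value):
    """Apply a one-argument function to a scalar or to every list element."""
    if isinstance(value, list):
        return [fn(v) for v in value]
    return fn(value)


def _pairwise(fn, left, right):
    """Apply a two-argument function to scalars, lists, or a mix of both."""
    left_is_list = isinstance(left, list)
    right_is_list = isinstance(right, list)
    if left_is_list and right_is_list:
        if len(left) != len(right):
            raise ValueError("length mismatch")
        return [fn(a, b) for a, b in zip(left, right)]
    if left_is_list:
        return [fn(a, right) for a in left]
    if right_is_list:
        return [fn(left, b) for b in right]
    return fn(left, right)


def minus(*args):
    """Monadic with one argument (negate), dyadic with two (subtract)."""
    if len(args) == 1:
        return _each(lambda r: -r, args[0])
    if len(args) == 2:
        return _pairwise(lambda a, b: a - b, args[0], args[1])
    raise TypeError("minus takes one or two arguments")
```

So `minus(5)` negates, `minus(7, 2)` subtracts, and `minus([5, 6, 7], 1)` subtracts 1 from every element.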

Starting point is 00:48:22 APL basically does a two-for-one here. You get two functions out of one symbol, plus all the different possible input combinations. Also, note that there's an assumption here. Each function is a single symbol. That means you have to be economical because there are only so many symbols you can think of. At this point, 1964, APL still is a paper-only thing.

Starting point is 00:48:48 But that changes very soon. For completeness, I also have to come back to the point I made about operators. APL does technically have something called operators, but they're actually functions that take functions as arguments and return functions. I think there's technically a specific difference between an operator and a true function, but for our purposes, these are just function functions. They don't bear any resemblance to operators used in trad math or a more conventional language like Fortran.
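
Here's what a "function function" can look like in Python, as a rough sketch. The names `reduce_op` and `each_op` are mine, and this folds left to right for simplicity, which makes no difference for addition or multiplication.

```python
import operator
from functools import reduce as _fold


def reduce_op(fn):
    """Operator: take a two-argument function, return a function that folds a list with it."""
    def derived(values):
        return _fold(fn, values)
    return derived


def each_op(fn):
    """Operator: take a function, return a function that applies it to every element."""
    def derived(values):
        return [fn(v) for v in values]
    return derived


plus_reduce = reduce_op(operator.add)               # sum a list
times_reduce = reduce_op(operator.mul)              # multiply a list together
sum_each_row = each_op(reduce_op(operator.add))     # operators placed next to each other
```

Because each operator returns an ordinary function, the result can be handed straight to another operator, which is what `sum_each_row` does.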

Starting point is 00:49:27 Did you know that IBM had an architect on its staff back in the day? I have to be clear here, I don't mean a computer architect. I mean a building architect, you know, classico style. In 1956, Eliot Noyes, a prominent architect of the day, was brought on to create a corporate style guide for IBM. Noyes ended up working with a number of designers and artists to create this style guide. It solidified IBM's visual language for decades to come. A result of this collaboration hit the scene in 1961, the Selectric Typewriter.

Starting point is 00:50:06 Noyes handled the industrial design in line with his larger IBM brand guide. The Selectric is iconic, both for its super period 60s design, and as a slick piece of new technology. Most early typewriters used a single arm for striking each character. The Selectric, well, it used a ball. The actual typing element was a golf ball-sized device studded in characters, all cast in metal. When the Selectric went to pound out a character, it first spun and angled the ball to select the correct spot, and then it hammered down. Most importantly, the ball could be removed and replaced.

Starting point is 00:50:51 That meant that the Selectric allowed users to select different fonts, truly wild tech for the period, and technology that could be exploited by a very tight circle of nerds within IBM. From The Evolution of APL, quote, in 1964, a number of factors conspired to turn our attention seriously to the problem of implementation. One was the fact that the language was now sufficiently well-defined to give us some confidence in its suitability for implementation. The second was the interest of Mr. John L. Lawrence, who, after managing the publication of our description of System 360,

Starting point is 00:51:33 asked for our consultation in utilizing the language as a tool with his new responsibilities for developing the use of computers in education. End quote. Some very noble reasons. The language was ready and someone wanted to use it to teach again. There's a third motivation here. Another programmer inside IBM had attempted to implement APL, but it didn't go very well. Now, that may be a little too harsh. The story is a little more complicated than just that. Herbert Hellerman had attempted to implement APL at some point prior to 64, but didn't finish the project. In July of 64, he published a paper called Experimental Personalized Array Translator System, which describes a language inspired by APL,

Starting point is 00:52:25 but distinct from it. The paper even specifically says the language is a subset of early APL notation. I bring this up because Hellerman's language, often abbreviated to PAT, gives us a preview of things to come. For one, PAT isn't just a language. Hellerman describes it as something more encompassing.

Starting point is 00:52:49 Quote, The basic function of a personalized data processing system is to provide the user with a convenient, quickly accessible, private pseudocomputer, end quote. That's antiquated language, to be sure, but this is a concept that we should all understand pretty easily. It also sticks around in the APL circle. Once implemented, APL runs in this interpretive mode.

Starting point is 00:53:17 A user can run it on a time-sharing system, and each user gets their own private APL session. Those sessions are called Workspaces. Basic, also developed in 1964, follows the same scheme. JOSS, a language eerily similar to Basic, also follows this scheme. Three makes a pattern, and best of all, these three languages are developed independently. Nothing deeper here, just a note that this idea of a quote-unquote private pseudo-computer must have really been in the air in the early 60s.

Starting point is 00:53:55 This is also, oddly enough, where we get the most clear explanation of why a language has to be terse. From Hellerman, quote, conciseness is particularly necessary to reduce the volume and complexity of typing or keypunching required of the user. End quote. To be clear, this rationale is specific to implementation. APL still isn't implemented at this point in 64, so got to be careful here. But I think we can back something out. Hellerman was inside the APL circle. He was privy to what other APL nerds were forming consensus about in this period.

Starting point is 00:54:39 And we do know that conciseness was a core design goal of APL since the very beginning. There's this much later paper called The Design of APL that spells it out like this. Quote, the actual operative principles guiding the design of any complex system must be few and broad. In the present instance, we believe these principles to be simplicity and practicality. Simplicity enters in four guises, uniformity, generality, familiarity, and brevity. End quote. It has an ellipsis; brevity is also described as economy of expression. APL was meant to be terse, and PAT was meant to be terse. PAT was this early brush with implementation. Hellerman specifically says that he wanted to make it

Starting point is 00:55:33 easier to punch his language into a computer, hence the value of brevity. I think that APL is following a very similar approach. It's terse because that, in theory, makes it easy to understand and deal with. At least, if you're really, really into mathematics, it's easier to draw out a del than to write gradient of value. But Hellerman ran into a bit of an issue here. PAT, you see, wasn't very terse. It didn't use single character functions. Rather, it rendered functions in Latin characters and a few standard special characters. That actually makes the whole design fall apart, and, well, it's not something I ever expected. Let's take the equal sign as a core example. PAT uses no custom symbols. Assignment takes the form of the usual equals sign.

Starting point is 00:56:37 But what about conditionals? How do you test if two values are equal? PAT has a function for that. All functions are an at sign followed by their name. Those names have to be in Latin characters or numbers. And crucially, PAT allows you to abbreviate a function name by using its first character. The function for testing equivalence is at 0-01.

Starting point is 00:57:07 That's because E was taken. E was used for the exponent function. It turns out there aren't enough unique letters to describe a complex language, at least not if you allow for single-letter representation of a function. That doesn't work. Honestly, the Hellerman paper has made me come around to APL's way of thinking. at least somewhat. So then, how did APL's first implementation get around the issues that PAT faced?
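
The letter shortage is easy to see with a small Python sketch. The function names below are hypothetical, picked only to show the collision. They are not taken from the PAT paper.

```python
def first_letter_abbreviations(names):
    """Give each name its first letter, unless an earlier name already took it."""
    taken = {}
    collisions = []
    for name in names:
        letter = name[0]
        if letter in taken:
            # This name cannot use its own first letter as an abbreviation.
            collisions.append((name, taken[letter]))
        else:
            taken[letter] = name
    return taken, collisions


# Hypothetical function names; only the first-letter clashes matter here.
names = ["EXPONENT", "EQUAL", "MAX", "MIN"]
taken, collisions = first_letter_abbreviations(names)
```

With only 26 letters to hand out, any reasonably sized set of functions will end up with clashes like these.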

Starting point is 00:57:39 Well, it all comes down to the Selectric. In 1963, IBM released the 1050 terminal. That was essentially a terminal built from a Selectric, typeball included. Moving forward, IBM had the capability to punch custom characters out of a computer. That meant it was possible to use special characters if you took time to define the character set. To quote Falkoff, the design of this character set exercised a surprising degree of influence on the development of the language, end quote. Just because they could define a type ball didn't mean they could do anything they wanted.

Starting point is 00:58:21 It actually imposed an odd restriction. A Selectric ball could strike 88 unique characters. Note, I don't use the word encode here. The limit wasn't actually about character encoding, but the number of unique characters that would fit on a typeball. In theory, a System 360 or any contemporary machine could encode quite a few more characters. There just wasn't a way to interact with the machine in an extended character set.

Starting point is 00:58:55 APL had to change in order to be compatible, not with a computer, but with an IBM Selectric typeball. One immediate change was that the language became linear. At least, that's how all of these papers describe that change. It basically meant that superscripts and subscripts were out. The Selectric couldn't print those. Why does that matter? Well, in Trad Math, subs and soups are used for a few things.

Starting point is 00:59:26 Superscripts are more commonly used for exponents. Subscripts and superscripts are also used to denote indexing, as in to specify locations in a vector or matrix. APL is all about manipulating that specific kind of data, so early versions used soups and subs all over the place. Dropping that notation represented a very large material change in how APL worked. This change, however, would prove to be an improvement. By linearizing indexes, that's a mouthful, by getting rid of superscripts and subscripts,

Starting point is 01:00:06 this made indexing more general purpose. Trad notation for indexing is, oh, it's actually pretty cumbersome. I still have nightmares of working with matrices. It can be hard to specify a single number inside something like a tensor, a fourth-degree matrix. But with linear indexing, it's easy. It's now closer to addressing a cell in memory. It also meant that it was easier to plug in a function as an index. That's some general purpose power, to be sure. This is the larger trend that came with the move to the Selectric. APL became linear. The changes made to accomplish that also made APL more generalized. Operators,

Starting point is 01:00:53 are another example. Early APL expressed operators, functions of functions, as hats. And that is actually a math term. An example is you'd put a circle above a function to say you wanted to take the inner product using that function. By dropping the circle down, the notation became more clear and more flexible. You could now clearly show you were operating on functions. You can also use that operator in more situations. You could chain operators now. If you were doing that with hats, that wouldn't work. You can't put a hat on a hat, but you can put operators next to each other. That works. This is also where the APL character set gets truly wild. It's where we get the 100 plus characters that baffle outsiders. It turns out that even with some smart thinking,

Starting point is 01:01:50 88 symbols aren't enough. Enter overstrike. So, typewriters used to have this thing called the backspace button. It's different than what we know today because it doesn't delete the previous character. In fact, it couldn't, since the character had already been struck onto paper. Backspace just moved the typehead back over what you just struck. You could draw more characters by striking down a second time.

Starting point is 01:02:20 For example, an O overstruck with a vertical line becomes a Greek phi. A triangle overstruck with a line becomes a Christmas tree. Using this trick, APL was able to have more than 88 symbols to play with. The first working implementation of APL sprang to life on an IBM 7090 in 1965. I say first, but it was technically two versions? Kind of. One used punch cards as input and ran in the old batch mode. One was actually interactive.

Starting point is 01:02:55 It had a time-shared workspace, very similar to that described in the PAT paper. Now, I'm not going to go too far into the syntax here. That's better left as an exercise to the reader. If you want to actually learn APL, I highly suggest tryapl.org. That site has a full online system for using APL and a number of lessons on the language. Rather, I want to close with an implementation detail that I find, well, let's say, suggestive. The first implementation of APL was an interpreter, not a compiler. The Evolution of APL says this was done for a few reasons. A large factor was convenience. It tends to be easier to write an interpreter

Starting point is 01:03:42 than a compiler. It also makes abstraction easier. What do I mean by that? Well, APL was meant to be as removed from the computer as possible. We'd call that portability these days. If you write a compiler, then you have to worry a lot about machine-level nastiness. A compiler translates a language into machine code. So there's this implicit idea that your language can map into machine instructions. There are a lot of tricks to get around this. You can actually compile anything that's Turing complete.

Starting point is 01:04:18 But what I'm saying is more that there's a certain mindset that you can slip into when working with a compiler. An interpreter, for lack of a better term, can be more loose with it. And since it's easier to write an interpreter, it's also easier to change. This is a far better choice for experimentation, which was very much the state of APL in 65. The trade-off is always speed. Interpreters tend to run more slowly than a compiled program. But the APL crew didn't care about that so much. Things were still experimental.
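
To give a feel for how little an interpreter needs, here's a toy Python sketch. It is not the 1965 implementation. It assumes space-separated tokens, only numbers plus three functions, `*` standing in for APL's multiplication sign, and APL's rule that expressions are evaluated right to left with no operator precedence.

```python
import operator

# Symbols that can be used with two arguments, and with one.
DYADIC = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MONADIC = {"-": operator.neg}


def _is_function(token):
    return token in DYADIC or token in MONADIC


def evaluate(source):
    """Evaluate a space-separated expression from right to left."""
    tokens = source.split()
    if not tokens:
        raise ValueError("empty expression")
    value = float(tokens.pop())            # rightmost token must be a number
    while tokens:
        symbol = tokens.pop()
        if not _is_function(symbol):
            raise ValueError(f"expected a function, got {symbol!r}")
        if tokens and not _is_function(tokens[-1]):
            # A number sits to the left, so this is a dyadic use.
            left = float(tokens.pop())
            value = DYADIC[symbol](left, value)
        elif symbol in MONADIC:
            # Nothing, or another function, to the left: monadic use.
            value = MONADIC[symbol](value)
        else:
            raise ValueError(f"{symbol!r} needs a left argument")
    return value
```

Changing the language here means editing a dictionary or a few lines of a loop, which is the kind of looseness that suits an experimental notation.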

Starting point is 01:04:54 If they needed more speed, that could come later. It turned out that speed would never be a problem. APL implementations, to this day, are largely interpretive. And that makes sense, right? APL is rendered as this interactive environment. You type in some code, then it gets executed. This is suggestive because of Basic. In its original, most pristine form,

Starting point is 01:05:21 Basic was a compiled language, but it presented an interpretive environment very similar to APL's workspace. This was done for speed, in a sense. Back to Basic, a book on the language written by its creators, Kurtz and Kemeny, explains exactly why Basic used a compiler, to quote,

Starting point is 01:05:42 The key to a fast response is a program that quickly attempts to translate your basic program into machine language, and in the process discovers anything that's illegal, end quote. Basic was developed as a tool to induct non-programmers into the dark arts. It's educational, and its specific end goal is that you've now learned to program. Congratulations. A central tenet of this type of teaching is the idea of instant or rapid response. It turns out that it really helps learning outcomes if you have instant feedback.

Starting point is 01:06:19 You're immediately told if you're correct or incorrect. Kurtz and Kemeny worked out that the best way to get fast feedback was to try to compile your code and then throw an error if that failed. Hence, Basic was built around a compiler. APL is also in part an educational language, but it's more than just educational, and its educational goals, really, are different from those of Basic. APL comes out of graduate-level classrooms.
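
The compile-first feedback loop described for Basic can be sketched with Python's built-in `compile` function. This is only an analogy: Python compiles to bytecode rather than machine language, and the `check_then_run` helper and the `<lesson>` file name are made up for the example.

```python
def check_then_run(source):
    """Translate the whole program first; only run it if translation succeeds."""
    try:
        code = compile(source, "<lesson>", "exec")
    except SyntaxError as err:
        # Anything illegal is reported before a single statement runs.
        return False, f"line {err.lineno}: {err.msg}"
    namespace = {}
    exec(code, namespace)
    return True, namespace
```

A program with a mistake on any line is rejected up front with a line number, which is the kind of immediate right-or-wrong signal the quote is after.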

Starting point is 01:06:50 It's first used to teach computer science. And APL isn't designed by teachers. It will be used by teachers, and it will be studied in classrooms, but it's designed by researchers. All right, thus ends my dive into APL. I've left this one a little bit open. It would be almost impossible to cover all the places APL goes.

Starting point is 01:07:19 Instead, I want to form a foundation. Now, when we go forth, we can keep an eye open for APL in the wild. Hopefully, we have some more understanding and appreciation for it when it does turn up. APL is many things. It's a nearly unreadable tongue until you know its secrets. I think that makes its community all the more insular. But once you start to decode those secrets, it makes so much sense. That's kind of where I've landed on this one.

Starting point is 01:07:51 On the surface, APL is almost inscrutable. It's full of symbols that are seen nowhere else. It has its own mathematical precedence. It's formatted like nothing else. You even read it backwards. Taken together, it's baffling. But every feature has a reason for being. The Evolution of APL explains that the language was designed

Starting point is 01:08:16 by quote-unquote Quaker Consensus. Everyone working on the language had to agree for a language change to occur. As a result, each change was meaningful. It's designed by committee, but this was a small and dedicated group. APL has a certain kind of elegance to it that is different than programmatic elegance I've seen other places. It's nothing like the wild consistency of Lisp or the minimalism of Forth. It just feels different. For instance, did you know you can do actual proof

Starting point is 01:08:54 by induction in APL, as in full-on mathematical proofs, but in a Turing-complete language? You can't do that very easily in other languages. But for APL, well, that's a cinch. And I think that's the rub, right? APL is different, I think. I'm excited to see where I run into it next. Thanks for listening to Advent of Computing. As always, you can find links to everything at adventofcomputing.com. Go listen to Adjunct of Computing if you like the show. And as always, have a great rest of your day.
