Advent of Computing - Episode 89 - Forth
Episode Date: August 21, 2022

What language has two stacks? What language is used on satellites and in home computers? What language deals in words? Why, Forth, of course! Forth is a highly unique language developed in the 60s by Chuck Moore. And when I say unique, I mean unique. Forth uses reverse Polish notation for all operations, along with a dedicated data stack for passing parameters. But it's not just unique for the fun of it, Forth's design is highly deliberate. It offers a level of simplicity and power that's rarely seen in programming languages.

Selected Sources:

http://www.forth.org/POL.pdf - Moore's Programming a Problem-Oriented Language

https://www.1strecon.org/downloads/Forth_Resources/CM_ForthLanguageInteractiveComputing_1970.pdf - Early paper discussing Forth

https://archive.org/details/1985-10-dr-dobbs-journal/page/41/mode/1up - Dobb's Journal article on the NX4000
Transcript
I'm not sure why you're reading this book.
It's taken me a while to discover why I'm writing it.
Let's examine the title.
Programming a Problem-Oriented Language.
The key word is programming.
I've written many programs over the years.
I've tried to write good programs, and I've observed the manner in which I write them rather critically.
My goal has been to decrease the effort required and increase the quality produced.
That's the opening passage from a book written by Chuck Moore.
He was in a weird spot in his life when he wrote this.
Moore, a programmer, had just quit his job.
The project he had been working on up until very recently had been cancelled, so there was no sense in sticking around. He was out looking for
greener pastures, so to speak, a place to practice his own special brand of programming.
Central to all that, Moore was looking for a place to hone a new programming language.
Moore had been slowly building a new language, and I do mean slowly here.
When he wrote that passage, he was about ten years into the project.
This new language was called Forth, and it was designed to be simple. Reaching that design goal, however,
was no easy task, hence the long development cycle. Simplicity is hard to design for.
Moore had moved from one job to another, bringing primordial versions of FORTH along for the ride.
His last employer, a big carpet company,
just didn't seem to understand how important Moore's work was.
That's a very frustrating experience, but that wasn't enough to stop Moore. Welcome back to Advent of Computing.
I'm your host, Sean Haas, and this is episode 89, Forth.
Today, we'll be exploring a language that seems to be a fan favorite.
At least, I've received a handful of requests to cover Forth.
So, why the excitement?
Well, to start with, Forth is a really unique language with a really unique background. But what piqued my interest is the fact that this
language has hardware implementations. The plural here is especially interesting. There are actually a number of microprocessors that have been developed specifically for use with Forth.
This includes at least one chip designed by Chuck Moore, the creator of Forth himself.
This distinction puts Forth in something of rarefied air.
The special class here includes, off the top of my head at least, Lisp, Jovial, Ada, kinda, and Java. I'm sure there are a few more, but the point is,
we're looking at maybe a handful of other languages that have hardware implementations.
Maybe call it a dozen if you want to hedge the bet.
The case of Lisp is probably the most well-known. Back in the day, a number of Lisp machines were
purpose-built for running the language. However, those machines never really took off at any scale.
Jovial and Ada are weird cases. Both of these languages were used extensively with a
standardized military chip. That chip, the MIL-STD-1750A, was partly designed with features
to help those languages run. Ada also had something similar to a physical implementation at Intel, the IAPX 432, but
that chip kind of crashed and burned.
The only notable aspect is that the IAPX 432 had hardware support for object orientation.
Java, well, that example is a bit more of a lark.
Java already uses a virtual machine for executing compiled bytecode.
And there have been a few physical versions off that spec, but they're not huge serious
chips.
For instance, there is a commemorative ring that has a tiny Java VM embedded in it.
But Forth offers a slightly different case than all of these.
The Forth chips, best known among them being the Harris RTX 2000 series, have a very specific
set of use cases.
Within that niche, they're still in use, and we're talking really small embedded industrial control systems
here. More specifically, Forth has become something of a space-based language. That's right, Forth is
often used on space-bound computers. Rosetta, Chandra, Cassini, and even parts of the space shuttle flew with Forth chips on board.
I mean, that's just cool. That's a huge level of success, but Forth does keep delivering.
The language does, in fact, have some terrestrial use cases. Forth shows up in many other small embedded systems, but also as a component of larger
computers.
Some open firmwares, the boot-up software burned onto motherboards, use Forth for their
internal code and their user interface.
So you might be using a computer with Forth built in right now and not even know it. This beautiful resume leads to some interesting questions for me, and I just gotta reiterate,
the whole language-specific chip thing really gets me excited.
But how do you even go about designing a chip for a specific language?
I mean, processors have been influenced by programming languages in general, and vice versa.
There's always been a give-and-take kind of relationship there.
But how do you get to this higher level of full-on hardware implementation?
When does that become something you're even interested in doing?
There are still a lot of questions to answer outside the hardware world.
Forth has this kind of endurance that's rarely seen in programming languages.
Most languages die out in obscurity.
Now here I don't just mean, you know, generally all languages just disappear.
There's a bit of a pattern here, an established modus operandi if we want to use that. I've actually covered a few
Lisp-inspired languages over on Patreon that fit into this very specific pattern of decline.
For some reason, my benefactors keep wanting me to talk about failed Lisp clones, but anyway,
I digress. Many of these dead languages are either too niche to spread or just plain bad
languages that no one wants to use. If you make a worse Lisp, people are just gonna use Lisp.
The same goes for any language inspired by another. You have to actually beat what's out
there for people to want to use your language. On the other end of the
spectrum, you have superstars. You got your Fortrans, your Cs and its many derivatives,
Lisps, and monsters like PHP and Java. Those languages got huge, and despite sometimes
very great effort, they won't be going away anytime soon.
The TIOBE index tracks the popularity of programming languages, or
at least attempts to estimate the overall market share of different languages.
Out of the top 20, only four languages were created in the 21st century.
Once a language is well established, it almost has guaranteed
longevity. Part of this has to do with technical debt. There's just so much code in C that it will
never go away. The same, in a bizarre way, goes for languages like Fortran and COBOL.
Now, Forth isn't even in the top 10, or the top 50 most used languages for that matter.
TIOBE actually lists Forth in a footnote.
The rankings stop at 50, so Forth is somewhere between the 51st and 100th most used language.
As just another fun analysis, none of the dead Lisp clones I've talked about show up anywhere on any rankings. So Forth isn't still alive thanks to technical debt, at least probably not, and it hasn't died out because there's a better version out there. Forth is able to keep on seeing use in its very particular niches.
So what makes a language endure like that? What makes this third kind of path possible?
And finally, what's the deal with these embedded systems anyway? If embedded computers are just
small machines, then why not run normal software on them?
Why would one language be preferred on small systems over another?
Why has Forth fallen into this niche to begin with?
This episode, I'm attempting to answer all this and maybe more, I hope.
But very quickly before we dive into the episode proper, I have the announcement
corner. The first one is, as I keep discussing every couple episodes, and I will keep discussing,
I'm currently putting out a call for authors for a new publication called Notes on Computer History. You can find information about that over at
history.computer, my new favorite top-level domain. Now, Notes on Computer History is going to be a
journal about the history of computers. Shocker, I know. In general, I'm looking to have a community-driven
publication that covers anything under that wide
umbrella of comp history. And I don't want this to be a super academic outlet. I want it to be
somewhere where anyone can write. So, and I'm talking to you, whoever you are listening, if you
have a passing interest in computer history, if there's something you want to research, first research it, then go to history.computer,
read up on how to submit. It's very simple. You email me a draft and then get in touch. I'll hook
you up with one of my editors and we'll be able to get going towards our first issue. And the second
announcement is I was actually a guest over on another show recently. I've mentioned Eastern Border on here
occasionally. It's a great show run by one of my friends, and it covers the history of the Soviet
Union and post-Soviet politics. He interviewed me on his show to talk about embedded systems,
oddly kind of relevant to today's topic, drones, and cybersecurity slash cyber
warfare. So that episode, I think will either be out when this episode's out or in the next few
days if you're listening to this when it releases. So if that sounds interesting, then head over to
the eastern border and give it a listen. It's everywhere you can find this podcast as well.
With that out of the way, let's get into the actual episode. I played around a little bit
with breaking up the chronology here when I was working on the outline for this episode.
I really want to get right into Forth processors, but I decided to put that off. For those chips to make any sort of sense, we have to have a
grounding in Forth the language. If this was something like C, well, then I could probably
get away with a temporal twist. But here's the deal. Forth operates on different principles than
more well-known languages. It's not anything like C.
It's not anything like BASIC. It has its own way of doing things that, if you're not familiar with,
are just going to be confounding. This is also a wonderful case where we have a whole lot of
resources to work with. Chuck Moore, Forth's creator, has written a lot on the language
and his ideas on programming in general. And I really do mean a lot. We have multiple books to
draw from, plus essays and papers and some interviews. All are written at different stages
in our story, so I think it's fair to start with some words from the man himself.
The primary source in this section is Moore's contribution to the ACM's History of Programming
Languages 2 conference. And as an aside, if you haven't read ACM's History of Programming
Languages series, it is a fantastic source, and I highly recommend it. Anyway,
Moore opens his discussion of Forth with this passage, quote,
Forth evolved during the decade of the 60s across America within university, business,
and laboratory amongst established languages. During this period, I was its only programmer,
and it had no name until the end. This account is retrieved from memory,
prompted by sparse documentation and surviving listings. End quote.
Now, this, dear listener, is what I love to see.
Moore gives us fine-grained details down to the year, so I feel like Moore is a bit of a kindred spirit for me.
For one, he has a degree in physics. That instantly puts someone in my good graces.
For two, he's a space guy.
Even before graduating MIT, he was working at the Smithsonian Astrophysical Observatory. And for three, he bridged the astro and physics part of his background with some slick programming.
That's something that I can really empathize with and really get behind.
His first brush with real-world programming was at this aforementioned
observatory. This started in 1958, so we're right at the whole dawn of programming languages era.
Lisp, something that we're going to be talking about quite a bit, was just about to emerge from the depths of MIT. Fortran is only a scant handful of years old
at this point. This is also where Moore ran into the problem that all programmers of this era faced.
Computer time was a very precious commodity. Batch processing was the order of the day.
Timesharing had been theorized, but wasn't seeing any practical use.
So the life of a programmer ended up following certain temporal patterns. Moore was working
with Fortran, a compiled language. So he needed a computer for two operations, compilation and
execution. First, he would actually write out his Fortran code on punch cards.
Each 80-column card held a single line of text, encoded as a pattern of rectangular holes.
Then it was time to get on the schedule.
Moore needed to get computer time scheduled to feed in the Fortran compiler and then actually compile his new code. As he recalls,
that process took something like 30 minutes. If there was a bug in your code, then, well,
that 30 minutes was wasted. You'd never get that back. All you could do is correct your code,
hope it was right this time, and schedule some more time. After that process, Moore was
left with an executable program, probably in the form of some more punch cards. Then he'd have to
schedule more time for actually running the thing. This first program he worked on was designed to
calculate satellite orbits, so the final output would be something like a set of coordinates and
times for a satellite. This is, notably, something that could be observed. Therefore,
an error in programming could be found by making physical observations. That means that you'd have
to have another schedule compile, schedule run loop while trying to fix the error.
The same loop would need to occur if they wanted to track a different satellite or a different type of orbital geometry.
That ends up being a lot of downtime where you're just waiting.
You're just doing nothing.
Even with a fancy high-level language like Fortran, all the administrative overhead just really bogs you down.
So Moore decided to do something about it. What if the recompilations could just be skipped over
entirely? Moore's solution was to program a thing called an interpreter, or at least to take a step
in the direction of creating an interpreter.
As Moore describes it, he modified his orbital calculation program to accept commands from a set of punch cards.
Basically, his new program would change its calculations based off some extra inputs.
Moore's program was using a series approximation to run its calculations. That just means that he had to
reduce the fancy orbital equations down to a set of weighted sums. It sounds like he was using his
interpreter to modify those weighted values based off inputs, as well as certain other parameters.
Now, you could do something like this with stock standard Fortran.
The language can read in formatted punch cards as well as other data sources. So in theory,
Moore could have gone the traditional route, if we can call something this young traditional.
But that does present a problem of waste and inflexibility, as a side note.
You see, Fortran is a very inflexible language.
To read inputs, it normally requires a strict format for that data,
and it assumes that the data is structured as a table of values.
You could easily write a Fortran program that read in some cards and then use
those as part of a larger calculation. That was commonplace back in the day. The issue is that
you would have to always give a table with the same structure. And if you didn't, well,
the program wouldn't necessarily crash per se, but it wouldn't work right.
If a field had the wrong data type, then that would lead to disaster.
Moore's interpretive code was a little smarter than the norm.
It was able to read each row of input and act on it instead of just throwing the data
into some variable.
By adding a simple set of instructions,
Moore had created a more flexible program. Once the interpreter was compiled, he was free from
a whole half of the programming loop. He no longer needed to recompile to track new satellites or to
adjust for errors in his math. As he put it, he programmed himself out of a job.
I can't help but feel a twinge of Hopperian self-deprecation there. I think it's a streak
that runs through most computer programmers. Grace Hopper once said that she didn't develop
the first compilers because she was the smartest person at the keyboard, but rather because she was
the laziest person she knew. That's an ethos I can really get behind. This simple interpreter
idea would travel along with Moore as he moved from gig to gig. In 1961, he started work at the Stanford Linear Accelerator, aka SLAC.
And so too did his interpreter.
The initial code was rewritten, a number of features were added,
and the program became more complex.
This time at SLAC would lead to the next big step towards Forth proper,
the pushdown stack.
For the whole stack stuff to make sense, we need to talk about reverse
Polish notation, or RPN. A shorthand here is that RPN is basically the opposite of LISP.
Your arguments go before your operator. If that makes your eyes glazeth over, then let me give an example.
If you want to add 1 and 2, you'd normally write 1 plus 2.
Simple and direct, we're used to that.
In Lisp land, you use prefix notation, so you end up typing an expression like plus 1 2.
In this case, plus is the operator, and the following numbers are the operands, the arguments. Lisp was oriented this way, at least initially, to make it easier
for a computer to interpret. The machine sees a plus, knows it's about to do addition, and so it knows what to expect next and how to deal with the upcoming
arguments. In RPN, you'd type 1, 2, plus. The arguments come first followed by the operation.
Of course, there's also a separator here. On paper or terminals, that separator is usually a space. For some calculators, it's the enter key.
Lisp's prefix notation has a justification, as I've explained. The standard notation is,
well, it's standard. It's easy to understand arguments for. But what about RPN? Well,
that comes down to a data structure called the stack.
A stack is probably one of the most usable and abusable data structures in computer science.
In practice, it's sublimely simple.
You allocate some chunk of memory and you have some variable called a stack pointer.
That pointer is a reference to the current top of the stack, very aptly named.
At least, top is the usual parlance used. It's supposed to evoke the idea of a stack of paper or a stack of cards. This is a total side note, but in a lot of computer science resources, memory is talked about in a vertical sense.
Zero, the base of memory, starts at the bottom of a diagram.
Then the larger addresses are depicted higher and higher up on the chart.
Anyway, that's just something to note and something that I keep trying to look into and I should actually dive into one day.
Tangent done.
So you start using your stack by pushing on some data. This will copy a number to the location referenced
by the stack pointer, then the stack pointer will advance to the next free memory location on the
stack. Usually this is just the next continuous address in memory, but you can have funky implementations,
so user beware.
You can then retrieve data by popping it off the stack.
This will decrement the stack pointer, then return the value it points to.
In other words, you get the top of the stack.
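To make the mechanics concrete, here's a rough sketch of a push-down stack in C. The names and the fixed size are my own invention for illustration, not anything taken from a real Forth system.

    /* A minimal push-down stack: an array plus a stack pointer. */
    #include <stdio.h>

    #define STACK_SIZE 64

    static int stack[STACK_SIZE];
    static int sp = 0;              /* index of the next free slot, the "top" */

    /* Push: copy the value to the top, then advance the pointer. */
    static void push(int value) {
        stack[sp++] = value;
    }

    /* Pop: step the pointer back, then return the value it points to. */
    static int pop(void) {
        return stack[--sp];
    }

    int main(void) {
        push(1);
        push(2);
        printf("%d\n", pop());      /* prints 2 */
        printf("%d\n", pop());      /* prints 1 */
        return 0;
    }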
This can also be called a first-in, last-out stack because the first value
you put in is going to be the last value you retrieve. Stacks are so supremely useful to us
computer nerds that most processors natively support this one data structure. That means the
physical computer will have a small region dedicated to a stack pointer,
plus at least some commands for pushing and popping data to and from the stack.
This is one of those neat ways that hardware and software have influenced each other and worked together.
So, the stack is good, it's useful, and it's eminently available.
Here's where Moore factors in. While at Slack,
he decided to use a stack for passing arguments to functions. Now, some languages do this kind
of on the down low. C, for example, will load arguments onto the stack, then it calls a function. The function is expected to pop off those arguments.
But in C, you use special syntax that obscures the lower-level processing going on.
In Moore's new language, which we can basically call forth at this point,
a programmer interacts more directly with the stack. Let's take the expression for adding 1 and 2 as
the canonical example here. In Forth, that expression will be typed as 1, 2, plus. The interpreter takes that string from left to right. It sees a 1, that's a number, so it goes onto the data stack. It then sees a 2,
also a number, so it also gets pushed onto the stack. Then it hits a plus, which is an operation.
The code to handle addition is found in some table deep in the computer. Then it's called.
The plus operation then pops its arguments off the stack and presto.
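If you want to see that interpret loop spelled out, here's a toy sketch in C. It only knows numbers and plus, and all of the names are hypothetical, but it walks a line of text left to right exactly as just described: numbers get pushed, and an operation pops its arguments.

    /* Toy RPN interpreter: numbers go on the stack, "+" pops two values and pushes the sum. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int stack[64];
    static int sp = 0;

    static void interpret(const char *token) {
        if (strcmp(token, "+") == 0) {      /* an operation: pop arguments, push the result */
            int b = stack[--sp];
            int a = stack[--sp];
            stack[sp++] = a + b;
        } else {                            /* anything else is treated as a number */
            stack[sp++] = atoi(token);
        }
    }

    int main(void) {
        char line[] = "1 2 +";
        for (char *tok = strtok(line, " "); tok != NULL; tok = strtok(NULL, " "))
            interpret(tok);
        printf("top of stack: %d\n", stack[sp - 1]);   /* prints 3 */
        return 0;
    }

Run it on the string 1 2 + and a 3 ends up on top of the stack. A real Forth looks words up in a dictionary instead of hard-coding them, but the flow is the same.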
And that's the core of how Forth works.
For a programming language, that's remarkably simple.
The basic concept of the language can be expressed in a few minutes.
The syntax is, well, almost non-existent.
We'll get into syntax later, but in general, you just have words or data separated by spaces. This compares very favorably with, say, Fortran or C or APL.
I can continue. Forth is fascinating in this respect because its syntax is vastly reduced.
I think Lisp is the only language that comes close, but even Lisp has some weirdness that involves closing parentheses.
Other languages have a pile of different characters, the formatting for functions versus operations, how lines need to end, how lines are labeled, data types, but not with Forth. You put arguments on the stack,
you put an operation there, and then the operation uses those arguments. Done. That's the whole
theory of operation. This is, as far as I'm concerned, Forth's biggest selling point. Much like Lisp,
we're dealing here with a very simplified language. But that simplicity doesn't mean a lack
of power. Far from it. You can express anything you want in Forth. It just does so without the
added syntactic sugar of larger languages. So the stack, Moore's calling card,
is implemented at Slack. But it would take another step for Forth to gain its next two big features,
multithreading and a second stack. That's right, it's stacks all the way down today.
After SLAC, Moore started working at Mohasco.
We're looking at a time frame around 1968.
Mohasco was a carpet company.
I know, a logical career move after working with a linear accelerator.
Jokes aside, I think this pushed Moore as a programmer.
He was going from a research programmer to working
out in the field for a big business. He was now in a position of solving more real-world problems
with real-world constraints. The fact that Mohasco seemed to have good funding for the
computer department, at least for a time, seems to have helped matters.
Three big events occur here. First, Moore's little interpreter officially got a name.
It's the one we've been using so far, Forth. The name here was due to technical limitations, actually. You see, he wanted to call it fourth, as in the number.
The logic here is that the language was for the fourth generation of computers,
basically mini-computers with keyboards and live outputs.
The machine he was using at Mohasco only supported five-character file names,
so the U was dropped, giving us F-O-R-T-H.
Now, the name's cool and all, but that's only really one aspect of this new stage in Forth's development.
Once at Mohasco, Moore started working with smaller computers.
These aren't personal computers per se, these were mini-computers. Machines small enough to be operated by a single human, but not so small as to be possessed by a single human, at least not yet. The important factor here is
that this environment is much more conducive to interactive use. No more batch processing.
The digital interface was now closer to what we can recognize, a display and a keyboard. Of course, there are still punch cards and tape reels kicking around somewhere, but the primary interface had become much more
human-centric instead of computer-centric. In addition to dropping the U, Forth was also
pushed more towards interactivity. This becomes another tentpole of the language.
The second big event at Mohasco was the introduction of multitasking.
This is another feature that really sets Forth apart from any other languages.
Forth natively supports processes, as in, it can do more than one thing at once.
We're talking about parallelism here.
Now, this is a quick spot where I need to point out a pothole.
You may be tempted to call this threading, but don't.
Threading means something else in the Forth context, which we'll get to.
Now, as with everything, multitasking in Forth follows the principle of simplicity.
This is such a central tenet that Moore even opens one of his books, Programming a Problem-Oriented
Language, with this note, quote, So to offer guidance when the trade-offs become obscure,
I'm going to define the basic principle. Keep it simple. End quote. Recall that Moore,
at this point at least, is in the business world. It doesn't benefit him to write some
groundbreaking multitasking code if it doesn't get the job done. So he went with the simplest
solution. Cooperative round-robin multitasking. In general, computer multitasking works by
switching between tasks, giving each separate task a small slice of time to run. Many trees,
perhaps entire forests, have been killed in service of discussing how to switch between tasks.
Should you have some higher entity that decides when a task should stop? Should each task
get a set amount of time to run? What about priority? Should some tasks run more often than
others? Each consideration and each solution comes with its own set of ancillary problems to solve.
Cooperative multitasking, as implemented by Moore, is dead simple.
You have a table of tasks to run.
You start at the top of the list.
That task runs until it says it's ready to yield the computer to other tasks.
Then the next task on the list starts executing.
This continues until you get back to the top of the list.
If a task completes, then it's removed from the list.
Done. That's the full description.
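As a minimal illustration, and only that, here's what a cooperative round-robin loop can look like in C. Each task here is just a function that does one slice of work and then returns, which stands in for yielding; the task table and every name in it are hypothetical.

    /* Cooperative round-robin: walk a table of tasks, each runs until it yields (returns). */
    #include <stdio.h>

    /* A task returns 1 if it still has work to do, 0 once it is finished. */
    typedef int (*task_fn)(void);

    static int count_a = 0;
    static int task_a(void) {                 /* does one slice of work, then yields */
        printf("task A, step %d\n", count_a);
        return ++count_a < 3;
    }

    static int count_b = 0;
    static int task_b(void) {
        printf("task B, step %d\n", count_b);
        return ++count_b < 2;
    }

    int main(void) {
        task_fn tasks[] = { task_a, task_b };
        int alive[]     = { 1, 1 };
        int remaining   = 2;

        while (remaining > 0) {                 /* keep cycling through the table */
            for (int i = 0; i < 2; i++) {
                if (alive[i] && !tasks[i]()) {  /* a finished task drops off the list */
                    alive[i] = 0;
                    remaining--;
                }
            }
        }
        return 0;
    }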
Contrast this with preemptive multitasking, the premier scheme for abusing silicon.
This involves a separate chunk of code that fires off at even intervals of time. Every time this
code runs, it checks the task table and decides which task to
stop and which task to resume. This takes more code, but since it's all controlled by a central
process, you can do things like priority levels and more complex record keeping. But it's also
more complicated. Tucked away in there is the assumption that you have a computer
that supports interrupt routines and has a programmable interrupt timer. Those are standard
nowadays on most normal chips, but not all older machines have these features. Today,
there are even some smaller systems that lack things like an interrupt timer that's internal to the chip. Now, there are risks and downsides to cooperative multitasking. I mean,
it's kind of right there in the name. Your tasks must be willing to cooperate. In theory,
a task can just decide, no, I'm not going to share. I'm not going to yield. I'm just going to keep all these resources for myself.
A programmer can decide to never yield, as is a programmer's prerogative.
A true programmer never yields, after all.
Or, more realistically, a task can get stuck as the result of a bug.
Maybe a yield is placed right after a loop, but, well, that loop just keeps on looping and
the program never reaches the yield. Careful programming and some tricks can get around
these issues. Just remember that the simplicity in this case comes with some risks. So why was
Moore implementing multitasking in his young language? Well, it all comes down to practicality. The code Moore was
writing at Mohasco was dedicated to handling financial transactions. This was a business job, after all. Most of Mohasco's code base was written in COBOL, the de facto business language.
But, you know, COBOL had its issues. Instead of totally rewriting all that business logic code which was already known to be good,
Moore got tricky.
He actually built a Forth-based dispatcher, so to speak.
This was a program that could take inputs and then call up the proper COBOL code to
handle incoming data.
This was all done in a multitasking environment, so he was able to
make very efficient use of company resources. The final big addition to the language was the
return stack. This is the last tentpole that makes up the Forth platform. We've already seen the data
stack, plus interactivity, plus multitasking. And now stack number two.
The concept of the return stack is, once again, simple and pretty well based in established norms.
To get why this matters, I need to introduce one of Forth's syntactic idiosyncrasies,
the word. In Forth, a word is roughly equivalent to a function. There are some
differences of implementation, but we're basically dealing with something like functions here.
Each word can be defined by an expression, which in turn may contain other words. The plus from my
earlier example? Well, that's actually a Forth word. I will admit
the terminology is a little weird, but I think we can get past that. Now, as I hinted at, this
wouldn't really be much of a language if a word had to stand on its own. So fittingly, a word can
be defined using other words. In other words, if you like. Put a more familiar way, Forth functions can be
composed of calls to other functions. In practice, word definitions are, once again, a little
idiosyncratic. A definition starts with a colon followed by the name of the word, then its contents
formatted as an expression, then a semicolon.
In this way, you can define both variables and functions as words.
That's part of what I meant when I said words aren't 100% the same as functions.
It's a bit of an abstraction.
The nestability, the function calling a function, is what we need to focus on.
Going to the low level, a function works like this. You have some address where your code is stored. You tell the computer to call that
function. The function runs, and when it's done, execution returns to the instruction
after the initial call. There is only one ambiguity in this process. How does the computer know where to return to?
Normally, this is handled by the stack.
A call instruction actually tells the computer to push a return address onto the stack and then jump to the function.
At least, that's how most processors I am familiar with implement a call.
Maybe you can see the problem that this introduces.
Combine this with stack-based calling conventions that I mentioned earlier. You wind up with the
stack being used for arguments and a return address. Worse still, the return address is at
the top of the stack, followed by arguments.
In practice, this means that you have to pop off the return address and store it, then
grab your arguments, and finally put the return address back on the stack.
When you return, when you actually type return or whatever the processor uses for that operation,
it pops the return address off the stack and then jumps back. So you have
to make sure the top thing on your stack is the return address and not some other data.
This isn't the end of the world. It's relatively easy to deal with. However, this is annoying,
and it's something that you have to do for each and every function.
In this regime, a function that takes arguments on the stack just has to start by doing a little stack dance.
It slightly increases complexity and redundancy.
In the worst case, it makes your software less accessible to newcomers.
Why do we have to shuffle the stack in every function? Well, my friend, it's tradition!
Moore does away with this traditional dance by creating this thing called the return stack,
thus leading to Forth containing two separate stacks. The main stack is used for arguments and data, while the second stack is used only for return addresses. Each time a word is executed, a return address is pushed onto the return stack.
There's not much to say about this besides it really simplifies the low-level part of Forth.
A stack is a super simple data structure, so it's not that hard to add another
one to your codebase. The shuffle that most languages require just gets thrown out the window.
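Here's a small, made-up sketch in C of that two-stack arrangement. The payoff is visible in the plus word: its arguments are sitting right on top of the data stack, because the return bookkeeping lives on a completely separate stack.

    /* Sketch of the two-stack arrangement: arguments on one stack, return info on the other. */
    #include <stdio.h>

    static int data_stack[64],   dsp = 0;    /* the data (parameter) stack */
    static int return_stack[64], rsp = 0;    /* the return stack */

    static void dpush(int v) { data_stack[dsp++] = v; }
    static int  dpop(void)   { return data_stack[--dsp]; }

    static void rpush(int v) { return_stack[rsp++] = v; }
    static int  rpop(void)   { return return_stack[--rsp]; }

    /* The "+" word: no shuffling needed to reach its arguments. */
    static void word_plus(void) {
        int b = dpop();
        int a = dpop();
        dpush(a + b);
    }

    int main(void) {
        dpush(1);
        dpush(2);
        rpush(1234);        /* pretend return address saved by a calling word */
        word_plus();
        printf("result %d, resume at %d\n", dpop(), rpop());
        return 0;
    }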
If I can editorialize for a minute, and let's be clear, there's actually no way you can stop me.
This whole dual stack thing reminds me of Lisp, and I don't mean in any technical sense.
Lisp isn't really big on stacks, and there's not a whole lot of crossover between Forth and Lisp.
I mean that adding the return stack sounds like a weird decision, but it actually leads to a
remarkable simplification. I'll always stand by the fact that Lisp is about as close to a pure distillation of
programming as you can get. Well, it's starting to look like Forth is entering this same reduced
pantheon for me. Maybe this is my own personal life seeping in. I've been writing a lot of
Fortran lately for a side project, which I swear we'll see in the light of day eventually.
That language, Fortran, the high-level language that started it all, is the antithesis of simplification. It's complex, it's confusing, and it just feels old. For instance, in Fortran,
you have to define all your variables at the top of your program. I mean, physically at the top. You have
a line that starts your program, any imports or the obligatory implicit none that every program
written after 1977 has to have, and then you have to declare variables. As soon as you write any
active code, like an assignment or any math, boom. Now the Fortran compiler will accept no more
declarations. There's no syntax telling you that you're out of this variable declaration chunk of
code. The compiler just kicks into another gear without telling you. As a programmer, you just
have to know that that's how Fortran works. You have to remember this weird
idiosyncrasy. This is a holdover from earlier in the language's life. If I'm remembering correctly
from all my research, then this was done to aid in some of the many optimizations the early Fortran
compilers included. But it's no longer needed. It hasn't been needed since we've had reasonable modern
computers. There's no reason to keep this besides, well, the T word. Tradition. By contrast,
Forth and Lisp don't have these kinds of weird artifacts hidden in their compilers. At least, if they do, I have yet to see them. Now, putting all that aside,
the fourth big event that happened at Mohasco was kind of a meta-event. Mohasco canned the
fancy computer project that Moore was working on, and subsequently, he quit. In Moore's own words,
quote, after giving notice, I wrote an angry poem and a book that has never been published.
It described how to develop Forth software and encouraged simplicity and innovation.
It also described indirect threaded code, but the first implementation was at NRAO, end quote.
Never suffer fools when it comes to good programming.
Now, I'm not sure if this poem has ever seen the light of day, but the book actually has.
That's the problem-oriented language text that I've been using in smatterings all over this episode.
This is also something that I really like about Moore.
He has written a lot about Forth.
These writings are technical, theoretical, and deeply personal.
This is a level of sourcing that I don't usually see, so I'm enjoying my time here.
For instance, we can say that Forth was, in fact, inspired by Lisp.
At least, Moore was inspired by the older language.
Moore graduated MIT in 1961, which puts him at the university just as Lisp was being formulated.
He actually took a course on the language from McCarthy, the creator of Lisp himself.
I point this out mainly to show the level of granularity we're dealing with here.
This is also where we enter into the realm of a fully realized Forth.
After Mohasco, Moore went back to his roots, so to speak.
He got a gig programming at NRAO, the National Radio Astronomy Observatory.
Once again, he's a man truly after my own heart.
In a past life, I spent many sleepless nights going over data from radio telescopes.
Anyway, when people give a short summary of FORTH, it usually starts here.
They'll say that it's a language developed at NRAO for working with telescopes, and
while that was a step along the way,
I think it's plain to see that Forth had been brewing well before this point.
The best evidence we get for this is the fact that Moore was publishing on Forth prior to joining
NRAO. He penned a paper on Forth in 1970, that's a year before he left Mohasco. So maybe we should be talking about this language as a tool used by Big Carpet. Anyway, in 1970, the language was fully formed. The Mohasco paper shows all the tentpoles of Moore's platform that we've discussed.
It also outlines something crucial about Forth. The language is tied pretty closely
to its implementation. The whole first section of the 1970 paper deals almost exclusively with the implementation of Forth. That's followed by a brief interlude and then a discussion of the
language itself. I think this brings up an interesting point to consider, or
multiple points perhaps. Let's just look at what we've hit on so far. The data stack,
the return stack, multitasking, and interactivity. Those are all really vague features.
We've talked about their implementation details, but those details aren't exclusive to any one language.
Forth uses cooperative multitasking.
Well, so does Classic Mac OS.
It has a return stack built in.
So does the Intel 8008.
All the core features that make Forth, well, Forth, are tied up in its implementation.
This is something that I find really interesting.
This is, as Moore would put it, a problem-oriented approach to programming a language.
In practice, everything about programming comes down to implementation.
It's nice to have a fancy theoretical language to work from.
It's cool to have meta-languages to describe your language. The best example of this abstract
approach might well be ALGOL. That language was designed as a standard first, and implementation
second. So much so that it took years to catch on, and even then, it never had widespread adoption.
Early standards for ALGOL don't even contain recommendations on how to print data.
By contrast, FORTH is all about practical detail.
How do you handle input and output?
You have an interactive shell.
That has nothing to do with the syntax or semantics of the language.
It's part of the implementation of the language. How do you handle calling functions? You have a return stack and a
data stack. Those are details that matter for, say, a compiler. Normally, this level of detail
wouldn't even come up in the language description. C passes arguments on the stack, but you won't
find that in most manuals. More is just taking a different approach. And even while designing
this language around actual problems, it remains high level. This isn't like an assembly language
where each line translates directly into an instruction for the processor. Forth is still abstracted above
that. Moore, in designing this language, is walking a fine line that I find fascinating.
So, how does Forth really factor into this new NRAO job? Well, as is Moore's prerogative,
he didn't want to work in any other language. He explains that NRAO was using Fortran at the time.
He had to correct that.
Since Forth really took its final form at Mohasco,
this was Moore's first attempt to port the system to a new platform.
Mohasco was a Univac shop.
Prior to that, parts of early Forth had been written for the IBM 360. NRAO had a
Honeywell computer. Suffice to say, Moore is dealing with an eclectic milieu. There were a
number of techniques that Moore used during the porting process, including fancy cross-compilation.
However, the biggest change that happened during this port was the implementation of threaded code.
Once again, we reach what many consider a central feature of Forth that has nothing to do with the actual language itself.
Up to this point, Forth had existed as an interpreter.
This means that Forth code was never converted into executable instructions.
Instead, a program sat between your code and the computer and interpreted what your code meant.
At NRAO, Moore was moving towards a more compiled language, but with some caveats.
The interactive nature means that we aren't looking at a traditional compiler.
Now, a compiler is a more tricky beast to program than an interpreter. A compiler needs to take your
code and transform it, transmogrify it, into a series of instructions that the computer can then
run. Your final output is a pile of binary that can be run directly on the target computer.
No interpretation needed. The result is usually a faster program at the expense of more upfront
work. You have to actually build the compiler first, you know.
Compilers can be horrendously complicated beasts. I've tried my hand at a few over the years and always ran into
issues. Your own mileage may vary. The original Fortran compiler even gives us a whiff of this
complexity. That compiler used multiple passes to generate optimized code. There were even entire
phases just dedicated to deciding which processor registers to use for which variables.
This definitely violates Moore's dictum to keep things simple.
Forth uses a trick to get around a lot of this.
Call it one weird trick that compiler designers hate.
Moore used a technique called indirect threaded code.
It's an intimidating name, to be sure.
The threaded part is the easiest to understand.
Normally, a single expression will compile down into multiple machine instructions.
Something like assignment with a simple math operation might turn into 10 or 12 instructions
in machine code.
Threaded code is different because instead
of taking the direct-to-instruction route, each operation compiles down to a call to some special
function. In practice, this means you first build up a library of these basic primitive functions.
You might have functions for putting data on the stack or popping data from the stack,
assigning words, or subtraction.
When the compiler runs into code that, say, pushes to the stack, it just has to make a call to the proper routine.
If you were to look at threaded code as assembly language, then almost every line would start
with the same instruction.
Call.
The indirect part here is a neat sleight of hand that Moore adds in.
Instead of making a direct call to a function,
each line of code makes an indirect call by looking up a function in a table.
This is a small difference, but allows for some added flexibility.
That table can be changed, thus updating what each call actually does.
It's also possible to have that table point to some runtime library that's loaded separately
from the actual program.
Thus, your compiled code wouldn't necessarily need to carry around the entire library with it.
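For a feel of the shape of this, here's a simplified, hypothetical sketch in C. A compiled word is just a list of indexes into a table of primitive routines, and the inner interpreter fetches each index, looks it up, and calls whatever the table currently points at. Real indirect threaded Forths are more involved than this, but the indirection through a table is the point.

    /* A toy "threaded" program: each cell is an index into a table of primitives. */
    #include <stdio.h>

    static int stack[64];
    static int sp = 0;

    /* The library of primitive routines that everything compiles down to. */
    static void prim_push1(void) { stack[sp++] = 1; }
    static void prim_push2(void) { stack[sp++] = 2; }
    static void prim_add(void)   { int b = stack[--sp], a = stack[--sp]; stack[sp++] = a + b; }
    static void prim_print(void) { printf("%d\n", stack[--sp]); }

    /* The indirection: compiled code only knows table slots, not addresses. */
    static void (*table[])(void) = { prim_push1, prim_push2, prim_add, prim_print };

    int main(void) {
        /* "Compiled" code for 1 2 + print, as a list of table indexes, -1 marks the end. */
        int program[] = { 0, 1, 2, 3, -1 };

        /* The inner interpreter: fetch an index, look it up, call it. */
        for (int ip = 0; program[ip] >= 0; ip++)
            table[program[ip]]();
        return 0;
    }

Swap out an entry in that table and every piece of compiled code that uses it changes behavior, which is exactly the flexibility being described.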
Now, threaded code is a pretty cool technique. I can't refute that fact. It also lets you pull a neat little cheat when you're
trying to port your compiler. All you have to do is rewrite your call library and change the opcode
that initiates call and boom, you can now target a new platform. Simplicity itself. But there is a cost here,
as always. Threaded code can, in some cases, be slower than normally compiled code.
You have the theoretical overhead of a memory access, a call, and a return for each expression.
In practice, there are ways around this, but the bottom line is that there is always a theoretical performance hit.
Once Forth was running at NRAO in all its threaded glory, Moore could actually get down to the real work.
He went on to develop programs for managing radio telescopes and running data analysis.
When fresh data comes in off a radio telescope, it's actually kind of useless.
It has to be massaged, normalized, and converted into meaningful information. That was all done
at NRAO using FORTH. One of the reasons that this NRAO period is sometimes touted as the
starting point for the language is because the first Forth manual was written
at the observatory. Interestingly enough, Moore didn't write the manual. Its author was one
Elizabeth Rather, one of Moore's co-workers and, as near as we can tell, the second Forth
programmer ever. The 1970 paper did give a pretty complete description of the language, but
once the manual hits, we have a pretty set-in-stone version of Forth. This NRAO period wouldn't last
forever. From Moore again, quote, NRAO appreciated what I had wrought. They had an arrangement with
a consulting firm to identify spinoff technology.
The issue of patenting Forth was discussed at length,
but since software patents were controversial and might involve a Supreme Court,
NRAO declined to pursue the matter.
Whereupon, rights reverted to me.
I don't think ideas should be patentable.
Hindsight agrees that Forth's only chance lay in the public domain, where it has flourished." This is also where things transition away from Moore himself.
At least a little bit.
In 1973, he and Rather left NRAO and founded Forth Incorporated.
The company would work on porting Forth to various platforms for a number of years.
This is where some of the early microcomputer ports come from. But that's all software stuff. I think we've talked enough about software. It's time to make the jump over to silicon.
Embedded systems are really the secret sauce to the current digital regime. These are tiny, often underpowered and very cheap computers that are jammed into electronics
to give them some sort of smarts.
Embedded systems differ from fully fledged machines in a number of ways.
These small systems usually don't support a full user interface.
No keyboard, mouse, or display is present. You usually
don't see actual disk drives, but instead storage is handled by ROM or some type of flash storage.
Crucially, embedded systems usually offer more interface options than a standard computer.
We're talking something like general-purpose input-output pins,
ways for the computer to talk to the outside world, control devices, or pull in data.
The overall package enables things like industrial control systems or, like I said earlier,
just controlling dumb machines with smart chips.
I think the reason for embedded systems is pretty clear.
There's a good rationale.
Sometimes you need highly custom electronic control of a device,
and you want it done quickly and cheaply.
Let's say it's a radio telescope, since that seems topical, doesn't it?
You usually need a way to steer those scopes around, but you need to have a way to prevent oversteering. You don't want to accidentally tell your dish to run
into something. You also need a way to bring up the antenna and start reading data, and some way
to send that data off to a larger computer. You could build up custom circuits in some dumb
interface that just plugs into a big computer.
That'll work fine.
Or you could throw a processor in the scope and just write some code to handle everything.
That second approach is the embedded route.
Another factor that makes this second route even more compelling is the longevity of some solutions.
Your observatory probably upgrades
computers on the regular. If you had a dumb interface, just a really big cable from the
scope to your machine, then you'd need to port your control software every time you get a new
kind of computer in the lab. That's a lot of busy work that could be avoided. Using an embedded system, you can create a standard interface.
Then the control software on the client side could literally just send a serial command to the scope.
You might just need to write a program that can send the word move dish and collect data.
That saves you a lot of effort.
There is, of course, a certain break-even point
involved here. You have to consider things like the cost of development of the embedded solution
in the first place. It doesn't make sense to develop an embedded computer for a radio telescope
if the project itself takes years of work. You want a quick turnaround with maximum profit, no matter if
you're talking financial profit or just time savings. So ideally, you need a simple embedded
system that you can program in a comfy language. You wouldn't, say, build an embedded system that
used a multi-billion dollar IBM part and was programmed in some arcane dialect of Fortran.
That'd be a recipe for a bad time. In other words, in the embedded world, there's a tight
connection between your language or your environment and the implementation. You need
a language that can be used to fully express your system, something that can handle normal programming plus all this weird I.O. you may encounter.
You also need a small software environment, since everything will be living in ROM or
some other relatively small storage device.
Many embedded systems are just programmed in assembly language, but that presents its
own unique problems, especially since embedded
computers don't always use familiar chipsets. I've done a little bit of embedded work myself,
and I've run everything from Z80s to Intel 80186s to ARMs to 8085s. Take an Arduino, for instance.
That's a microcontroller that's pretty common, but it's in this whole realm of little machines.
It uses an AVR chip, which is a Harvard Architecture RISC machine.
It's very rare to see anything that uses the Harvard Architecture in the 21st century.
That's a machine that keeps code and data in separate memory spaces.
That was in use prior to 1945.
It's not very common.
That's just not how computers usually work nowadays.
All this is to say that small systems, I mean really small systems,
are a bit of a wild west.
An ancillary point here is that the toolchain really matters. It can make the difference between a nice experience and a kind
of impractical one. So what about Forth? How does that all fit in? Hopefully I've definitely led you
to one conclusion. Forth would probably be a good language for embedded systems.
This is helped by a fact that I haven't really mentioned yet.
In addition to being simple, implementations of Forth also tend to be very small.
The language itself is so simple that you can get away with a tiny interpreter
or a tiny runtime, even a tiny compiler.
This is helped along by the fact that, for all intents and purposes, Forth can be fully described as a virtual
machine. The spec describes a computer with two stacks, some memory, and a very small
set of basic instructions. That's easy to implement in software, so you end up with a Forth runtime that's
kilobytes. But that's kind of boring. We can go a lot further with this idea. This is where we get
into Forth machines. Now, it's actually a little hard to track the early history of physical implementations of Forth. According to Forth Programming Language, History and Evolution,
a book written by FORTH Inc. itself, the first attempt was in England.
The first such effort was made in 1973 by John Davies of the
Jodrell Bank Radio Astronomy Observatory near Manchester, England. Davies' approach was to redesign a Ferranti computer that had gone out of production to optimize its instruction set for Forth.
End quote. There are a number of references to this story in other texts, but,
you know, no actual references, no citations to papers, no actual papers I can find, and no details on the modified Ferranti machine.
The same source also mentions a 1976 implementation by Standard Logic, but once again, I hit the source wall on that one.
The first well-documented Forth machine was the poetically named R65F11.
You get the joke?
I don't. It's an awful string of numbers and letters.
The chip was developed in the early 80s, with our main source here being a 1984 paper written by Randy Dumse. In the text, Dumse describes his creation as a Forth computer on a chip. I think that's an apt way to look at it, but there is some
trickery going on here. You see, the R65F11 isn't using a special processor. Instead, this is basically a system on a chip with a built-in
program in ROM. The program is a small implementation of the fourth virtual machine.
The actual CPU is just a slightly modified MOS6502 with a chunk of read-only memory on the same die.
That ROM contains the simple program, only 2.5 kilobytes, that provides a low-level Forth environment. In theory, this is everything you need to use Forth for embedded
applications. You could even use this as the centerpiece for a full computer. In practice,
you would first use a specialized compiler to turn your Forth into bytecode.
That's what the little program, the 2.5 kilobytes on ROM, actually can read.
From what I've read, it sounds like this is almost a straight conversion from Forth words to a set of numbers.
You just need to get your code into a format for that tiny implementation of Forth.
Then you load that onto a separate ROM chip outboard that connects up to this processor,
and you let it run.
Dumse even describes how the microprocessor can be used as a standalone computer,
provided you add some RAM.
The chip is actually pretty slick.
It has facilities for handling keyboard
inputs and serial outputs. So in that sense, it really is just a tiny computer that's programmed
to run Forth. Well, cool. We can keep going deeper. The R65F11 is really just software
in a fancy set of silicon shoes. What about real hardware implementations
of Forth? Well, this is also where I hit a little bit of frustration. You see, Moore was actually
involved with the first true Forth chip. The story of that processor was first chronicled in the
article, Fast Processor Chip Takes Instructions Directly from Forth,
which was printed in the magazine Electronic Design in 1985.
I can't for the life of me find a copy of that specific article
or this specific edition of Electronic Design.
It's just not around for me to find.
But hey, my heart will beat on.
Here's what I've put together from a series of later sources. Sometime in 1981,
Moore started developing the idea of a hardware implementation of Forth. The goal here being a
chip that could run Forth-like bytecode, so some silicon-level implementation of the already-specified Forth
virtual machine. The result of this research was a chip called the Novix NC4000, created by Moore
at a new company called, fittingly enough, Novix. The chip was completed in 1983, and we start to see press about that feat in the following
years.
So how exactly do you tailor a processor to a language?
And why would you even want to do this?
The first part here is relatively easy to answer.
Not all programming languages really scream out for custom hardware.
Take C for example. C is pretty well
suited to very pedestrian computers. It runs operations, it handles addressing code in a
normal random access memory space, all that jazz. I think you'd be hard-pressed to make a better
processor for running C code. Even just looking at structs, the fancy
custom data type supported by C, it's clear to see that the language has been geared towards
conventional computer architectures. Forth, on the other hand, is very much its own beast.
Most processors don't support two stacks. At least, that's a very rare decision for processor designs.
Most processors also aren't stack-based. There are a few, but they're exceptions.
Moore's NC4000 is tailored for Forth in two big ways. The obvious part here is that the chip has two hardware stacks. That is,
it has two on-die stack pointers, one for the data stack and one for the return stack.
That's an easy win to open things up with, a very smart gambit to start. The cooler aspect of the NC4000 is its hardware-level support for Forth primitives and threaded code.
It's a pretty jargony explanation, I will admit, so let me break this up a little.
When you go down to the root of a word definition, that is, when you follow back the definition of
functions in Forth, you will eventually arrive at a small subset of words used to describe
everything else.
Those are called primitives.
Most languages have something like that.
In Lisp, you have a scant handful of functions that are used to build up the rest of the
language, for instance.
In the case of Forth, there are around 50 or so primitives to work with. In a traditional implementation,
Forth code will compile down to these primitives, which can then be executed by a virtual machine
or further compiled down to native code. These are small instructions, things like the operations to
access memory at a given location, add numbers, or to define a word. If you squint
a little or use your large imagination, I bet you can guess where this is going. You have a simple
instruction set, much like a processor's instruction set. Moore took this set of primitives and ported it straight over to silicon.
I think that's pretty cool.
He already had this existing specification used during compilation of Forth, something deep in the here-be-dragons zone.
So why not turn that virtual spec into a physical device?
It's the kind of thing programmers dream about, honestly.
The stack and physical primitives are two big ways that the NC4000 embodies Forth.
And I think they're the most clear examples of how you would make a processor better suited to
a language. I mean, it doesn't get much more clear than implementing part of a language
as literal silicon on a physical die. But there are more subtle parts of Novix's design.
Moore also built the NC4000 to be especially good at executing threaded code. In fact,
you could say the chip was built specifically to run threaded code.
To quote an article on the chip in Dobbs Journal,
One of the bits, the most significant bit, of the 16-bit opcode is reserved exclusively to
indicate a subroutine call. For all ordinary machine instructions, this bit is 1. But if this is 0, the NC4000 effects a
call to the subroutine whose address is specified in the remaining 15 bits. End quote. That's a bit
technical, but it's a concise explanation. Down on the processor level, each instruction, each tiny operation that the chip can do,
is encoded as a set of binary numbers.
For the NC4000, a 16-bit chip, those instructions are 16 bits long.
That's 16 ones and zeros.
How execution normally works is the chip fetches the next instruction from memory.
The instruction is put into the
instruction decoder, which tries to figure out what the instruction is asking for. It decodes
those ones and zeros. The efficiency of this decoding process is crucial for the overall
performance of the chip. Everything has to be decoded, so it affects the speed of every single operation.
A bit-by-bit comparison on silicon can be costly, so it's better to have simple instructions that are easy to parse.
In this case, Moore went with a really simple approach.
If the first bit, the first binary number of an instruction, is zero, well, the processor knows it's a call.
Parsing done. Instruction executed. Moving on.
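In code, the decode step described in that quote comes down to a single branch. Here's a hypothetical sketch, following the convention from the Dobbs quote: most significant bit set means an ordinary instruction, cleared means a subroutine call to the address in the low 15 bits.

    #include <stdint.h>
    #include <stdio.h>

    /* Decode one 16-bit NC4000-style opcode, per the convention quoted above. */
    static void decode(uint16_t opcode) {
        if ((opcode & 0x8000) == 0) {
            /* Top bit clear: a subroutine call, target address in the low 15 bits. */
            unsigned target = opcode & 0x7FFF;
            printf("call %04x\n", target);
        } else {
            /* Top bit set: an ordinary machine instruction, decoded further. */
            printf("ordinary instruction %04x\n", (unsigned)opcode);
        }
    }

    int main(void) {
        decode(0x1234);   /* a call to address 0x1234 */
        decode(0x8001);   /* some ordinary instruction */
        return 0;
    }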
Recall that in threaded code, almost every instruction is a call.
So this is a pretty sweet optimization.
Thus, calls are able to be executed in a single cycle.
That just means that calls were as fast as possible on the NC4000.
So, FORTH, once compiled for the special chip, could reach peak performance. That same Dobbs
article I quoted from claims that the NC4000 could run certain algorithms 10 times faster
than conventional chips. The Novix design saw some production, but the real spread came from copies of the
chip.
Moore sold his plans to other companies, which then either produced the same design or made
their own improvements.
Most of the space computers I mentioned earlier use Harris RTX series chips, which are direct descendants from the NC4000.
Alright, it's time to set the stack aside. This brings us to the end of our discussion of Forth.
All things considered, I think this is a fascinating little language to look at.
Just the historic perspective alone is
interesting enough. Add in the technical details and, well, Forth becomes a language you can't
easily ignore. I think that's part of Forth's secret to success. It's a fascinating and truly
functional language. Its attributes help it fill a niche that I don't think any other language can
fit in quite the same way. It's small, agile, and more than a little idiosyncratic. I think these
factors, however, work in Forth's favor. It's unique, almost by design. But smart design keeps
that uniqueness from being a bad thing. The other really interesting point to note here
is how long Forth's development cycle really was. This wasn't a project that took, say,
a human year to hammer out. The language evolved as Moore gained more experience and moved jobs.
Lisp is this pure distillation of programming because it was carefully carved from earlier
work and existing theory.
Forth, in a similar way, shows a very raw and refined view of software development.
A key difference is that Moore slowly brewed his language, folding in new experiences along
the way.
I think that has led to a well-rounded and well-loved programming language.
Thanks for listening to Advent of Computing.
I'll be back in two weeks' time with another piece of computing's past.
And hey, if you like the show, there are now a few ways you can support it.
If you know someone else who'd be interested in the history of computing,
then why not take a minute to share the show with them?
You can also rate and review on Apple Podcasts and Spotify.
And if you want to be a super fan,
then you can support the show directly through Advent of Computing merch
or signing up as a patron on Patreon.
Patrons get early access to episodes,
polls for the direction of the show,
and bonus content.
You can find links to everything on my website,
adventofcomputing.com.
If you have any comments or suggestions for a future episode, then go ahead and shoot me a tweet.
I'm at adventofcomp on Twitter.
And as always, have a great rest of your day.