Advent of Computing - Episode 168 - Halt and Catch Fire
Episode Date: October 19, 2025
Imagine a secret number that could be used to bring your computer to a screeching halt. In 1977 Gerry Wheeler discovered an interesting feature of Motorola's new 6800 microprocessor. There was... a secret instruction that, if read, would cause the processor to stop working. He called this magic number Halt and Catch Fire, or HCF, and wrote a neat article about the operation. This was the first time the public would learn about the secret powers of HCF, but this isn't actually the beginning of the story. When it comes to HCF things are more complicated than that... but only a little bit!
Transcript
Computer trainers were this unique type of machine that became popular way back at the start of the microprocessor era.
They're just really simple computers meant to get folk familiar with some new chip and excited for what's going to come out next.
Trainers were single-board computers with a processor, a little bit of RAM, and some exposed buses.
Your window into the computer was a simple numeric display, usually a seven-segment affair, and a numeric keypad.
You also got a few extra buttons and switches to send control signals directly to the processor.
So if you had a reset button, chances are that was wired directly to the reset pin of the processor on the board.
If you've ever taken an electrical engineering course, there's a good chance you've built something like this.
Learning the Art of Electronics, the classic textbook for intro EE courses,
has students build up one of these trainer-style machines.
I can't remember if the text ever calls it a trainer, but that's what you end up with by the end of the book.
Trainers are a little rough to use, but they're great educational tools.
From your keypad and numeric display, you can surf and edit memory.
You program by punching in machine code, by hand, in hexadecimal.
It's just like the good old days, when programmers were programmers and no one had ever heard of RATFOR.
Trainers are also cheap, so that's a big plus.
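To make that concrete, here's a rough Python sketch of what punching a program into a trainer boils down to. The 256-byte RAM, the deposit and examine helpers, and the example bytes are all invented for illustration; this isn't any particular trainer's monitor program.

```python
# A toy sketch of trainer-style programming: a keypad monitor really boils
# down to "examine an address, deposit a hex byte, move to the next address."
# Everything here is hypothetical, not a real trainer's monitor.

RAM = bytearray(256)  # a tiny memory, addressed 0x00 through 0xFF

def deposit(addr: int, byte: int) -> None:
    """Store one hand-typed machine code byte at the given address."""
    RAM[addr & 0xFF] = byte & 0xFF

def examine(addr: int) -> str:
    """Show an address and its contents the way a hex display would."""
    return f"{addr & 0xFF:02X}: {RAM[addr & 0xFF]:02X}"

# "Programming" is just punching bytes in, one after another.
program = [0x86, 0x01, 0x8B, 0x02, 0x3E]  # illustrative 6800-style bytes
for offset, byte in enumerate(program):
    deposit(0x20 + offset, byte)

for offset in range(len(program)):
    print(examine(0x20 + offset))
```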
As undeniably cool as trainers are, they're subject to a frustrating failure mode.
A mistype would result in writing invalid machine code directly into memory.
That's compounded by the fact that these cheap machines usually use pretty cheap keypads,
so it can be easy to mistype.
Add in the fact that these are really simple machines with no protection,
and it really is just like the good old days.
This can result in a crucial lesson for Acolytes.
Not all code works.
Remember, there are no guardrails.
If you slip up, your machine will irrevocably halt.
Type a 9D at 79C?
Well, it's all over for your session.
Hope you have a button tied to the reset line
or you can unplug the board easily.
But this is a unique kind of bug for trainers, right?
I mean, this would only happen on a pretty primitive computer with no guardrails.
One where you have to type in machine code with a bad keypad, right?
It's not like your desktop could just die if it encountered the wrong number, right?
Welcome back to Spook Month here on Advent of Computing.
I'm your host, Sean Haas, and this is episode 168, Halt and Catch Fire.
Let me start with a question.
How do you feel about undocumented features?
Central to today's episode is the concept of undocumented features lying in wait inside a computer.
I'm asking the question because, for me, the answer is kind of complicated.
At one point, I was a professional Perl programmer.
That could also be the start of a horror story.
Perl, for those who don't know, is a language with a reputation for being
incomprehensible. Some call it a write-only language, because once most code is written,
no one can read it again. Perl is complicated enough that it has this interesting kind of
emergent phenomenon. Officially, they're called Perl secrets: unintended combinations
of features that lead to unexpected consequences.
Perl secrets are so pervasive in the culture and in the code
that some of them are even enshrined in official documentation.
But initially, these were undocumented features.
Back when I was a young Perl hacker, I thought this was the coolest thing in the world.
I mean, the language literally has secrets,
and you could even discover new ones if you looked hard enough.
I'm sure some of you are recoiling already, but never fear.
My opinions have changed.
I like to think I've matured as a programmer in the intervening years.
This kind of undocumented behavior can be pretty neat to run across,
but during normal operation, it can be horrifying.
Documentation is meant to be a map to a system.
If you sail off that map, especially if you do so intentionally,
because it looks so neat, well, you're kind of on your own.
That can lead to all sorts of, shall we say,
unique issues. Today, we're going to be looking at how an undocumented feature made its way
into the world and what consequences it had. Specifically, we'll be taking a look at one of those
features that became known as Halt and Catch Fire. In 1977, the magazine Byte published an article
by one Gerry Wheeler. The title was Undocumented M6800 Instructions.
The article described a number of secret instructions
baked into Motorola's 6800 microprocessor.
Now, I'm going to give a little background as to
why this is so interesting and so strange.
To do that properly, I have to give you the op-code talk again.
That seems to kind of be the theme lately, right?
A computer encodes instructions, aka operations, as numeric codes.
That's why they're often called op codes, so we don't have to say operation all the time.
Those codes are numbers with a set number of bits.
We'll be disregarding more complex platforms that use variable-size op codes.
The 6800 encodes its operations as 8-bit numbers.
That encoding sets a limit on the number of unique operations a computer can recognize.
For the 6800, the limit is 256 different possible operations.
Documentation for these older computers will lay out their operations in a grid.
Think of it as a periodic table of op codes.
These grids are a handy quick reference for programmers, so you end up spending a lot of time
staring at them.
The 6800's grid has these gaping holes.
It's actually kind of glaring.
In total, 59 of those possible op codes are unused.
So you have 59 little swatches of darkness in that periodic table of codes.
Two immediate questions arise here.
Why would you waste all that space?
And what does the 6800 do when it encounters those unused op codes?
As to why, well, that's pretty simple.
Binary encoding is, by its nature, discrete, meaning you have to take data in certain specific sizes.
The 6800 has 197 documented op codes.
You have to use an 8-bit number to count that high.
If you use 7 bits, you could only encode up to 128 instructions, which wouldn't be enough
for the chip's design.
So it's just a simple matter of sizing.
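As a quick back-of-the-envelope check of that sizing argument, here's the arithmetic in Python, using the figures from the episode:

```python
# Back-of-the-envelope check of the sizing argument, using the figures above.
documented_opcodes = 197        # documented 6800 operations
opcode_bits = 8                 # the 6800 encodes each operation in one byte

total_codes = 2 ** opcode_bits              # 256 possible 8-bit op codes
unused_codes = total_codes - documented_opcodes

print(total_codes)    # 256
print(unused_codes)   # 59 -- the "holes" in the op code grid
print(2 ** 7)         # 128 -- a 7-bit encoding couldn't fit 197 operations
```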
As to what happens, that's where we run into issues.
Technically, these should all be treated as illegal instructions. Basically, if the processor
is asked to run operation 9D, and it's not in the big table in its silicon heart,
well, it should throw an error.
More modern chips will do just that,
raising some kind of illegal operation exception that can then be handled with some software.
An easier solution, and one that shows up in earlier processors and earlier computers,
is to make an illegal operation act as a no-op.
That is, an operation that does nothing.
The nothing operation.
It takes extra silicon to raise exceptions,
so this kind of error handling is reasonable.
Under that regime, if you asked the 6800 to execute operation 9D,
it would see it's not in the table,
and do nothing and then move on.
Simple.
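Here's a minimal sketch, in Python, of those decode policies: trap on an unknown op code, treat it as a no-op, or leave the behavior undefined. The tiny decode table and the step helper are invented for illustration; a real chip decodes in silicon, not with a dictionary lookup.

```python
# A minimal sketch of the decode policies described above. Everything here
# is a stand-in; it only illustrates the idea.

DECODE_TABLE = {0x01: "NOP", 0x3E: "WAI"}  # stand-in for a real decode table

def step(memory: bytes, pc: int, policy: str = "trap") -> int:
    """Fetch one op code and return the new program counter."""
    opcode = memory[pc]
    if opcode in DECODE_TABLE:
        # a real processor would execute the documented operation here
        return pc + 1
    if policy == "nop":
        return pc + 1   # cheap option: silently skip the unknown byte
    if policy == "trap":
        raise RuntimeError(f"illegal op code {opcode:02X} at {pc:04X}")
    return pc           # "undefined": this sketch just gets stuck here

code = bytes([0x01, 0x9D, 0x01])
pc = 0
try:
    while pc < len(code):
        pc = step(code, pc, policy="trap")
except RuntimeError as err:
    print(err)  # an error handler could recover here, like newer chips allow
```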
But what actually happens?
Motorola's own documentation just says that these missing instructions
are quote-unquote unassigned.
What does that mean?
Wheeler, it turned out, got a little obsessed with that ambiguity. To quote from Byte:
The mystery of those holes held my attention until the suspense was unbearable.
To satisfy my gnawing curiosity, I executed those codes, deliberately defying man and Motorola.
And I got some interesting results, end quote.
He sat down and ran through the whole list of operations, only executing those that had no documented name.
He found that many did act as no-ops.
However, some of these illegal operations would actually execute.
Some were pretty mundane and maybe a little bit useful.
Wheeler found a secret operation to AND the two accumulators, and a few new forms of load and store.
and then we get to 9D.
That instruction breaks the 6800.
Okay, it's a little more complicated than that.
If you try and make a 6800 run op code 9D, in hexadecimal of course, it halts.
The machine stops running completely.
No new instruction will be read.
It's all 9D until the chip is reset.
If this were to happen on, say, your computer, then you're screwed.
You have to literally turn the thing off and on again.
But that's not all this operation does.
As Wheeler explains, the actual results of 9D are impossible to notice on a normal computer.
You have to actually have some way to read data directly off the pins.
If you could, you'd see something weird.
Microprocessors use this thing called the address bus to tell memory which address they want to access.
It's actually just a pile of pins that the processor uses to send out a binary number.
When 9D starts running, the 6800 starts incrementing that address on the bus,
and it does so as fast as it can.
It essentially just counts up really quick, and then once it hits the max value on the address bus,
it resets to zero and continues counting up again.
Crucially, it does nothing else
while it's incrementing that bus.
In a normal computer, that serves no purpose.
It just kills the machine until it's reset.
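To picture what Wheeler was seeing on those pins, here's a toy model in Python. It only illustrates the described behavior, free-running address bus and all; it is nothing like a cycle-accurate 6800.

```python
# A toy model of the HCF behavior: once the op code is fetched, no further
# instructions are read and the address bus just counts up, wrapping at the
# top of the 16-bit range.

HCF = 0x9D
ADDRESS_MASK = 0xFFFF  # the 6800 has a 16-bit address bus

def run(memory: bytes, bus_cycles_to_show: int = 5) -> None:
    pc = 0
    while pc < len(memory):
        opcode = memory[pc]
        if opcode == HCF:
            print("halt and catch fire: no further instructions are fetched")
            address = pc
            for _ in range(bus_cycles_to_show):
                address = (address + 1) & ADDRESS_MASK  # counts up, wraps to 0
                print(f"address bus: {address:04X}")
            return  # only the reset line gets you out of this state
        pc += 1     # pretend every other byte is a harmless one-byte op

run(bytes([0x01, 0x01, HCF, 0x01]))
```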
Wheeler named this newfound instruction
halt and catch fire, or HCF for short.
That's because, you know, it halts.
Then the chip starts running real fast.
It's a funny name, and it has stuck ever since.
But there are more mysteries here. The first is, of course, why does this even happen?
What's going on over at Motorola? The second is, where exactly did this name come from?
I'm going to play the name game first, well, because I feel like it. Plus, this gets us into the
weird realm of rumor. The phrase halt and catch fire is actually older than the 6800.
But its origins, its true origins, are a mystery.
To really set this up, I need to introduce one little rumor.
That's that the IBM System 360 had a halt and catch fire bug.
Now, I've been beating my head against my keyboard,
trying to find any corroboration of this rumor.
I even resorted to scouring old Usenet posts
looking for old heads talking about their days at Big Blue,
but to no avail.
That's frustrating because you can find many spots online
that claim there was some kind of bus issue with the System 360 that could lead to an HCF,
but there are no sources I can find to back that up.
There's also an utter lack of specificity in that rumor.
So, a little technical detail here, but the System 360 family of computers was implemented
with completely different hardware from model to model.
They're just all software compatible.
So saying the System 360 had a hardware bug is really
not giving you much information. That just means one of a couple dozen IBM computers was a little
buggy. What I can find, however, is the equivalent of one of those copy pasta memes. Let me give
you the earliest example, because that should explain the template. This is from a 1967 issue of
Datamation. Quote, proposed System 360 instructions. Many suggestions have been received for
additions to the System 360
instruction list. Most of the
entries in the collection came from the
San Francisco Bay Area
ACM newsletter, the bit dropper.
What follows is a list
of funny machine code instructions.
Some of my favorites are
EPI, execute programmer
immediately.
DMPK. Destroy memory
protection key. EIOC.
Execute invalid
op code, and, of course,
HCF,
halt and catch fire.
The format
is almost always the same.
It's titled
Some Proposed Instruction Set for a Computer
or a Language, or a list of
forgotten machine instructions.
Then there's some blurb
saying it was copied from some other source
and usually some kind of setup
for the list. Then we have
the list itself. The list
changes and grows over time.
but HCF and a few others, including EPI, my personal favorite, are always in the list.
My favorite example is from a magazine that I keep running into lately.
The zine is the NSA's Cryptolog.
It's a partly classified magazine that the NSA published internally back in the 60s and 70s.
The scans that I have of this are pretty funny.
One issue I was reading had some type-in programs with
redaction boxes over part of the code.
Here's the same copy pasta from a 1975 issue of NSA's Cryptolog.
Quote, new programming instructions.
The following item is reprinted from the April, May 1975 issue of C-Liners,
C-group machine process information bulletin.
The Stargate Advanced Programming Language Study Group
has been holding regular planning sessions aimed at the development of
advanced programming techniques for the forthcoming System 1776 computer network.
Some of the proposed new instructions have been noted in the minutes of the session
written on soggy beer mats and are presented for comments.
The study group would welcome any additional instructions which should be sent to the editor,
C-liners.
End quote.
This is classic hacker humor, and it shows up all over the place.
You can even find websites that have the same list.
My best guess is that the list initially came from that ACM bit dropper newsletter.
That's the earliest citation I can find, but if I could get my hands on that newsletter, it may
well open with a citation to an earlier source.
Halt and Catch Fire starts as a joke. It spreads, and then Wheeler sees a phenomenon in the
wild that looks like an HCF. I mean, it's a magic instruction that
causes the computer to halt and catch on fire.
It must have helped that, despite being a joke,
there wasn't anything specifically called HCF in the wild.
So he applied the name, and it stuck.
This is art becoming reality.
Now, as for the why, why did the 6800 do this?
Wheeler actually speculates that it was part of some internal test routine
that was accidentally left on the chip.
But that's not the case.
Here is, I think, my favorite twist in the story.
In 1985, R. Gary Daniels and William C. Bruce of Motorola published an article titled
Built-In Self-Test Trends in Motorola Microprocessors.
It turns out that Wheeler's obsession started a bit of a nightmare at Motorola.
I'm going to pull an extended quote from that article because they just say it so well.
Of note, the official term used by Daniels and Bruce is Halt and Catch on Fire,
which they abbreviate as H-A-C-O-F.
I'm just going to call it hack-off.
I think that's in spirit.
Quote, in 1975, we moved the MOS operation from Phoenix, Arizona, to Austin, Texas.
Although the date was set far in advance, we wound up moving in the middle of a recession,
which almost put us out of the microprocessor business.
Not only were there no built-in self-test developments in that period,
there were almost no developments at all.
To add insult to injury, we discovered that we had an illegal HACOF,
an instruction that our customers found on the MC6800.
It was an unused op code, an illegal instruction.
When executed inadvertently, the program counter would increment indefinitely.
The problem, which was caused by incomplete op-code decoding, was a nuisance, because reset was the only means of terminating the instruction, end quote.
That sounds like a horrible time.
Imagine you're moving with your entire company.
That includes all the employees.
Profits are down, things look grim, and then one of your coworkers hands you a copy of Byte magazine where some nerd
wrote an article about an op code that breaks your microprocessor. That's a recipe for a large
bar tab, perhaps some notes scribbled on soggy beer mats, and maybe some gray hairs. What Wheeler
attributed to smart planning was actually explained in more simple terms. Halt and Catch Fire
was a straight-up bug. It was 100% unintended, and it 100% snuck past Motorola's poorly planned
test procedures.
The team at Motorola had the chance
to correct this bug. In
77, they were developing a
revision of the 6,800,
and they had a solution ready to roll,
but at the last minute,
it was decided to keep hack-off
and make it useful.
It turned out that that rapid address
bus counting was perfect
for testing RAM and ROM chips
and kind of working out the bus.
Thus, hack-off
became an official part of
Motorola's internal test regime.
We start with this copy-paste joke that circulates around the computer community.
Then we get a guy in Byte naming a bug in the 6800 after that joke.
Then we get Motorola using that newly discovered bug officially.
That's probably the coolest feature development lifecycle I can think of.
That's the big brand name HCF.
It's a secret instruction that, if ever triggered, kills your computer for a little bit.
You may be wondering, though, just how scary was this instruction?
That's the thing.
It would have been odd to actually trigger an HCF by accident.
However, there are a few possible scenarios where a normal user could halt and,
yes, perhaps catch on fire a little.
The thing is, most failure modes that would result in an HCF are part of larger errors.
On these old microprocessors, there's nothing to stop you from just executing junk code.
You can quite literally just tell them to jump off to a random point in memory, to jump off a cliff, if you will.
If you jump to the wrong spot, you can encounter data that contains something the computer reads as an HCF.
By the same token, a bug in a compiler or assembler could result in an HCF being generated.
That jump off a memory cliff scenario would be especially common.
It's very likely that folk had encountered an HCF well before Wheeler's article put a name to the op code.
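Here's a small sketch of that cliff-jump scenario in Python. The memory layout and the data bytes are hypothetical; the point is just that code and data share one address space, and a bad jump target lets a data byte get fetched as an op code.

```python
# A sketch of the "jump off a memory cliff" failure mode: a wild jump lands
# the program counter in plain data, and one of those data bytes happens
# to be 9D.

HCF = 0x9D

memory = bytearray(0x100)
memory[0x40:0x44] = bytes([0x12, HCF, 0x34, 0x56])  # a table of data values

def execute_from(addr: int) -> None:
    pc = addr
    while pc < len(memory):
        if memory[pc] == HCF:
            print(f"fetched {memory[pc]:02X} at {pc:02X}: halt and catch fire")
            return
        pc += 1  # treat everything else as a harmless one-byte op

execute_from(0x40)  # a buggy jump target, straight into the data table
```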
But that's still all accidental.
A final question remains, could HCF be used for evil?
In theory, yes.
But as far as the brand name HCF goes, probably not.
The Motorola 6800 was released in 1974.
That's a period where computer viruses don't really exist yet.
We have a few scattered examples, and we do have some funny hacks out there,
but not really microcomputer viruses.
We're just a little too early for that period.
We can look ahead a little bit.
Halt and Catch Fire isn't specific to the 6800.
This is a whole class of bugs that can happen on many machines.
The MOS 6502, for instance, also has an illegal op code that halts the computer until it's reset.
It's called jam or kill.
And it doesn't have the counting side effect, but the user would see the same experience.
If they hit one of these instructions, the computer would stop.
Some processors in the x86 family, specifically the 80286, also have illegal op codes that cause an HCF-like scenario.
An issue with the search is that more sophisticated processors actually handle illegal operations better.
On newer chips, if an illegal op code is encountered, an error is raised.
That can be used to trigger an error handler so the computer can recover.
That means we only get this slim amount of time where a virus could actually exploit an HCF op code.
From my searching, I can't find any evidence of viruses using these illegal op codes in the 8-bit
era. I think there's one big reason for this. It was a very, very dumb idea to actually trigger
an illegal op code. Why? Well, that has to do with a simple reality of microprocessors.
They were easy to revise and change. A lot easier than earlier computers.
Microprocessors were always meant to be mass produced, and I do mean mass. Earlier computers
were made in large numbers, that's true, but not nearly the same volume as chips. A foundry
can turn out a wild amount of microprocessors. And that was just the business model. Make as many
cheap computers as you can. By the same token, micros are also subject to iteration and
kind of rapid iteration at that. As new technologies and new processes are developed, companies
will revise and update their microprocessors. That's not to mention second sources or
compatible chips. The 6502, for instance, had at least 30 variations over the years. Each of those
would run 6502 code, but that has a very specific definition. When we're talking about compatibility,
we mean documented features. HCF, Jam, Kill, or other illegal op codes are by their nature
not documented. They're not in the spec sheet. The result is that different revisions of the same chip
don't necessarily react to illegal operations in the same way.
There is zero guarantee that you can jam a 65C02 in the same way as a 6502.
By the same token, there's no guarantee that a 6502 produced in 1975
is the exact same chip as a 6502 produced in 1978.
The mask used to etch the silicon could have changed slightly.
Bugs were often patched out silently.
It wasn't a good idea to exploit these op codes.
There was no guarantee your sneaky and smart code would actually do anything.
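As a tiny illustration of that gamble, here's a Python sketch of two hypothetical mask revisions decoding the same unassigned op code differently. The op code and both behaviors are invented; the real point is that nothing off the spec sheet is guaranteed to survive a revision.

```python
# Two invented decode tables standing in for two revisions of the "same"
# chip. The documented codes match; the unassigned one does not.

def decode(revision: dict, opcode: int) -> str:
    return revision.get(opcode, "illegal-instruction trap")

rev_a = {0x02: "JAM (halts until reset)"}  # hypothetical early mask revision
rev_b = {0x02: "acts as a NOP"}            # hypothetical later mask revision

for name, rev in [("early part", rev_a), ("later part", rev_b)]:
    print(name, "->", decode(rev, 0x02))
```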
On the flip side, this means you never exactly know if your computer is susceptible to halting and catching fire.
All right, that brings us to the end of this spooky episode.
A halt and catch fire is one of those interesting cases where a joke became a reality.
This is a phenomenon that happens pretty often in computing.
Personally, I think it's just a matter of statistics.
Computer folk have created a huge corpus of jokes, folklore, and satire.
The canon is truly massive.
That gives us this very rich shared culture to draw from.
Sometimes a lucky programmer gets the chance to use that shared culture to name something new and exciting.
We should all be so lucky to experience that excitement one day.
On the flip side, halt and catch fire is a pervasive type of bug.
Many computers, even some in the modern day, are susceptible.
It's just that the trigger can vary widely and isn't always reliable.
So go with caution and maybe be careful when you write machine code.
Thanks for listening to Advent of Computing.
I'll be back next week with the final episode for Spook Month 2025.
Until then, you can find links to everything
over at adventofcomputing.com. And as always, have a great rest of your day.
