Advent of Computing - Episode 67 - Debug!

Episode Date: October 17, 2021

This episode I face my greatest fears: computer bugs. We are going to dive into the origin of the term, and examine the origins of debugging. The simple fact is that as soon as computers hit the scene... we start finding bugs. Debugging follows very soon after. That part's not too surprising, it's the specifics that get interesting. Modern debugging methods we still use today were first developed on ENIAC, a machine that's anything but modern.

Transcript
Starting point is 00:00:00 The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time. That quote is traditionally attributed to Tom Cargill of Bell Labs, and if you've programmed in any kind of capacity, then I'm sure this calls forth some bad memories. Now, let me just preface this by saying that I love programming. It's a passion and a profession for me. You could say I'm very serious about my programming. That said, it can be just awful sometimes.
Starting point is 00:00:39 It can really suck. Projects can start off fun, easy, and exciting, but it's always that final run-up to completion where everything seems to fall apart. You realize that crucial parts of your program are suddenly broken, or that that last feature you need to add actually ruins some core functionality of your design. By the end, you don't even know how your code is still working sometimes. In most cases, I think there should be a rider slapped on Cargill's aphorism. That last 10% of code is usually actually spent debugging. A program of any size will have at least one bug in it, and as programs grow bigger, they make space for many, many more bugs to come home.
Starting point is 00:01:29 For me, it's the worst when I'm working solo on some project. I can only blame myself, and every time I uncover a new issue, it's like finding hate mail sent from my past self. Rooting out every bug can very easily take up that other 90% of development time. No matter how you look at it, bugs are scary. But debugging, trying to get rid of those bugs, that can be just as frightening.
Starting point is 00:02:04 Welcome back to Advent of Computing. I'm your host, Sean Haas, and this is episode 67, Debug. This is going to be part of our continuing spook month celebration here on the podcast. The scare on offer today is a dive into a programmer's least favorite task, debugging. If you talk to anyone really who works with computers, either professionally or as a hobby, bugs and debugging are bound to come up eventually. Bugs can show up in hardware, but I think just due to the fact that most computer folk are actually software folk, bugs are more commonly reported in code. In short, a bug is just some
Starting point is 00:02:44 kind of mistake or issue that causes something to function in a way that was not intended. The thing about bugs is that they just suck, and they tend to suck on a sliding scale. On the bottom, you get tiny and easily fixable mistakes. Think of something like messing up the order of operations in an equation. You can just fix that with some parentheses and a few seconds of typing. On the exact other side of the spectrum are bugs that are difficult to detect and difficult to fix. It's not unheard of to run into a bug that's so firmly rooted in your program that you end up having to rewrite big swaths of code. And, of course, there are bugs everywhere in between those two extremes. It's almost unsurprising, then, that the process of dealing with bugs
Starting point is 00:03:32 has become a core step in the process of programming. Some programmers even specialize in just hunting for bugs. It follows pretty logically that the process of removing bugs is, you know, it's a big deal. Bugs are just a fact of life for us digital denizens. As long as there's new code or new hardware, there will too be new bugs. Accordingly, folk have put a whole lot of time and effort into crafting tools and techniques specifically for rooting out and destroying bugs. The entire process is known as debugging, and eventually tools called debuggers were developed to help automate at least some parts of that task. Perhaps, also unsurprisingly,
Starting point is 00:04:17 there's a lot of mythologizing around computer bugs. I think this comes down to the shared experience aspect, plus the unexpected nature of a lot of bugs makes them into good stories. Every programmer will have to debug their code at some point. You run into this really weird situation where your code, which on some level you kind of have to think works, just doesn't do what it's supposed to. It's almost like the computer has decided to just disregard your carefully constructed incantations. It's this mix of a loss of control and a break from the expected logic of digital machines. By the end of the process, you usually learn that everything was your fault, like always. Truly, the programmer is a programmer's worst enemy. So today, to get
Starting point is 00:05:08 all spooked and scared in the spirit of the season, we're going to be breaking into the world of bugs and debugging. My approach here is going to be twofold. I want to figure out where the term bug comes from and how it started being used in the digital context. Then I want to look at how early programmers started to combat this self-induced menace. To start with, we need to burst some bubbles. It's no secret that I'm a huge fan of Rear Admiral Grace Hopper. She's probably one of my favorite historical figures in general. That said, she did not invent the term bug or debugging. It's a myth based off some bad interpretation of sources. But this myth has persisted and remained popular for decades, so I think we do need to address this. The whole story traces back to an event that happened at
Starting point is 00:06:00 Harvard on September 9th, 1947. That day, a group of researchers were working with the Harvard Mark II computer. At the time, Hopper was part of the research team, but it's a little bit ambiguous if she was in the lab on that exact day or somewhere else. Anyway, the crew noticed that the Mark II was having some kind of issues. This was a really early computer, so it suffered from a different set of ailments than modern machines. It wasn't long before the team started looking over the Mark II's hardware for anomalies. Now, the Mark II was an electromechanical computer. This mainly just meant that instead of transistors or vacuum tubes, it used relays. A relay works by using an electromagnet to physically close a spring-loaded contact. If treated like a black box, they function exactly the same as a
Starting point is 00:06:53 vacuum tube. It's just an automated switch. Tie enough together and you get logic gates. Add more to perform math, and then still more, you have a computer. Or, in this case, you have the Mark II. It's nice to ignore details sometimes, but this isn't a case where we can do that. The physical implementation of relays does, in fact, have an impact on their operation. There are moving parts inside a relay, specifically the little contact arm that slams open and shut. Over time, relays can wear out, parts can break, and they will eventually stop working if they're used enough. The other little detail here that's really important is that at least some of the relays in Mark II were open, meaning that they didn't have any casing covering them.
Starting point is 00:07:42 In other words, the mechanical bits were fully exposed to the open air. Hopper would later talk about the fated event in a number of interviews and lectures. In a 1968 oral history taken by the Smithsonian, she recounts the event like this, We were debugging Mark II. It was over in another building, and the windows had no screens on them, and we were working at night, of course, and all the bugs in the world came in. End quote. The solution to this bug was easy. Simply remove the carcass and get on with tests. But the team did one better. One of the researchers plucked out the moth and taped it into the
Starting point is 00:08:32 computer's logbook. So we have this lovely entry from 3.45pm on September 9th that starts with an actual moth taped down to the page. The rest of the entry reads, quote, Relay number 70, panel F, moth in relay. First actual case of a bug being found, end quote. The note isn't in Hopper's handwriting, so at most she was around while this disaster resolved itself. That said, she got enough of a kick out of the real-life bug that she would go on to spread the story around.
Starting point is 00:09:09 In a lot of ways, Hopper was the prototype of a modern programmer. She had this self-awareness of the absurdity of the discipline. Finding a recently living bug gummed up in a million-dollar machine must have had her in stitches. But this wasn't the first time bug was used to describe an issue with machinery. I mean, we can see that right in the log entry, and I think this is why I'm so bugged, if you'll pardon the pun, when people bring this up as the invention of the computer bug. The entry very clearly says it's the first case of a
Starting point is 00:09:47 quote, actual bug being found. It was funny to Hopper and to the Mark II team because bug was already somewhere in the lexicon. Otherwise, it would just be a kind of chuckle-worthy occurrence. That said, Hopper was a staunch supporter of the term bug as it applies to computers, and I think this is partly because she found the term so funny. Hopper even doodled little cartoon bugs in some of her notes. And I'm very serious about this one. I ran into a few pages of these drawings back when I was compiling my episode on COBOL. few pages of these drawings back when I was compiling my episode on COBOL. My favorite has to be a sketch of a paper tape worm. I think it's really likely that Bug entered the lexicon of a lot of young programmers thanks to Hopper. Throughout her career, she had a focus on
Starting point is 00:10:36 education and outreach, which put her in the perfect position to boost the popularity of the computer bug. And I mean, come on, it is funny to think of a bug destroying a big expensive computer. Now, it turns out that the word bug, at least when used to reference some type of issue, predates computers considerably. But the exact origins are somewhat unknown. The jargon file, my favorite compendium of computer folklore, gives us a good starting point for our bug hunt. Citing the file as reproduced in the New Hackers Dictionary, quote, use of bug in the general sense of a disruptive event
Starting point is 00:11:18 goes back to Shakespeare. In the first edition of Samuel Johnson's dictionary, one meaning of bug is a quote, frightful object, a walking specter. This is traced to bugbear, a Welsh term for a variety of mythical monster, end quote. The citation made here was to an early English dictionary printed in 1755, just a scant few centuries before a moth was beaten to death at Harvard. The full definition from Johnson's dictionary is a little longer. It reads, quote, a frightful object, a walking specter, imagined to be seen, generally now used for a false terror to frighten babes, end quote. Now, as fanciful as the early definition may be, that does really
Starting point is 00:12:07 capture the feel of a computer bug. I mean, we can just look at the Mark II moth as an example. It was a frightful object to be sure, but it was ultimately not a serious one. It was eventually found and repaired, then it was gone. In general, bugs have this transient aspect. Call it an ethereal existence if you want to be a little poetic with it. They're temporarily frightening, but they don't last forever. Even in the event of major bugs that cause serious damage, the bug is eventually found and exercised. They come as walking specters imagined to be seen. I know I'm going off track, but the etymology here is also pretty interesting to me. The Welsh bugbear mentioned isn't anything like the more modern fantasy conception of a furry bear-like monster. The original bugbear was a more
Starting point is 00:13:06 ill-defined bad-doer, something like a boogeyman or a monster under your bed. I just really like how that fits with the idea of a computer bug. A bug can very much be a boogeyman waiting to getcha. Bugs would make the cross from the abstract to the more concrete... eventually. Once again, since this is a more colloquial than technical term, it's hard to pin down an exact date. The most common story I see, after Hopper's Moth, is that Thomas Edison coined the term sometime in the 1870s to describe issues in telegraph systems. But once again, I want to throw in a lot of caveats. I'd rather put this as Edison adopted the term bug. The story goes that in 1873, Edison was contracted by Western Union to design a quadruplex telegraph system.
Starting point is 00:14:05 That is, a system that could send four telegraphs at once. If you've listened to my very recent episode on telegraphy, then that should sound familiar. Edison's system would eventually be superseded by Emile Badeau's more generalized approach. Anyway, Edison's master plan was to combine two existing technologies, diplex and duplex telegraphs, and by doing that he'd get a total of four channels of somewhat simultaneous communication. Now, those names are frustratingly similar, so let me break that down because I think the details here are important. Duplex telegraphs, that's with a DU, can send and receive two messages simultaneously. This is usually done
Starting point is 00:14:53 with time division multiplexing, where incoming and outgoing messages have assigned time slots on the line. This is a little hand-wavy, but it'll do for this. Diplex systems, that's with a DI, accomplish a very similar feat using frequency division multiplexing. This puts different signals in different frequency bands. Once again, really abbreviated explanation, but that's the gist of it. By using both techniques at once, Edison figured that you could send a total of four messages simultaneously. During the development process, Edison was running into a recurring issue. He was finding that diplexed signals had a habit of breaking early. The system was set to be transmitting Morse code, so an early break could turn a dash into a dot. Thus,
Starting point is 00:15:46 the message would be rendered useless. It's unclear if Edison called this issue a bug from the beginning, but by at least 1876, he started referring to bugs in his notebooks. Now, we don't have an explanation as to why Edison started calling these glitches bugs. We don't have an explanation as to why Edison started calling these glitches bugs. We don't have an explanation as to where he learned the term. But by the 1870s, it was already part of English slang, so he may have just been applying an everyday word to his particular niche. Whatever the case, Edison ran with it. Soon thereafter, he would start a fond tradition
Starting point is 00:16:26 that is carried over into the realm of programmers, blaming delays on bugs. In an 1878 letter to the president of Western Union, Edison wrote, quote, you were partly correct. I did find a bug in my apparatus, but it was not in the telephone proper. It was of the genus Colbellum. The insect appears Now, this letter is lacking some context. Mainly, I haven't been able to find the correspondence it was responding to. That said, the letter makes it clear that Edison had ran into some type of delays. Why? Well, he ran into some bugs. Even better, those bugs were no fault of his own, but an issue with the telephone system.
Starting point is 00:17:17 The specific instance is cited in a lot of places as an early use of the term bug, so I think it's at least worth mentioning. However, I think this quote isn't always addressed in the right way. Back in 1878, we can see Edison describing bugs in a way that any programmer should be familiar with. The whole blaming delays on bugs isn't just a joke. It's also a fact of life for programmers. Bugs are a very real thing that plague anyone working with any type of complex system.
Starting point is 00:17:53 And they're also something of a meme. If you have some unresolved problem, you can just blame it on a bug. Sure, you probably made or contributed to that bug, but naming it so helps you ignore some culpability. My code isn't full of mistakes, it just has some bugs that I'm working out. In that sense, a bug does work as a great technical boogeyman. The popularity of bug traveled out from Edison into telegraphy at large. Glitches on telegraph lines started to be called bugs around
Starting point is 00:18:26 that same time, around the 1870s or so. From there, its technical use became more mainstream. The IEEE Annals article, Stalking the Elusive Bug by Peggy Kidwell, gives a fine-grained account of this transition. To shorten things, the definition of bug in dictionaries starts to shift around the 1880s. Besides referring to some ambiguous insect or a spook, bug starts to also mean a quote flaw in a machine. Kidwell further points out that bug was being used in reference to computers well before 1947. In 1944, the Mark I computer was just starting to function at Harvard. This computer has a bit of a complicated origin story. It was designed by Howard Akin at Harvard, then built by IBM.
Starting point is 00:19:24 So even though it's associated heavily with Harvard, it was technically IBM hardware. Once IBM finished building the Mark I, they had to help install it back on campus. Logs were kept during this process, and one entry from April 1944 is of particular interest to us. Quote, Rand test problem. Mr. Durfrey from IBM was here to help us find bugs, end quote. The whole point that I'm trying to drive home is that bugs were never unique to computers. The concept was borrowed from an existing lexicon. That said, bug does fit really well for what it's come to describe.
Starting point is 00:20:06 Bugs are boogeymen hiding inside digital circuits. And it wouldn't be very far into the computing revolution before us cunning humans tried to find a way to do battle with our new foe. Perhaps it should come as no big surprise that the first attempts at debugging went hand in hand with some of the first attempts at programming. Rule number one for this episode, and for life as a programmer, is that where there is code, there will be bugs. The interesting part here is that early on, computers weren't really programmed in a familiar way. To talk about the roots of debugging, we gotta talk about ENIAC. Now, there's a whole lot you can say about ENIAC. It wasn't, strictly speaking, the first computer out there. I think there will always be debate
Starting point is 00:20:54 about the true first electronic digital computer. It also wasn't the most flexible machine, it only really did decimal arithmetic. And crucially for today's discussion, it wasn't a storage program computer. Now, this distinction has a number of really important implications. First being that ENIAC didn't really execute machine code, at least not in the sense that we would understand it. Programs were entered by chaining together parts of the computer with patch cables. Each component in the system accepted some kind of inputs, operated on that number, and then produced an output. Chain that together, an addition here, a comparison or multiplication there, and you can actually create a program.
Starting point is 00:21:47 there, and you can actually create a program. The other oddity here is that ENIAC was a parallel machine. Normally, computers work in serial, meaning each instruction you give it runs one after another after another. Ah, but ENIAC didn't deal in instructions. It was programmed by connections. Each operation was carried out by its own circuit, so there's no reason not to run a multiplication at the same time as a few additions. The only serial part of the computer was a clock cycle. Each time the computer's clock ticked, that signaled to all circuits to perform their respective operations. So while you could run two operations at once, you were still able to form chains and sequences of operations if you wired things correctly. At this point, there really wasn't any prior art to work off, so I think we can forgive ENIAC for its high strangeness. The result here is that programming ENIAC came with its own unique challenges.
Starting point is 00:22:48 This would also open up some unique opportunities. And it all comes down to the process of programming ENIAC. The machine had a dedicated programming staff. This team was initially composed of six so-called computers, humans that carried out mathematical computations prior to the advent of actual electronic computers. In 1944, some of these human computers had experience using mechanical calculators. Some even used early analog computers, but no one had ever tried programming one of these new digital machines.
Starting point is 00:23:25 No one had ever needed to program before, so it was all untrodden ground. Recoding Gender by Janet Abate really clued me into something that I never thought about in regards to the early era of programming. In the book, Abate pulls from a number of depositions that I just can't find scans of, so that's one reason for the secondary source here. But Abate's analysis also hits on a fact that I've just never thought of before, I've always overlooked. She explains that before ENIAC was ever put through its paces, before it started turning out actually new and exciting calculations, the small software team had to develop tools to aid in programming the machine.
Starting point is 00:24:11 Gene Jennings, one of the proto-programmers, explained it this way, We ran a number of problems to find out how to use the machine, to test what interval of integration gave us the most accuracy if you considered round-off error and truncation error, and we were trying to determine what interval we could use in order to minimize the effect of truncation error without building up the round-off error. The view was to establish some procedure that gave us the most overall accuracy, end quote. Being a human computer was all about working out numeric solutions and results for mathematical problems. For any useful equations, as in anything involving very complicated physics, you end up having to use
Starting point is 00:25:00 numeric approximations. Approximations introduce errors, and human computers were skilled in how to deal with and minimize those errors. ENIAC added a new variable into the mix, so the first programs were tools for figuring out how to deal with this new variable. You could almost call this kind of approach metaproprogramming. The ENIAC team was writing programs to help them write better programs. What's really intriguing here, and what I find so cool, is that we're seeing an adaptation of pre-computer techniques. Like I said, best practices for numeric approximations were already known, at least in the realm of existing technology.
Starting point is 00:25:45 The leap being made here is to using new techniques, in this case programming, to make a new set of best practices. I also like this because it foreshadows some later developments. So the zeroth step in programming ENIAC was profiling the machine. Let's check that off the list and move on to the next step. Flowcharts. Now, armed with this new information about how ENIAC could handle certain math methods, it was possible to start planning the actual program. No code and no other computers meant that this was a pen and paper project. Ruth Lichterman, another programmer
Starting point is 00:26:26 on the ENIAC staff, described it as diagramming. Essentially, they worked out how data would flow around ENIAC, which numbers needed to be passed into which circuits at which time. Some of these diagrams have been preserved and you can find them online. These are dense, to say the least. find them online. These are dense, to say the least. The process started with simple partial diagrams, then worked towards a full wiring diagram for the computer. For those versed in the strange ways of ENIAC, these diagrams were as close to code as you could get. It described the entire program completely. The next step in the process was to actually apply those diagrams to ENIAC. This was called plugging in, since, you know, you had to plug in a whole lot of cables over and over again. In this sense, the finished and wired up ENIAC was roughly analogous to an
Starting point is 00:27:21 executable. It was just waiting around to be run. And, as is the law of the land, there were bound to be some bugs lying in wait. So that meant long hours of debugging. But this wasn't an unguided affair. ENIAC's programming team had already developed tools for profiling their new machine. A similar approach was used for software debugging. Now, there is a difference here. We aren't looking at a full software-based debugging tool. We're still far too early for that. The team would end up developing techniques for effectively debugging ENIAC, and those specific techniques would end up becoming much more generally applicable over time.
Starting point is 00:28:06 Betty Hall Burton was another member of the programming staff, as, an aside, she would later write the first generative programs, which would go on to inspire some of the first compilers. Anyway, in a 1973 interview at the Smithsonian, she explained debugging ENIAC. Quote, when you're building up to an error where it might take a half minute to get to the point where you really want to stop it, you could look at the thing and have someone stop the machine at the point where it was, you know, getting close to going negative or whatever it was that you wanted to do. You could, you know, you could see it. End quote. There's this old joke about big computers just being panels covered in blinking lights. Well, that's partly true. What Halburton is referencing here, when she says you could see it,
Starting point is 00:28:56 is the fact that ENIAC was dotted with indicator lights. Accumulators and counters, the components that stored numbers inside ENIAC, all had readouts where their current values were displayed. At every step, programmers were able to very physically walk around ENIAC and see how calculations were progressing. And once things started to get off, you could quote-unquote pull the plug, so to speak, and try to figure out what was going wrong. This technique is called breakpoint debugging, and believe it or not, ENIAC actually had some hardware support for this, at least in a manner of speaking. A 1944 report describes this feature in some detail. Quote, frequently it is more convenient to proceed through a portion of the computation with the
Starting point is 00:29:46 ENIAC operating in its normal or continuous mode, and then switch to one addition time or one pulse time operation, than it is to progress through the entire computation non-continuously. This may be arranged by disconnecting the program cable which delivers the pulse used to initiate the programs which are to be examined non-continuously. We call this point where the program cable is removed a breakpoint. This gets back to the clock pulses that drive ENIAC's operation. All calculations occur on the edge of a clock pulse. Think of it like the computer's heartbeat. Every tick of the clock signals to ENIAC's circuits that it's time to perform their operation. Remove the cable feeding this pulse to the machine
Starting point is 00:30:32 or to just one component of the machine that you want to examine, and ENIAC will stop running. That's the breakpoint. When you reach a specific point in the program, then you break ENIAC's normal operations. From there, you can examine values, maybe switch around some cables, or even reconnect the clock. Specifically, and what's really crucial here, is, like I said, you could unplug a single component of ENIAC.
Starting point is 00:31:08 a single component of ENIAC. So if, for instance, you know you're storing a value in the fifth register, and once that value got to the fifth register, it's wrong. Before you start running, you unplug the program cable going to the fifth register. So once the data gets there, it stops. That's the breakpoint. That's where you can start doing a postmortem on your program. The other support for this feature set is what the report calls one addition time or one pulse time operation. This is more commonly known as stepping because it lets you step through your program one operation at a time. On ENIAC, this was implemented with, simply enough, just a big button. To start stepping, all you had to do was turn off the usual program clock, then just bang that button every time you want to step the program forward. That would manually generate a
Starting point is 00:31:59 clock pulse, allowing you to watch ENIAC run in this slow motion. It was this combination of physical breakpoints and stepping that really made debugging on ENIAC feasible. Recall that ENIAC didn't really do code. You didn't have any way to specify which line of your program you wanted to break at. I mean, you didn't really even have lines to break at. So the programmers had to rely on reflexes and experience. They had to see when the program was about to hit the crucial step and then pull the plug, or they had to follow the flow in just the right way that they could unplug a component to set a breakpoint. That wasn't guaranteed to always be at the right time. Since ENIAC could be switched to single-step mode,
Starting point is 00:32:49 that breakpoint could be initiated a little early just to be safe. Then the computer could be plugged back in and stepped up to the actual operation of interest. By breaking and stepping, it was even possible to track down bugs that weren't entirely pinned down. Now, we could spend all day talking about how ENIAC was deeply and profoundly different than modern computers. How it wasn't programmed like anything before or after. How it wasn't operated like anything else. All of that is true. But here's the wild part. This type of breakpoint
Starting point is 00:33:28 debugging that was developed on ENIAC is basically the same as debugging methods that are still used today. ENIAC's peculiarities make that all the more wild. In general, the modern approach is to set a breakpoint and then inspect variable states to see what went wrong. That's exactly what was being done here on ENIAC all the way back in 1944. The big bug question that remains for me is, was ENIAC where computer debugging really started? I don't always like to go looking for firsts without reason, but I think this question does have a good reason. To me, ENIAC seems uniquely set up to facilitate this very specific kind of debugging. Programmers could literally step around inside the machine and see what was happening. I can't stress enough how cool that is to me. Holberton was able to walk up to a register and physically look at its value,
Starting point is 00:34:35 then move down a few feet and see the value of the closest adder. These early programmers had a level of control and insight into the machine that you just can't have today. The patch-cable-based existence of ENIAC also made breakpointing simple, albeit a little finicky. Pulling and rerouting cables was how you programmed ENIAC, and pulling and rerouting cables was how you debugged ENIAC. There's a certain symmetry there that I really like. It makes debugging part of the same exact process as programming. And, you know, that's how it should be viewed. So it would be very convenient if ENIAC was the first example of debugging because it was somehow uniquely qualified for the task. That just leaves a good taste in your mouth. Now, finding sources that reference debugging computers prior to 1944 is a challenge, it turns out.
Starting point is 00:35:42 Weird, I know. The number of fully electronic digital computers that predate ENIAC are... well, there just aren't many. The other complication is that for our debugging search, I really want primary sources from as close to the time period as possible. I want the word debugging on a sheet of paper before 1944, basically. I want to find proof of the word, or at least something really close, being used to describe the specific activity in regards to computers. This is one of those situations where my particular requirements kind of make my life hard on myself.
Starting point is 00:36:22 This leaves us with really slim pickings. So slim, in fact, that I think we can pretty definitively say that the modern idea of debugging, at least with breakpoints and examining values, started on ENIAC. It's a bit of a fool's errand to try and explain a lack of evidence, but let me at least lay out my thought process here. evidence, but let me at least lay out my thought process here. We can basically run down a list of pre-INIAC computers looking for something like breakpoint debugging. We have the Harvard Mark I, which we mentioned earlier in passing. There is the April 1944 log entry that states IBM technicians helped deal with some bugs in the machine, but it doesn't say debug. I think the context here also makes it pretty clear that this wasn't some type of issue with software.
Starting point is 00:37:15 This was a hardware issue that cropped up as the machine was being set up. So this isn't regular debugging as some part of the practice of programming. Going back further, we get to the era of machines like the At-Nassoff-Berry computer, or Zeus's computers, Colossus, and any number of early machines built at Bell Labs. In my searching, I can't find any early reports on these machines that use the word bug or debug or describe some systematic approach to dealing with errors in any way. Some of this is due to lack of digitized primary sourcing, and some to language barriers, so I admit, there might be sources that I just can't access. Breaking out of the extremely early digital era, we get to analog machines.
Starting point is 00:38:03 Now, I initially thought this may be fertile ground for my search. ENIAC's interface is very similar to analog computers. In these early systems, programs, if you can call them that, were set up by reconfiguring how the machine was built. This could take the form of changing gear ratios, moving pulleys, or swapping patch cables, even. gear ratios, moving pulleys, or swapping patch cables even. Each component of the machine performed its own task, so programs were built up by combining those distinct components. Even more enticing is the fact that some of ENIAC's programmers had previously worked with analog computers. Now, I think this puts us in the most interesting territory by far. Once again, I have yet to find any source that talks about how errors with analog computers
Starting point is 00:38:50 were dealt with prior to 1944. There are reported issues with mechanical components due to breakdown. Gears and pulleys would wear out, bands would need to be replaced on occasion, but that's all upkeep and maintenance. That's not exactly debugging. Once we hit analog machines, we're out of recognizable territory, at least as far as I'm concerned. Sure, ENIAC's weird, but it still has most of the traits of a digital computer. Analog machines didn't work in steps in the same way that digital machines do. Operations were continuous on a spectrum of rotations or translations.
Starting point is 00:39:31 Analog machines were not discrete. I know we're getting into the weeds here, but I just want to fully establish that we don't get to a point where breakpoints make sense without some level of discrete operations. where breakpoints make sense without some level of discrete operations. Inspecting values at a specific step just doesn't make sense if there are not specific steps. At least for me, I'm pretty convinced that debugging in the recognizable sense started on ENIAC. That invention was partly thanks to the high strangeness of ENIAC's design. We've seen the birth of the bug.
Starting point is 00:40:12 Soon after computers hit the scene, we have the birth of debugging. And we're still debugging today. So what happened in the intervening years? Well, it all comes down to a slow process of adaptation. This starts basically as soon as more computers are built. The bottom line is that ENIAC's design just isn't very good. It was almost immediately superseded by stored program computers, and that's the model that we've stuck with ever since. A stored program computer does just what the name suggests. It runs programs that are stored in memory. Even within that general
Starting point is 00:40:45 prescription, there are some variations. The more popular convention is called the von Neumann architecture. These types of computers store data and executable code in the same memory space. Code and data are treated exactly the same. It's all just bits at some address in memory. The important piece here is that with a stored program computer, every instruction of a program resides at some unique address in memory. Same goes for every single variable you define. That's a world of differences away from how ENIAC was programmed. Gone are patch cables and discrete circuits. Instead, everything is data tucked away in nice little boxes. This might lead you to think that breakpoint debugging
Starting point is 00:41:32 a la ENIAC wouldn't be applicable, but that's not the case at all. Like I said earlier, once ENIAC's programming staff started breakpoints, the die was cast, as programmers still follow that ancient and mystical tradition. Of course, details did have to change to accommodate new technology. But maybe not as much as you'd initially think. In a stored program computer, a program will start executing at some set location in memory. We can just say that it starts at address 0 for simplicity's sake. After the operation in address 0 is executed, then the operation at the next valid address is fetched. This new operation at, say, address 1 is then executed. The flow is only broken when an instruction tells the computer to jump to some non-sequential address. Even then, the computer
Starting point is 00:42:26 is still running one instruction after another in a sequence. And just as with ENIAC, each of those steps is timed to the pulse of a central clock. We can take a look at EDZAC for a quick example. This was one of the first stored program computers coming into service in 1949. The big difference is in how data was inspected. ENIAC didn't really have memory in the conventional sense. It just had a smattering of registers and counters. EDZAC, on the other hand, did have normally addressable memory. There were too many addresses to have indicator lights for each bit, but there was a solution. A CRT display could be connected to EDZAC to display the contents of its memory
Starting point is 00:43:12 as a series of pixels. Very broadly speaking, this was the same process that was already practiced on ENIAC. Wait around for a bug, stop, step, and inspect. It takes a number of years before we see a significant change in debugging. The next big step in the evolutionary arms race was the so-called online debugger. Now, there are a few routes that we could take to examine this next evolutionary leap, and I've chosen to follow the path laid out by Tx0. The reason for this choice should be made clear eventually here. TX0, sometimes called TXO or TIXO, is itself a fascinating machine that I really need to devote some more time and coverage to. Built at MIT in the mid-1950s, TIXO was the
Starting point is 00:44:02 first fully transistorized computer. It was also one of the first machines to make heavy use of a text-based interface. Or, at least, it had a rudimentary text-based interface. Specifically speaking, Tixo was connected to a device called a flexorider, which worked something like a teletype. Programmers could type at a keyboard and the computer could respond on a paper feed. At the time, those programmers would be working with assembly language. I've been talking about assembly on the show a lot lately, I guess. I've been writing a lot in the language, so it's kind of been on my mind. Unlike higher-level languages such as C, assembly just gives some handy mnemonics to machine code.
Starting point is 00:44:48 Each line of assembly language can be translated directly into machine code instructions. But there are two exceptions to this rule. Comments and labels. Comments are just dropped. That should speak for itself. A computer doesn't really have an equivalent to, to do, finish my program. Labels are the part that really matters here. In most assembly languages, a label is just a way to name a chunk of your program. These are usually used to label chunks
Starting point is 00:45:19 of memory you plan to use for variables, or chunks of code that you can later call or jump to. As your program is assembled, all references to labels are translated into memory addresses, and the actual label names are dropped. Once again, a computer doesn't really have a point of reference for what some label like, my big fancy function, actually means. The bottom line is that assembly language and machine code are almost interchangeable. The only big difference comes down to labels. Any variable or function names have no meaning to a big bad machine. That's lame human stuff. This disparity between human and machine opened up a new frontier for debuggers to work towards.
Starting point is 00:46:06 Almost as soon as Tixo became operational, we're talking 1956 or so, programmers started working up new tools for their new machine. Understandably, those tools made pretty aggressive use of Tixo's FlexoWriter. Why wouldn't you? It's a really cool new interface. These tools were thrown onto a series of paper tapes that were called utility tapes. This was usually abbreviated as just UT followed by a number. I haven't seen much reference to UT1 or UT2 or maybe even a UT0, but UT3 would make a little legacy for itself. You see, UT3 contained a very early online debugger. Now, online in the 1950s doesn't mean the same thing as online today.
Starting point is 00:46:59 We aren't talking some secret 1950s-era internet. In this case, an online debugger just means that it runs on the computer you are debugging. A more updated phrasing would be it's a software debugger. But online is the period-accurate term. Anyway, I've kind of given away the big deal with UT3. It was one of the first software-based debuggers. Computers have generally retained the ability to run in single-step mode, just like ENIAC. However, over time, that's become relegated to more low-level debugging or diagnostic work. In general, halting the entire computer to debug
Starting point is 00:47:39 all software fell out of fashion. It's inconvenient, it's limiting, and also it's kind of a party foul. It's not cool to halt a machine. They don't like that. But perhaps most importantly, when you're debugging, you're already sitting in front of one of the most powerful tools ever imagined by humanity. Bringing that beast to a halt means you're wasting precious cycles. So software debugging solutions started to become more in vogue. Why not put that beast to a proper use? UT3 is unique because it makes aggressive use of TX0's fancy FlexoWriter. In really general terms, UT3 allowed a programmer to talk to Tixo about their program, but that conversation had to be kept within the rigid parameters of a computer. So let's say you want to debug your new assembly language program. First, you had to assemble it.
Starting point is 00:48:39 That's the conversion between your code and machine code instructions. Then you could sit down at Tixo, load in UT3, and then load in your bugful program. Of course, UT3 provided a utility to load in your code automatically, which made everything more convenient. The crucial part here is that at this point, your program is not running. UT3 is running the entire show. Since Tixo is based off the von Neumann architecture, we know that all data and code can be treated in the same way.
Starting point is 00:49:20 Code is just some type of data that happens to have meaning to the computer. This lets you pull some pretty slick tricks, and one of those happens to be debugging. You can't replug Tixo on the fly like with ENIAC, but UT3 lets you get pretty close. Once everything is loaded up, you can fire off your program from UT3. This actually passes control to your own code. That's the least useful mode here. The real meat of the operation is the trace command. This will allow you to roughly step through your program, although maybe I should use the term stomp. You can't go instruction by instruction. Instead, as a 1958 memo explained, quote, the program operations will be performed in an interpretive fashion until 79 transfers have been performed, end quote. Transfers are tixotoc for conditional jumps.
Starting point is 00:50:14 So while your program runs, UT3 keeps tabs on jumps. And once you hit that magic 79th jump, UT3 takes back control. I know, it's pretty awkward. In general, a program will have a lot of conditionals, but this would be annoying to deal with in practice. Anyway, once UT3 is back in control, we get to the actual useful part. You can keep executing your program, or you can start inspecting data. At your command, UT3 will dump out large regions of memory, or even single bytes in memory. You can check the value of processor registers, and even change register and memory values. This is all you need to figure out what your program was doing, and hopefully what was going wrong. The interface
Starting point is 00:51:05 here was also key, and it's a weird mix of user-friendly features and computer mumbo-jumbo. All memory locations had to be specified in octal. That's just the numbering system that Tixo used, so UT3 kind of had to follow convention. But since there's no label information, you can't easily pull up the location of your variables or functions. You just have to know the memory location of all your relevant data. If you aren't being careful, that's kind of like looking for a needle in a haystack. On the nice side, UT3 could actually disassemble code in real time. Recall that assembly language maps directly into machine code instructions. Well, UT3 had tables set up to map back the other way. If you were
Starting point is 00:51:53 curious what code you were looking at, UT3 would gladly turn the raw octal back into a snippet of assembly language. The final feature that made UT3 the complete package is that you were actually able to rewrite your software on the fly. UT3 let you override memory locations, which meant you could rewrite chunks of the very program you were debugging. Say, you find a subtract where you should have really had an add. You can splice that change in right then and there. Wrapping everything together, UT3 was also able to punch your modified program onto fresh paper tape. This feature set made for a pretty darn robust platform for debugging, but it definitely had room for improvement. The biggest sticking point was, by far, the issue of labels.
Starting point is 00:52:42 UT3 really did have a pretty spiffy interface for the time. I mean, you got to type instead of flipping switches. But a programmer still had to be able to speak in octal. You had to get down on the computer's level. The other side issue was breakpoints. Trace was a powerful tool for stepping through a program, but it only went so far. You just had to hope that the 79th jump was nearby your bug, or pad things out with extra jumps. The bottom line was that UT3 suffered from a lack of fine-grained controls. That was just the status quo on Tixo for a number of years. But by early 1960, UT3 became totally obsolete. In its place stepped the first symbolic debugger. This new program was developed by Jack Dennis and Thomas Stockham, two MIT
Starting point is 00:53:35 computer scientists who were working with TX0 at the time. The fun here starts off with the name. As Dennis recalls, quote, in the 50s, a substance called FLIT was used regularly around the house to get rid of flies. Thomas Stockham and I called the debugger program we wrote for TX0 FLIT, which meant FlexoWriter Interrogation Tape, end quote. What we have on our hands here is a digital bug killer named after a more physical equivalent. Perhaps not the most subtle approach, but I think we can all appreciate it. Before we move on, I want to call some attention to the I in FLIT. Interrogation.
Starting point is 00:54:19 Documents describe using both FLIT and UT3 to carry out a conversation with the computer via the FlexoWriter. For the time period, that was a new and exciting approach. In the docs for UT3, this is described as interpretive programming, but FLIT calls it an interrogation. I just really like the implication made by this name. The implicit coercion in play is a nice touch. It's framing the computer as a non-compliant party that's holding back some crucial secrets. I gotta say, I too would sometimes like to interrogate my own computer. Anyway, generally speaking, Flit does everything UT3 did, plus much more. It's that much more that brings us a lot closer to modern debugging tools.
Starting point is 00:55:10 FLIT was a symbolic debugger, meaning for the first time, you could talk about memory locations in terms of labels instead of numbers. I can't underline how big a deal this is. I can't underline how big a deal this is. Previously in UT3, you had to just know where all your variables were in memory. This meant either calculating that ahead of time, hunting and pecking for some distinct chunk of data, or keeping all your variables in a specific region of memory. With Flit, you could just ask for the value of x, or y, or widget price, or whatever, and the debugger would make it so. It sounds like magic, right?
Starting point is 00:55:51 Well, it was all possible thanks to the magic of having more memory around. Around the same time period, Tixo was expanded from its initial 4 kilobytes of RAM up to a more comfy 8K. That may not sound like a lot, but it gave programmers breathing room to start playing with more data. So Dennis and Stockham decided to see what they could do about a larger, more powerful debugger. The core of what made Flit so special was a little thing called a symbol table. This was a table that mapped memory locations to clear text names. There were some restrictions. Symbol names could only be three letters, but hey, it's better than nothing. The trick here was in how the symbol table was populated. It was generated from the
Starting point is 00:56:39 labels you used in the program you were debugging. This was done using an updated assembler that could grab and output the memory addresses of labels. Instead of just casting labels aside, they were remixed down into this new symbol table format. The immediate effects of this little change should be pretty clear. Flit made debugging much easier. Programmers didn't have to go all the way down to raw addresses to find out what was going wrong. Now, I posit this must have had a secondary effect. I can't prove this without looking at post-flit and pre-flit programs, but I'm going to throw this out there anyway. Without a symbolic debugger, you have to just know what part of memory you're using for variables. One strategy for this, and one school of thought for assembly language programming in general,
Starting point is 00:57:32 is to put all your variables in the same place. You basically build up a section for code and a separate section just for data. This method is also often cited as best practice since it's just a tidy way to do stuff. With flit in play, you only need to know the name of your variable. Then you can interrogate it. This is a huge convenience, but it may have also opened up some options. It may speak more to my character, but the first thing I thought of when I was reading about this feature of flit was, cool, you can just put variables anywhere you want.
Starting point is 00:58:10 That makes me think that Flit may have given programmers more flexibility in how they handled variables in their programs. You no longer had to group variables, you could just put them wherever you wanted as long as you could remember them by a three-letter name. Did this lead to messy code? Maybe. I have no way of telling without a whole lot of work, but I know that my sick, twisted mind immediately saw the more loose programming style that Flit could have helped support. But, of course, there were more concrete downsides to Flit. The other fantastic feature of this new debugger was software-based single-step tracing and breakpointing. UT3 could trace your code in huge stomps, while Flit was able to go one instruction at a time. Better still,
Starting point is 00:59:00 you could place a breakpoint at a specific address. Debugging was finally catching back up to what ENIAC was capable of, and it only took over a decade. The downside was that Flit was resource intensive. I mean, to use a debugger, you were basically running two programs at once. It wasn't quite multitasking, but it was in a similar vein. Intercepting and stepping through instructions made operations considerably slower. You were tacking on cycles every time you ran code. FLIT also took up memory. The debugger was loaded into the last few kilobytes of RAM, along with the symbol table. The 1960 version took up about 4 kilobytes. This made Flit unsuitable for certain programs. If the program you wanted to debug was more than 4k, then you were just out of luck. There wasn't enough room
Starting point is 00:59:52 for Flit and your code to co-mingle on the machine. If you were running really complex calculations, you'd also face issues. Flit slowed down operations, so it just could take too long to get up to the section you needed to debug. This is just a fact of life for debuggers. They're running alongside your program, so you need to have extra resources. Despite those caveats, we have arrived at the complete package. Flit was the first debugger to put it all together, and we've followed the general formula ever since. Breakpoints, single-stepping, and symbolic inspection. The wild thing is that most of those techniques were already possible using ENIAC. It would just take a little while for more general and safe solutions to come around.
Starting point is 01:00:41 solutions to come around. Alright, that brings us to the close of our discussion of bugs, where to find them, and how to stop them. The idea of the bug as some kind of error or mistake or glitch far predates computers. But unlike other loanwords that us technomancers have adopted, I think bug really does fit. I think everyone has ran into some walking specter, imagined or not. Bugs appeared in computers as soon as, well, as soon as computers started running. Debugging came very shortly thereafter. If I had to give a hard time frame, I'd wager that debugging methods started being developed a matter of minutes after the first bug was identified.
Starting point is 01:01:32 And that's part of what makes talking about bugs and debugging so difficult. Bugs are mundane, almost ephemeral. Programmers joke and complain about bugs because we've all seen more bugs than we can count. Most don't even bear mentioning or remembering in gritty detail. Most programmers don't even want to remember their bugs. The early history of debugging fits this same mold. The practice was there as just part of any work around computing. Of course there were issues, and of course there were strategies to deal with these issues. Once we get to ENIAC, which is admittedly really early in the grand computer chronology, we start to see familiar debugging practices emerge.
Starting point is 01:02:20 The specifics change, software comes into the picture in a much larger way, but debugging in the 21st century still follows the path laid out on ENIAC. If you ask me, that's a pretty frightening legacy. Now, before I sign off, I want to air out a small programming note. This isn't the last episode of Spook Month for 2021. My next regularly scheduled episode would fall on November 1st. And, you know me, that's just too close to October for me to let that stand. So I'm going to try and get that episode out a day early so it falls on Halloween.
Starting point is 01:03:00 Until then, thanks for listening to Advent of Computing. I'll be back soon with another piece of computing's past. And hey, if you like the show, there are now a few ways you can support it. If you know someone else who'd like the history of computing, then why not take a minute to share the show with them? You can also rate and review on Apple Podcasts. And if you want to be a super fan, then you can support the show through Advent of Computing merch
Starting point is 01:03:22 or signing up as a patron on Patreon. Patrons get early access to episodes, polls for the direction of the show, and bonus content. You can find links to everything on my website, adventofcomputing.com. If you have any comments or suggestions for a future episode, then go ahead and shoot me a tweet. I'm at Advent of Comp on Twitter. And as always, have a great rest of your day.
