Advent of Computing - Episode 117 - What's in a Byte?
Episode Date: September 24, 2023

Byte has to be one of the most recognizable parts of the digital lexicon. It's an incantation that can be recognized by even the uninitiated. But where does the byte come from? Has it always existed, or did it more recently come into being? And, more specifically, why is a byte 8 bits? Is it some holdover from long ago, or is there some ironclad rule of 8's?

Selected Sources:

https://archive.org/details/byte-magazine-1977-02/page/n145/mode/1up?view=theater - Buchholz on the "byte" in BYTE!

https://sci-hub.se/10.1049/pi-3.1949.0018 - A Storage System for Use with Binary-Digital Computing Machines

https://ia600208.us.archive.org/32/items/firstdraftofrepo00vonn/firstdraftofrepo00vonn.pdf - The First Draft of a Report on EDVAC
Transcript
I've been told that 8 is a weird number.
Now, personally, I don't buy that.
Eight seems very natural to me.
It's a nice round number in multiple ways, actually.
But my friends and non-digital colleagues insist that 8 is not normal.
It's not a multiple of 5.
It's also not a multiple of 10.
It also looks kinda awkward if you count it out on your
fingers. Plus, it's hard to count by. Now, I insist you try this. This is the point where
I started to maybe see the light a little bit. So try counting by fives and then try counting by
eights. I end up going eight and then 16 and then I have to stop and do some math.
So, yeah, maybe 8 is a little bit unnatural.
This is one of those interesting little tidbits that I really enjoy.
I personally spend a lot of time thinking about my digital assumptions.
That's partly why I like producing Advent of Computing. It gives me a good reason
to stop and think about why the computerized world is a certain way. I've heard it said that
programmers have to twist their minds to work with computers. This whole eight-based argument is
perhaps a symptom of this larger disease. Computers, on their very basic level, are alien things. The digital world is, as the name
suggests, a whole other world. A programmer, thus twisted into this realm, sees eights everywhere
and doesn't bat an eye. On the other hand, an uninitiated sees the same sea of eights and,
perhaps, recoils. I don't think I'm qualified to entirely uncoil a programmer's mind.
As I've said, I'm already afflicted.
I see most things through a thick pair of binary glasses.
But maybe I can uncoil a small component.
Maybe we can figure out why eight is a lucky number for programmers.
Welcome back to Advent of Computing.
I'm your host, Sean Haas, and this is episode 117, What's in a Byte?
This entire episode is going to revolve around one simple question. Why do we use 8-bit bytes? This, as with many of my simple questions, turns out to be a pretty expansive topic.
Before I get into things, I want to give some context as far as where things stand in the
21st century. Most processors that we use today are in the x86 family. This just means that
the bulk of CPUs all follow the same basic design and can all run the same code. There are two
central parts to this design, memory and registers. Memory is perhaps the easiest part to explain.
In any computer, you need a place to store information that you want to work with. That's the role of memory. It's this big pile of bits that a program can access
and manipulate. Memory in the x86 architecture is addressable, meaning that every piece of memory
can be called up by some numeric address. Think of it like a call number in a library. Each of these addressable chunks
is a byte. That's the smallest addressable unit of data in the x86 world. We can further subdivide
this. Each byte is represented by eight binary digits. That means that you have eight slots for
either ones or zeros to fit into. Registers are another form of memory, but they're more
specialized. Registers sit inside the processor itself, and they're used for actual operations.
If you're adding numbers together, then you have to store your numbers in registers. Or,
at least you have to store one number in a register. In the x86 architecture,
registers are 16 bits wide.
Those are composed of two bytes.
You might hear it said that the x86 architecture has a 16-bit word.
A word being a small unit of data that the processor operates on.
Sometimes it's discussed as the natural size the processor wants to work with.
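To make that concrete, here is a minimal sketch in C, my own illustration rather than anything from the episode, showing the same ideas on a modern x86 machine: memory is addressed one byte at a time, a byte is 8 bits, and the classic 16-bit word is just two of those bytes viewed together.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t word = 0xABCD;            /* a 16-bit word, two bytes wide        */
    uint8_t *bytes = (uint8_t *)&word; /* the same memory, viewed byte by byte */

    printf("bits per byte:  %d\n", CHAR_BIT);     /* 8 on x86 */
    printf("bytes per word: %zu\n", sizeof word); /* 2        */
    printf("byte 0: 0x%02X, byte 1: 0x%02X\n",
           bytes[0], bytes[1]);                   /* each byte has its own address */
    return 0;
}
```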
So it's plain to see here that 8 is a very important number. The entire x86 architecture, which the world is based off of, works 8 bits at a time.
But the question is, why that number? Why 8? Why not 4 or 16 or 24 or really any number for that matter. This is where our journey begins. In this episode,
we're following the trail of bits. I'm going to start by working through the history of how we
talk about data, since that's going to be really important once we get to the byte itself. And
I assure you that the byte was actually invented. The idea of a byte is
actually pretty new compared to the computer. That's something that I wouldn't have bet on,
personally. As we build up to the byte itself, I want to examine why 8 has become such an important
number. By the end of this episode, I hope that we have a better understanding of how 8-bits became
so popular, and why it's in dirt. Is it a good solution? Is it just tradition? Or is it some
mix of the two? Before we get into the episode proper, I have a quick announcement to make.
This year, on November 4th, I'm going to be speaking at the Intelligent Speech Conference.
This is an online conference where podcasters share talks under a certain theme, and the theme this year is
contingency, or backup plans. I'm personally kind of excited for that because one of my favorite
stories of all time has to do with Intel and some poor planning and backup plans they made. So I'm planning to be talking about a certain chip.
Now, the conference is on November 4th, 2023.
It is a paid-for event.
Tickets currently are $20 until October 1st.
Then they're going to jump to $30.
So if that sounds interesting, then I recommend getting a ticket.
The lineup this year looks pretty good.
To see the schedule and get tickets, you can go to intelligentspeechonline.com.
Now, with that said, let's get into the episode itself.
I will warn you, this episode is going to be a bit of a ramble. I initially wrote that we'd kick things off with the classic computer science corner,
but in reality, most of this episode is going to be classic computer science corner.
Now, I think the best place for us to start is by building up an understanding of how
computers encode numbers.
From there, we're going to keep building and building and building.
Historically speaking, numeric representation has been a bit
of a wild west. The outline I gave at the top is only applicable to pretty modern computers,
and by pretty modern, I mean computers derived from the Intel 8086, which itself is actually a
pretty old chip, but I'm getting a little off track. Some very early computers
stored numbers as decimal values. You know, base 10, the numbers between 0 and 9. Normal,
human-readable kind of things. There's no reason you can't do this. On the surface,
it might even sound kind of nice. You can inspect readouts very easily this way.
But the decimal setup quickly runs into problems. It's actually much easier, automation-wise, to store data in
binary. As in, base 2, 1 and 0, on and off, true or false, however you want to call it.
By working with binary values, you can use binary logic gates,
which automatically loops you into this huge existing body of logic theory. You get piles
and piles of papers going back to time immemorial. A voltage on a wire can be mapped to true.
A lack of voltage can be mapped to false. And then you just pull out one of those dusty
old papers and you can build complex logic.
Binary is a very natural way to bridge the gap between logic theory and electronics.
It kind of just works, there's really no better way to put it.
So why did early machines use decimal?
Or rather, why did some early machines use decimal? The ABC, the
Atanasoff-Berry Computer, is, legally speaking, the first electronic digital computer in the world.
That machine used binary internally. So technically, things get off on the right foot.
As far as digital offenders, well, we can go to the usual
dead horse. ENIAC stored data as 10-digit decimal numbers. The issue is that these numbers had to be
operated on with specialized circuits. Instead of using fancy binary circuits, ENIAC operated on registers using more, well, arcane means, let's just say.
If given enough time, this could have been developed into a more elegant art,
but the widespread adoption of binary made decimal computers kind of an evolutionary dead end.
Just as a quick aside, I do want to throw in some unconfirmed speculation.
ENIAC was built around these 10-digit numbers.
Why 10?
One theory goes that 10 was chosen because that's how many digits a standard desktop calculator of that era could handle.
I think there is some weight behind this idea.
ENIAC was, in very large part, designed as an automatic mock-up
of contemporary desktop calculators. When you get down to it, ENIAC was basically a supercharged
and programmable adding machine. And it is true that many desktop calculators in the 40s had 10
digit registers inside them. No good evidence for causality exists, but it's a
neat theory to point out. Okay, so digression aside, once we disregard some oddities, we arrive
at binary. In this encoding scheme, numbers are represented as groups of 1s and 0s. Now,
this is just one way to represent a number. The same number can be
represented in base 10 or base 2 or base 1 billion if you want it. Whatever encoding you choose,
you're still dealing with the same number. So let's take the number 3 as an example.
In base 10, our friendly decimal encoding, 3 is, well, it's right there, it's 3.
That's pretty cut and dry.
In binary, 3 would be represented as 1, 1.
This is, also, pretty cut and dry.
At least once you know how it works.
When we write out numbers in decimal, we start in the ones place, then we go to the tens place,
then the hundreds place, and so on until we have the full number written out.
Binary works the same way, but instead of the ones, tens, and hundreds places, we have the ones place,
the twos place, the fours place, the eights place, and so on.
So the binary number one, one is just saying we have a value of 1 in the 2's place and 1 in the 1's place.
In other words, 2 plus 1, which is 3. Nice and easy, right?
That's simple to think about for small numbers.
5 is 1, 0, 1, 8 is 1, 0, 0, 0, and 9 is 1, 0, 0, 1.
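If it helps to see that place-value idea as code, here's a small C sketch of my own that prints a number's bits by checking each place value, from the highest down to the ones place.

```c
#include <stdio.h>

/* Print `value` as `bits` binary digits by testing each place value,
   highest first. Purely an illustration of the place-value idea above. */
static void print_binary(unsigned value, int bits) {
    for (int i = bits - 1; i >= 0; i--)
        putchar(((value >> i) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void) {
    print_binary(3, 4); /* 0011: a 1 in the twos place and the ones place */
    print_binary(5, 4); /* 0101 */
    print_binary(9, 4); /* 1001 */
    return 0;
}
```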
But what if we want to represent a big decimal number in binary?
Like, for instance, 10.
That's pretty big as far as numbers go.
That may be one of the largest numbers I can think about.
One way would be to just, you know, use binary like a normal computer person would. 10 in decimal is
1010 in binary. Done. Simple. Cut and dry. Another option is the so-called binary coded decimal,
or BCD. This is, to put it simply, a very hateful thing that humans have inflicted on computers.
In BCD, you represent each decimal digit of a number as a separate binary number.
In this scheme, 10 in decimal would become 0001 0000. Read that out loud: zero zero zero one, zero zero zero zero. I may have summoned
some kind of daemon doing that. Now, BCD is a weird compromise format. It's somewhere between
human decimal numbers and computer binary numbers. If you can count to nine in binary,
then you can use BCD no problem.
That said, you kind of lose a lot of elegance doing this.
And, well, a lot of space.
The bottom line is that BCD is wasteful.
To represent decimal numbers between 0 and 9, you need 4 bits.
4 ones and zeros.
But here's the thing.
9 in decimal is 1001 in binary. That leaves some wasted bits. With 4 bits, you can actually count all the way up to 15, which is 16 different values. I'm sure you could work out
how much you're wasting by using BCD, but I'm not in the mood to run number theory stuff.
Just know that BCD is a waste of space.
You're always better off using binary instead of trying to warp it into some human readable gunk.
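For the curious, here's a rough C sketch of the comparison, my own example rather than anything from the episode: decimal 10 packed as BCD, one 4-bit group per decimal digit, next to the same value in plain binary.

```c
#include <stdio.h>

/* Pack a two-digit decimal number into one BCD byte:
   high 4 bits hold the tens digit, low 4 bits hold the ones digit. */
static unsigned char to_bcd(unsigned n) {
    return (unsigned char)(((n / 10) << 4) | (n % 10));
}

int main(void) {
    unsigned n = 10;
    printf("plain binary: %u -> 1010 (4 bits)\n", n);
    printf("BCD:          %u -> 0x%02X, i.e. 0001 0000 (8 bits)\n", n, to_bcd(n));
    return 0;
}
```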
That said, BCD does do something interesting. It works by breaking up binary data into groups of digits. Instead of an unending stream of ones and zeros, BCD represents four-digit chunks of data.
Each chunk can stand alone or be read in relation to other chunks.
Breaking numbers up into smaller chunks is something that we do naturally.
Take, for instance, the humble comma.
Us humans tend to break larger numbers up into these three-digit groups.
You don't write 10,000 without any commas.
That looks weird.
We would more often write 10, and then three zeros to represent that number, 10,000.
Depending on your region, that separator might be a comma, space, or a period,
but the idea is the same.
In this instance, we're breaking up a large number to make it more readable, more manageable.
One way we can look at BCD is a binary rendition of this decimal quirk.
It's a means to naturally group numbers.
I have my problem with how the format
works in practice, but it does have its uses. It's a pretty humanistic way to group binary data.
It makes this in-human format a little more sensical to deal with. You may be asking yourself
at this point, why have I been taking you on this BCD diversion? Well, I think this can serve as a
touchstone as we travel deeper into the bowels of the computer. As we start dealing with honest
to goodness machines, we need ways to manage data, which means we need ways to manage binary numbers.
You can't really get away with using a million bits of storage for one huge number.
That's not very useful.
BCD is one of the most easily understood ways to break up binary data into more manageable chunks.
Simply portion off your storage into 4-bit regions and treat those regions as digits.
Done.
From here, we can start a few lines of inquiry.
I will hazard that this is going to
come down to a lot of boring etymology stuff. So first off, I need to address the whole bit thing.
This is really basic stuff, but I think we should nail this one down before working up to larger
data structures. By bit, I mean a single binary digit.
A bit can either be 1 or 0, which is electronically represented by on or off, and logically it's true or false.
You know, binary.
But where does that name come from?
The term first appears in print in 1948 in a paper by Claude Shannon.
That paper, A Mathematical Theory of Communication, isn't actually about computing at all.
Instead, it's about signaling and data representation.
In part, this paper is concerned with pulse code modulation, which is something I always like to see.
In the opening page, Shannon is explaining that different numerical bases can be used for
encoding messages. The choice of base will have an impact on how the message can be propagated
and handled. To quote, the choice of a logarithmic base corresponds to the choice of a unit for
measuring information. If the base 2 is used, the resulting units may be called binary digits, or, more briefly, bits,
a word suggested by J.W. Tukey. A device with two stable positions, such as a relay or a flip-flop
circuit, can store one bit of information. End quote. Two things to note here. Bit is actually presented as a portmanteau of binary and digit.
Also, Shannon is citing a colleague, so bit may be an older term than 1948.
The usual theory I've seen goes something like this.
John Tukey, one of Shannon's colleagues at Bell, used to work with John von Neumann.
At some point, supposedly in 1946, Tukey suggested the portmanteau.
This is a nice little story.
It ties the origin of the bit to the early age of computing, and it even includes our main man Johnny.
He is always nearby, after all.
Now, here's the thing about this. It's actually
really hard for me to validate this theory. There isn't some chain of public letters where
Tukey and von Neumann are cracking jokes about bits. What I can verify is that the term binary
digit was in use during this period. I can actually find this phrase in print as early as 1942.
Even newspapers are talking about binary digits by 1947.
Perhaps most relevant is the fact that the First Draft of a Report on the EDVAC,
a report written by von Neumann himself,
uses the term binary digit to talk about, well, binary digits.
So von Neumann was using the phrase. That side of the story seems plausible then. I just want to
leave a little space for other possibilities here, since I don't have rock-solid evidence for the
origin of bit, especially since, you know, bit is just a very
common word. It could just as easily have been used since each binary digit is just a little
bit of data, you know, just a little guy. That's probably enough about the bit, so let's kick
things up to the next level. As we saw with BCD, it can be very useful to group bits together.
So, what should we call these groups of bits?
We can look at the historic context for this.
At this point, we get to jump from the archaic machines of the early 40s all the way up to the weird and wild machines, the futuristic computers of 1946.
By that, I mean we need to turn to the earliest writings on stored program computers.
You know, computers with actual memory.
A good place to start would, perhaps, be the First Draft of a Report on the EDVAC.
At least, you'd be tempted to think that if you hadn't read the paper.
This is, full stop, the first paper to describe a stored program computer.
It's of huge historic importance, and it sets dozens of precedents.
But it's also kind of strange.
It's very much an early paper.
von Neumann had a particular way with words. In the draft report, he attempts to explain a computer using a biological analogy. He calls different components of the machine organs.
He explains connections as neurons. He even sometimes calls binary digits stimuli.
So it should come as no surprise that von Neumann's choice of words
is a little weird. That said, the EDVAC paper does describe a grouping of binary digits.
Von Neumann most often uses the term unit as a synonym for binary digit. He calls a grouping of 32 units a, quote, minor cycle. Now, perhaps
this sounds a little silly to our modern ears. I'd personally be a little tickled if someone
showed me a computer with 4,000 minor cycles of memory. Maybe that would be better termed 4 kilo-minor-cycles at that point. From my reading,
it seems that von Neumann is using this minor cycle term because he was primarily discussing
delay line memory. This gets a little weird to think about. At this point, there wasn't a real
programmable computer with any kind of practical memory. There were some computer-like
machines that used recirculating or delay line memory. I think there were a few IBM punch card
machines that had drum memories, and some radar systems were using very early signal delays that
looked like memory, at least if you squint. Von Neumann does hedge a little, saying that memory doesn't necessarily have to come in the form of a delay line,
but his language gives him away.
The cycle part fits this idea.
Delay memory works by shunting bits off into something.
Early machines like EDSAC used tubes of hot mercury and sent in bits as acoustic waves.
Some later machines used drums coated in magnetizable materials.
Either way, you end up with these time-dependent forms of memory.
You have to wait for waves to propagate through mercury, or for a drum to spin.
Once that bit hits the other end, or the drum spins to the right position, you get your bit
back. In that context, calling a group of bits some kind of cycle makes a lot of sense. It's a
very physical way to describe data. When am I going to get my chunk of information? Well, you have to
wait for the proper minor cycle. I think that just kind of works.
That addresses some of the confusion here, but it leaves another fun bit of weirdness.
Why 32 bits? I think this is a good place to address a misconception. Old computers weren't
very capable. I mean, they were a lot better than mechanical calculators,
but EDVAC can't really hold a candle to even the weakest modern machine.
I bring this up because sometimes bittedness, the bit size of a machine,
or whatever you want to call it,
can be confused for a measure of a machine's power or performance.
I've even fallen for this trap
myself. I remember looking at a new computer at Fry's back in the day with a big 64-bit AMD
sticker on it and, pretty quickly, checking my wallet to see how much cash I was carrying.
The version of EDVAC described by von Neumann is 32 bits, insofar as that's the size of its minor cycles.
That's how many bits EDVAC works with at a time.
More specifically, that's the size of operands that EDVAC's math circuits work on.
When you tell that computer to add two numbers, it expects two 32-bit numbers, two minor cycles.
EDVAC's memory is designed to match that.
It stores data as 32-bit chunks, as minor cycles.
But, once again, this is a really early computer, so things act a little unexpectedly.
This is going to be a small aside, but I think it's
interesting to note. EDVAC technically only had 30-bit-long numbers. The extra 2 bits were reserved
for rounding after multiplication and division operations, since those could result in longer
numbers. That means that, in practice, a minor cycle had 30 usable bits. So, nitpicking
aside, why 30? Partly, it was just a guess at a reasonable size. Von Neumann admits that he's
open to change, or rather, the ENIAC team that developed EDVAC is open to change. 30 is just a starting point for discussion.
He explains that 30 is based off his experience with differential equations.
This would have been informed by work done on the Manhattan Project. Von Neumann defines a
standard number in heavy quotes as having eight significant digits. In other words,
a number with up to eight digits after the decimal place.
That can be represented by a binary number with 27 bits,
and von Neumann rounds that up to 30 to get an extra significant digit.
Add 2 for rounding, and you have the nice 32-bit minor cycle.
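If you want to check that arithmetic yourself, here's a tiny C sketch of my own: the number of bits needed for n significant decimal digits is roughly n times log2 of 10.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Bits needed to cover n decimal digits: ceil(n * log2(10)),
       since log2(10) is about 3.32. */
    for (int digits = 8; digits <= 9; digits++)
        printf("%d decimal digits -> %d bits\n",
               digits, (int)ceil(digits * log2(10.0)));
    /* 8 digits -> 27 bits, 9 digits -> 30 bits.
       Add the 2 rounding bits and you land on the 32-bit minor cycle. */
    return 0;
}
```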
I want to highlight one other thing that von Neumann kind of glosses over.
EDVAC uses a few different ways to represent numbers. Von Neumann mainly discusses a form
of floating point representation that uses exponents. That's cool, but I want to look at
the simple integer case. 30 bits is the minimum you need to reach a 10-decimal-digit number, since 2 to the 30 is just over one billion.
Do with that what you will.
The EDVAC report, through a strange twist of fate, ends up becoming the blueprint for many of the earliest stored program computers.
Personally, I believe this is just because the draft report was really the first full and publicly accessible description of a computer.
This is where we reach the next stage of our story.
Thankfully, the minor cycle name is dropped.
In its place, something a little more confusing is adopted.
This is the point where I get to introduce something frustrating.
The word is word.
That's spelled W-O-R-D.
During the early era of computing,
let's just call that the late 40s into the early 50s,
this is the term used to describe a grouping of bits.
More specifically, a computer's memory in this period is subdivided into words.
So you run into this fun shuffle where an old paper will say that such and such computer has
four kilowords of memory.
Word size wasn't standardized in this period,
so a word on one computer might be 17 bits, on another it might be 24 or even 64 bits.
In general, this word size is matched up to register size. In other words, it works the
same as von Neumann's minor cycles. For instance, take the Manchester Baby. This was actually the
first operational stored program computer. It had memory where a
program could be stored and it could run code from that memory. The baby became operational in 1948.
It was a 32-bit word computer, just as prescribed in the EDVAC report. This meant that each address
in the baby's memory was 32 bits wide. Its internal registers were also 32 bits wide.
In fact, the Manchester Baby gives us the first use of the term
word for describing a group of bits in a computer. This comes from a 1948 letter to the journal
Nature. At this point, computer science didn't really exist as a field, and we don't have
anything like a comp sci journal, so articles on computers just kind of show up wherever.
Part of this letter discusses the so-called data store of the Manchester baby. That includes
registers and memory. Quote, the capacity of the store is at present only 32, quote, words, each of 31
binary digits to hold instructions, data, and working. End quote. Disregard the 31 thing here,
either the baby changed to 32 bits after the letter, or it may have been a case where an
extra bit wasn't actually accessible at that time. The point is the term word, and the fact that the letter puts it in quotes, as in quote-unquote
words. Come along with me as I get a little unhinged trying to figure this out. Here we have
one of the first published scraps of information on the first functioning stored
program computer. The first computer that we could recognize as part of our current lineage
of machines. This letter, this informal note sent off to Nature, uses words to describe groups of
bits. But crucially, that term's in quotes. The term isn't defined explicitly, it's just
used with scare quotes around it. In my head, that says something about how new the term is in 1948.
My gut feeling is that word was new, but it was already seeing some kind of use, hence the lack
of an explicit definition.
That's the start of the trail, but where does this one lead?
The most immediate thread comes from the memory storage methods used on the baby.
This computer employed a form of CRT-based memory called a Williams-Kilburn tube,
more commonly just referred to as a Williams tube. In fact,
the letter to Nature was co-authored by the Williams and Kilburn in question,
so this looks like a good lead. Through a little searching, we can find a paper published in 1949
by the same duo, titled A Storage System for Use with Binary Digital Computing Machines. It's a full description
of the tube-based memory used in the baby. The specifics are a little out of scope for
the discussion today. I've explained these tubes in past episodes on memory, so you can
check out episode 77 in the archive for more details. In short, these tubes store binary data as a grid of charges on
the face of a cathode ray tube. It allows for true random data access, and it's relatively reliable,
if a little low on density. What matters for us is how data is organized on these tubes.
Each tube is, for all intents and purposes, just a CRT display. During normal operation,
a lid is kept over the CRT for a number of reasons, but you can actually flip up that lid
to inspect memory. When you do, you see a uniform grid. At each location of the grid is either a
dot or a dash. A dot represents a 1 and a dash represents a 0. The 49 paper describes this grid
as being 32 dots wide by 32 dots tall. To borrow from its wording, you have a grid of 32 lines,
each being 32 binary digits wide. The paper calls these lines words. Now, to set up the actual passage that defines
what a word is, I have to give a little more context. The tube paper describes storage as
part of a larger computer system, but there is some antiquated language at play. Or, at least,
the paper is written for a lay audience. At this point, almost any audience would be of lay people, I guess.
Anyway, the passage refers to a program as a table.
I think that's because of the storage context.
In practice, a chunk of memory would look like a table.
Here's a summary of the Williams-Kilburn rendition of memory. Quote, to every address, a digit combination will be assigned, so that an instruction will
consist of two digit combinations and is indistinguishable from a number in appearance.
Instructions and numbers, which are collectively termed, quote, words, are therefore similar,
the only difference between them being their function in the machine, end quote. This uses the term
address, in quotes, to refer to discrete locations in memory. Each of these discrete locations holds
a, quote, word, also in quotes. Now, here's the fun wrinkle. This paper is sent to print in 1948.
It's part of a larger paper trail surrounding the Williams-Kilburn tube.
In 1946, scant months after the EDVAC paper leaks, Williams and Kilburn submit a patent
application for their memory device. That patent does not use the term word.
So either the term word gets coined by Williams and Kilburn sometime
between 1946 and 48, or it's some older term that I have no hope of ever sourcing.
I'm going to assume option one, that the term is coined in the late 40s by Williams and Kilburn.
I like this option because it saves me from, you know, trying to scour more
data, and it also gives us a possible story. Now, this is ultra speculation, so hold on to your
pants. On these memory tubes, each row is called a line. By the time the Manchester Baby is up and running, each line is 32 bits wide,
the word size of the machine. But during development, these memory tubes had different
sized lines. Williams and Kilburn tried out a number of different line lengths before settling
on 32 bits. The Manchester Baby also went through at least one word size change.
In that letter to Nature, it was describing a 31-bit word, but the final machine used 32-bit words.
That's, of course, assuming that change was intentional and not some technical issue that
got sorted out.
So here's my theory.
At some point, a line on these memory tubes was subdivided into multiple groups. What would you call part of
a line of data? Well, a word would make some kind of sense. Think of a line like a sentence, a line
of text. Those lines are composed of words. I'd wager that the term was suggested and used internally. At some point, the word became 32 bits wide,
which matched up with the width of a single line on a tube.
The term then stuck around since it was handy and,
more than likely, it's a little funny.
It's a neat little joke.
This theory seems reasonable to me
because we run into these kind of legacy terms all the time. I mean,
in Linux, you still use the term teletype to refer to a virtual terminal. No one has seriously
used a teletype in many decades at this point, but the term persists. I'd totally buy that word
came up as a useful term during development of the Williams-Kilburn tube
and then just stuck around. Now, this theory could be totally wrong. I mean, the term
word is so commonplace as to be pretty much impossible to research. The only other inkling
of an idea I have comes down to the phrase, quote, two-bit word. This is an old-timey
phrase used to refer to fancy terms, to big, cool words. It actually shows up in a lot of print
material around the 1940s. So maybe there's some long-lost joke about a memory address being a 32-bit word? Ah? But I honestly have no idea. It can also be
that word was just a natural choice for some reason I can't comprehend. The term very quickly
enters the lexicon. I have big books from the 1950s, these big printed tomes, that are already
using word to describe chunks of memory. By the end of the 50s, there are books that don't even index for word, since it's just so common. If you
did run an index, it would probably just be a listing of every page in the text. The term was
in widespread use, but it had a slightly different meaning on every computer. There was no standard word size.
It depended on the machine. EDVAC recommended a 32-bit word. The Manchester Baby followed that
recommendation, but another machine named EDSAC used a 17-bit word. The IAS machine, often seen
as the spiritual successor of the EDVAC report,
well, that set a word length at 40 bits.
What's infuriating is that folk talked about memory sizes in terms of words back in the day.
So you might read that IBM was coming out with a cool new machine, and it came stock with 4 kilowords of memory.
That means 4,000 words. But how much memory is that, really?
Is it more or less than last year's UNIVAC that supported 12 kilowords? How many bits are we
actually talking? Word size also breaks down once computers reach, well, honestly, any level of
sophistication. It all makes good sense when
each chunk of memory is one word, and each register is one word wide. You're only dealing
with words, so it's a nice little term. This is especially the case during the first generation
of machines. Here's something that I haven't really thought about before, but makes a whole
lot of sense in retrospect.
Long-time listeners will know that I often call memory the most complex part of a computer.
Since day one, memory has been hard to build.
It's taken us decades and many failed attempts to create reasonable forms of computer memory.
Now, let us consider the humble register. A register is actually just a small chunk of
memory that's connected directly to a computer's math and logic circuits. It's the immediate
working space of the machine. Here's something that never hit me until I was working on this
episode. A register is itself a form of memory. That means that registers are actually subject
to all the technical problems of computer memory. In a modern computer, registers are implemented
totally separately from large storage memory. You have your fancy sticks of RAM that live in slots
on your motherboard, while registers are quite literally etched into the CPU
itself. In these early days, however, these two forms of memory were linked. In fact, some early
computers even call memory locations registers. So, add a little more confusion to the memory mix.
Let me explain this a little more. The Manchester Baby makes for a good example here. That computer
implemented registers as a few lines of storage on one of its Williams-Kilburn tubes. Thus,
registers were really just another region of memory. The IBM 704, another contemporary machine,
used drum memory as its data store, and registers were
implemented as a special region on that drum. In these cases, the line between register and memory
is blurred. This also means that word size between registers and memory was kept consistent.
In some cases, you might have a computer with registers that were two words wide,
or you could get fancy and have a register that's half a word wide, but no matter the setup,
there was still a direct relation. It still made sense to talk about computers in terms of words
because that was the core unit of data for everything. It wasn't just ideological, it was a very physical reality.
If your registers are all composed of words, the same physical words that make up memory,
then why would you need another term? You have a four-kiloword memory space and a one-word
accumulator. It all falls in line very naturally as an extension of the computer's design.
While this may be natural in some cases, it kind of falls apart in others.
Not all computers used in-memory registers.
Even amongst those early computers that mapped memory into deeper circuits,
not all registers were nice, even words.
The LGP-30 is an interesting example of this.
That was another drum-based computer, like the IBM 704.
The LGP-30 stored data in memory as stripes running down a spinning drum.
Registers were stored as a set of bands running around the drum.
So while the same physical device was used for both registers and memory, the implementation, the specific layout, differed. A word meant something different in different
contexts here. Another example of where this breaks down is the UNIVAC II. This computer had
normal-ish memory. It was broken up into words. Registers were implemented as their own circuits.
So there was no limitation on how registers could be composed. Some registers were one
36-bit word in size, but not all followed the convention. UNIVAC II had at least one optional
register that was 15 bits wide. In the documentation, what do you call that? Do you say that it's a 0.417-
word register? There's another place where the word word loses its meaning. So far, we've only
been discussing internal data representation. That is, how the computer stores its own numbers for its
own secret and perhaps evil uses. Let us now consider the flip side of the coin, input and
output encoding. In other words, how the computer communicates with the lowly world outside.
Any discussion here must start with the venerable punched card.
Just to be clear, this is pre-computer technology.
Punch cards as we know them date all the way back to the 1890s.
IBM codified the card format in the early 20th century.
That meant that somewhat standardized punch cards were well-established prior to the advent of computing. It's little wonder that as
computers appeared, punch cards became a de facto method for reading in data and outputting results.
There are a number of different encoding schemes that have been used on punch cards over the years.
At the most basic, you could just encode raw binary on a card, as simple as just punching
a pattern of holes.
That's workable, but it's not really what the IBM format was intended for.
The premier encoding scheme back in the day was called zoning, or zone encoding.
In this format, each column was used to encode a single character.
Now, I should be clear here.
Character doesn't necessarily mean a written character.
A column could hold a numeric value, or it could be holding a text character. It could also be
holding special digital signals. Whatever we're talking, the data is encoded using a pattern of
punches in one column. Each column of a punch card has 10 positions plus two overpunch positions on top,
so call it 12 bits. That gives us up to 4096 possible values, assuming binary encoding for
each column, or at least that each possible combination is utilized. This doesn't really match up with any common word length.
Despite that, punch cards were one of the most common ways to talk with early computers.
You might have a machine with a 32-bit word length, but you would still have to control it
using 12-bit punch cards. So then, how were people talking about data stored on these cards?
The technology predates computers, and it's not really a word-sized means of storage.
Traditionally, cards were broken up into fields.
Each field was a single, combined value, like a string of text or a number.
Fields were composed of individual columns. These columns didn't have
any standardized names outside of columns. However, they were sometimes called characters,
since each column could encode one text character. This is the thread that will lead us to the byte
itself. Early IBM computers were, in large part, highly advanced punch-card munchers.
You could call them computerized tabulators if you're talking to serious people. The IBM 305,
one of the company's earliest machines, was undoubtedly a computer. It could be programmed,
it could run calculations, the whole nine yards. But it was one of a special class of machines.
You see, all IBM hardware was meant to deal with punch cards, at least at some point.
That could lead to some awkward situations with encoding.
The 305, as well as a number of later IBM machines, had variable word lengths.
This is another place where the idea of a word breaks down.
It also gets a little weird since technically the 305 was a character-based machine, so it's not
storing data in binary but rather in base 12 characters. But don't worry about that part.
In practice, a programmer could tell the 305 to use any word
length. This could range from a single character up to, I guess technically, the full memory size
of the machine. This was convenient because of the whole fields thing. A punch card could be
configured with all kinds of encodings. Instead of trying to enforce reasonable encoding standards on older technology,
machines like the 305 were able to bend to the whims of cardstock. In this context, a word has
almost no meaning. It could be a single character. It could be an actual English word. It could be a
sentence or two. It could just be a pile of numbers. We're in a weird character computer, so maybe we can
give this a pass as an oddity, right? IBM made a few of these decimal and character computers that
had variable word lengths. So we are stretching the idea of a word, but this is far enough removed
that we may be tempted to disregard these examples. But what if there existed an honest-to-goodness
binary computer, a real machine that had variable words? Well, allow me to introduce the IBM 7030,
aka Stretch. Now, this is a very serious computer. This is actually IBM's first supercomputer. Its history is something
that I need to cover in full sometime, probably sometime soon. It actually has a weird connection
to Los Alamos and Enrico Fermi, so needless to say, it's in line with my recent rut. Anyway,
the 7030 project started in 1955, with the first machine shipping in 1961.
Along the way, something interesting happened.
Stretch was a full-on binary computer.
It encoded data as 1s and 0s, just like anyone else.
It also used a 64-bit word, so nice and big.
Pretty spacious.
The weirdness, though, is that Stretch could be configured to group bits into chunks smaller than a word. Maybe you could call it a sub-word,
but as early as 1956, internal memos were already calling it by a better name.
They called these smaller groupings bytes, spelled B-Y-T-E-S.
One of the designers on the stretch team, Werner Buchholz, is credited with coining the term.
In a 1977 letter to the titular Byte magazine, Buchholz has this to say about the word.
A byte was described as consisting of any number of parallel bits from 1 to 6.
Thus, a byte was assumed to have a length appropriate for the occasion.
Its first use was in the context of input-output equipment of the 1950s, which handled 6 bits at a time.
The possibility of going to 8-bit bytes was considered in August 1956 and incorporated in the design of Stretch
shortly thereafter. End quote. The idea of a character is key here. Folk will often call
computers fancy calculators. I even fall into this trap myself. But a computer, in reality,
is a lot more than that. From the earliest days, there were systems that could handle textual data.
Even some of IBM's computers were built specifically with text in mind. So in that context,
text encoding was of prime concern. Of the multitude of ways to represent characters,
IBM primarily stuck with a tried-and-true method, 6-bit encoding. In this scheme, a character is
represented by a 6-bit number, a value ranging in decimal between 0 and 63. That allows for the
entire English alphabet, usually only in one case, numbers 0 through 9, and common punctuation marks.
By 1956, this type of character encoding was standard, if not standardized.
At least the size had become a more prevalent option. You could see 6-bit characters in
telegraph systems, paper tape, and even IBM's own punch cards. They were also used in a lot
of computers. Once again, we're looking at a chunk of data smaller than a word. I think it's only natural that a new term should be minted.
According to lore, that word was initially byte, spelled B-I-T-E.
The I soon became a Y in order to prevent confusion, hence B-Y-T-E.
Now, I don't know if that's entirely true, but it's a nice little story. There is
another reason that 6-bits was set as the largest byte size early on. At first, Stretch was planned
to be a 60-bit machine, as in, it would use a 60-bit wide word. As Buchholz explained in a 1956 memo, If longer bytes were needed, 60 bits would, of course, no longer be ideal.
With present applications, 1, 4, and 6 bits are the really important cases.
In other words, 60 is a convenient number for variable-sized bytes.
I want to focus on that final part of the quote for a minute,
where Buchholz says that 1-, 4-, and 6-bit bytes are what matter.
I think this really speaks to why it was advantageous to have a variable bit machine.
When it comes to inputs and outputs, you want flexibility. It's nice to be able to handle as
many types of data as possible. By supporting variable byte lengths, Stretch could talk with more types of devices.
So, let's count the ways that Stretch was flexible.
6-bit bytes is easy.
That's for devices that talk characters.
6 is also nice since it's half of 12. So, two 6-bit bytes can represent a punch card column.
4-bit is also easy to explain.
That's the right size for binary coded decimal. It's just BCD. So 4-bit bytes let Stretch talk to BCD devices.
1-bit is perhaps the easiest. That's just Boolean. Yes-no kind of stuff. Heck, you could even wire
it up so that your IBM supercomputer could turn some lights on
and off.
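As a quick sanity check on that divisibility argument, and this is my own sketch rather than anything from the Stretch memos, a 60-bit word splits evenly into 1-, 4-, and 6-bit bytes, while 8-bit bytes only fit cleanly once the word grows to 64.

```c
#include <stdio.h>

int main(void) {
    int words[] = {60, 64};
    int bytes[] = {1, 4, 6, 8};

    for (int w = 0; w < 2; w++)
        for (int b = 0; b < 4; b++)
            printf("%d-bit word, %d-bit bytes: %s\n",
                   words[w], bytes[b],
                   words[w] % bytes[b] == 0 ? "fits evenly" : "leaves spare bits");
    return 0;
}
```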
The jump to 8 is where we hit a little bit of trouble.
The crux of the explanation comes down to two things, the English language and the realities
of binary.
These two factors are a little more intertwined than you might initially suspect.
For this part, I'm working off Buchholz's contemporary stretch memos
and a book called Planning a Computer System, Project Stretch, from 1962. I'm kind of remixing
the arguments presented in both to give a coherent explanation. So while I'm not going to lay out a
timeline, know that I am giving a historically grounded explanation. Okay, check this out. It's easy
to calculate the largest number that can be represented with a group of bits. I've been
doing this throughout the episode, so I might as well lift the curtain and explain it. You just
take the number of bits you're dealing with and raise 2 to that power. A 6-bit byte can represent 2 to the 6 numbers, or 64. A 4-bit
byte can do 2 to the 4, so 16. An 8-bit byte, 2 to the 8, gives us 256 possible values.
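Here's that count-the-values rule as a short loop in C, purely my own illustration: a group of n bits can take on 2 to the n distinct values.

```c
#include <stdio.h>

int main(void) {
    int sizes[] = {1, 4, 6, 7, 8};

    for (int i = 0; i < 5; i++)
        printf("%d bits -> %d possible values\n", sizes[i], 1 << sizes[i]);
    /* 6 bits -> 64 (bare-bones text), 7 -> 128 (room for both letter cases),
       8 -> 256 (a full byte's worth). */
    return 0;
}
```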
6 bits is all you need for passable text encoding, but it's a no-frills affair.
You don't get lowercase letters, for instance.
64 different characters just isn't quite enough space.
If you add one more bit, taking you up to 7 bits,
you get double the space, 128 possible values.
That is actually plenty.
You can pack in numbers, both cases of letters, all symbols on a keyboard,
and still have space for fancy things like signals and formatting characters.
Then, all text should be encoded in beautiful 7-bit data.
It follows that any computer that's expected to support text needs facilities to handle 7-bit bytes.
The only issue is that little number.
You see, 7 actually sucks for computers.
Now, this is one of those cases like at the top where I have to be introspective for a moment.
To me, this all makes instinctive sense.
7 just isn't a computer number.
Simple as.
But that is not a very satisfying answer. That
only works for digital dwellers like myself. So let's go back to basics here. All the nice
bit sizes that we've been discussing share a few things in common. They're all round numbers,
they're all positive numbers, of course, and they're all powers of two. That is to say,
they can all be calculated by taking two to some power. The result is a class of nice computer
numbers. 1, 2, 4, 8, 16, 32, 64, 128, 256, on and on. These numbers are nice because they're easy to represent electronically.
Put another way, computers like powers of two for a very simple reason. They're easier to deal with
on the lowest levels. These are all values that can be cleanly represented in binary.
To go through the same series, it's 1, 10, 100, 1000, and so on. Notice that in binary,
these numbers are actually just shifted over. By working with powers of 2, with bits grouped by
powers of 2, that is, you can use all kinds of binary tricks. You can use different shortcuts
to save time and circuit complexity. In that context, 7 is awkward.
It doesn't fit into any nice patterns.
You don't get any tricks.
You have to work for it.
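For a taste of those tricks, here's a hedged little C example of my own devising: when your fields are a power of two bits wide, pulling one out of a word is a single shift and mask, and multiplying by the field width is just another shift.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t word = 0x1122334455667788ULL;

    /* Grab byte number 3 out of a 64-bit word: shift by 3 * 8, mask off 8 bits. */
    uint8_t third = (uint8_t)((word >> (3 * 8)) & 0xFF);
    printf("byte 3 = 0x%02X\n", third);     /* 0x55 */

    /* Multiplying by 8 is just a left shift by 3 -- the kind of shortcut
       a 7-bit grouping can't offer. */
    unsigned index = 5;
    printf("index * 8 = %u\n", index << 3); /* 40 */
    return 0;
}
```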
Powers of 2 also nest cleanly.
I know there's some smart math word for this, but I'm going with nesting.
A 64-bit word can be broken down into two chunks of 32 bits,
or four chunks of 16 bits, or eight chunks of 8 bits. Each division is clean, you don't have
any wasted bits. Breaking things up by 7, conversely, sucks. There's such an aversion
to these awkward numbers that folk tend to pad them up to the nearest power of two, so the lame
seven becomes the slick and cool eight. This does in theory waste a bit, but the results are worth
it. By late 1956, the designs of Stretch are amended to support 8-bit bytes. Internal registers
are bumped from 60 bits wide to 64 in order to make 8 a more natural fit.
It may sound silly, but as Stretch was under development, 8-bit character encoding was experimental.
It was a Wild West, so to speak.
That said, there were a lot of experiments going on.
There was a concerted drive to move away from 6-bit encoding.
If nothing else, this was just to add in lowercase
letters. Planning a computer system goes into this in detail if you want more context. Basically,
there were a number of proposed character encodings supported on Stretch. These all
made use of 8-bit bytes, some to better effect than others. There's at least one table that clearly shows the
empty void left by the extra rounding bit. This puts the creation of the byte itself
sometime in 1956. By that time, we had programmers using 8-bit bytes. At least, sometimes. This was
still the upper range of the new unit, and it was still only inside IBM.
Stretch hit the market in 61. From there, the byte would enter the mainstream.
Alright, that closes out the episode. We've built up from flip-flops to bits to words to bytes.
I'll be the first to admit that there are some holes in the overall story.
The exact reasoning behind all of these terms is, well, it's up to some interpretation.
Bit is probably the most closed case.
It makes sense as a portmanteau of binary digit, but we don't have a perfect paper trail to go off of.
I also think we're ready to answer the final big question.
Why has the 8-bit byte endured?
Why do we still default to that size today?
There are two final steps to ubiquity.
That's EBCDIC and ASCII. The first, the tongue-twisting EBCDIC, is a character encoding system developed
at IBM and codified in the early 1960s. EBCDIC is the result of experiments in 8-bit text encoding,
some of which were conducted during the development of Stretch. This becomes the dominant encoding standard for
IBM hardware, which necessitates the 8-bit byte. ASCII is, well, it's a better name.
This is another character encoding standard, also developed in the early 60s. ASCII is technically a 7-bit code,
but in practice each character gets packed into an 8-bit byte.
This becomes a dominant standard on many non-IBM mainframes.
So once we reach the 1960s, there are two big reasons for computers to support 8-bit data structures.
And the idea of a byte is already circulating as a nice unit of measurement.
Thanks to that drive, plus really the convenience of a power of two,
eight turns out to be a very lucky number indeed.
Thanks for listening to Advent of Computing.
I'll be back in two weeks' time
with another piece of computing's past.
If you like the show,
there are a few ways you can support it and help it grow.
If you know someone else who'd be interested
in the history of computing, then please take a minute to share the show with them.
You can also rate and review on Apple Podcasts. If you want to be a super fan, you can support
the show directly through Advent of Computing merch or signing up as a patron on Patreon.
Patrons get early access to episodes, polls for the direction of the show, and bonus episodes.
The donations also really help me keep going and help me get access to new sources to improve the show.