Advent of Computing - Episode 117 - What's in a Byte?
Episode Date: September 24, 2023

Byte has to be one of the most recognizable parts of the digital lexicon. It's an incantation that can be recognized by even the uninitiated. But where does the byte come from? Has it always existed, or did it more recently come into being? And, more specifically, why is a byte 8 bits? Is it some holdover from long ago, or is there some ironclad rule of 8's?

Selected Sources:

https://archive.org/details/byte-magazine-1977-02/page/n145/mode/1up?view=theater - Buchholz on the "byte" in BYTE!

https://sci-hub.se/10.1049/pi-3.1949.0018 - A Storage System for Use with Binary-Digital Computing Machines

https://ia600208.us.archive.org/32/items/firstdraftofrepo00vonn/firstdraftofrepo00vonn.pdf - The First Draft of a Report on EDVAC
Transcript
I've been told that 8 is a weird number.
Now, personally, I don't buy that.
Eight seems very natural to me.
It's a nice round number in multiple ways, actually.
But my friends and non-digital colleagues insist that 8 is not normal.
It's not a multiple of 5.
It's also not a multiple of 10.
It also looks kinda awkward if you count it out on your
fingers. Plus, it's hard to count by. Now, I insist you try this. This is the point where
I started to maybe see the light a little bit. So try counting by fives and then try counting by
eights. I end up going eight and then 16 and then I have to stop and do some math.
So, yeah, maybe 8 is a little bit unnatural.
This is one of those interesting little tidbits that I really enjoy.
I personally spend a lot of time thinking about my digital assumptions.
That's partly why I like producing Advent of Computing. It gives me a good reason
to stop and think about why the computerized world is a certain way. I've heard it said that
programmers have to twist their minds to work with computers. This whole eight-based argument is
perhaps a symptom of this larger disease. Computers, on their very basic level, are alien things. The digital world is, as the name
suggests, a whole other world. A programmer, thus twisted into this realm, sees eights everywhere
and doesn't bat an eye. On the other hand, an uninitiated sees the same sea of eights and,
perhaps, recoils. I don't think I'm qualified to entirely uncoil a programmer's mind.
As I've said, I'm already afflicted.
I see most things through a thick pair of binary glasses.
But maybe I can uncoil a small component.
Maybe we can figure out why eight is a lucky number for programmers.
Welcome back to Advent of Computing.
I'm your host, Sean Haas, and this is episode 117, What's in a Byte?
This entire episode is going to revolve around one simple question. Why do we use 8-bit bytes? This, as with many of my simple questions, turns out to be a pretty expansive topic.
Before I get into things, I want to give some context as far as where things stand in the
21st century. Most processors that we use today are in the x86 family. This just means that
the bulk of CPUs all follow the same basic design and can all run the same code. There are two
central parts to this design, memory and registers. Memory is perhaps the easiest part to explain.
In any computer, you need a place to store information that you want to work with. That's the role of memory. It's this big pile of bits that a program can access
and manipulate. Memory in the x86 architecture is addressable, meaning that every piece of memory
can be called up by some numeric address. Think of it like a call number in a library. Each of these addressable chunks
is a byte. That's the smallest addressable unit of data in the x86 world. We can further subdivide
this. Each byte is represented by eight binary digits. That means that you have eight slots for
either ones or zeros to fit into. Registers are another form of memory, but they're more
specialized. Registers sit inside the processor itself, and they're used for actual operations.
If you're adding numbers together, then you have to store your numbers in registers. Or,
at least you have to store one number in a register. In the x86 architecture,
registers are 16 bits wide.
Those are composed of two bytes.
You might hear it said that the x86 architecture has a 16-bit word.
A word being a small unit of data that the processor operates on.
Sometimes it's discussed as the natural size the processor wants to work with.
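To make that concrete, here is a minimal sketch in C, my own illustration rather than anything from the episode, showing the same ideas on a modern x86 machine: memory is addressed one byte at a time, a byte is 8 bits, and the classic 16-bit word is just two of those bytes viewed together.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t word = 0xABCD;            /* a 16-bit word, two bytes wide        */
    uint8_t *bytes = (uint8_t *)&word; /* the same memory, viewed byte by byte */

    printf("bits per byte:  %d\n", CHAR_BIT);     /* 8 on x86 */
    printf("bytes per word: %zu\n", sizeof word); /* 2        */
    printf("byte 0: 0x%02X, byte 1: 0x%02X\n",
           bytes[0], bytes[1]);                   /* each byte has its own address */
    return 0;
}
```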
So it's plain to see here that 8 is a very important number. The entire x86 architecture, which the world is based off of, works 8 bits at a time.
But the question is, why that number? Why 8? Why not 4 or 16 or 24 or really any number for that matter. This is where our journey begins. In this episode,
we're following the trail of bits. I'm going to start by working through the history of how we
talk about data, since that's going to be really important once we get to the byte itself. And
I assure you that the byte was actually invented. The idea of a byte is
actually pretty new compared to the computer. That's something that I wouldn't have bet on,
personally. As we build up to the byte itself, I want to examine why 8 has become such an important
number. By the end of this episode, I hope that we have a better understanding of how 8-bits became
so popular, and why it's in dirt. Is it a good solution? Is it just tradition? Or is it some
mix of the two? Before we get into the episode proper, I have a quick announcement to make.
This year, on November 4th, I'm going to be speaking at the Intelligent Speech Conference.
This is an online conference where podcasters share talks under a certain theme, and the theme this year is
contingency, or backup plans. I'm personally kind of excited for that because one of my favorite
stories of all time has to do with Intel and some poor planning and backup plans they made. So I'm planning to be talking about a certain chip.
Now, the conference is on November 4th, 2023.
It is a paid-for event.
Tickets currently are $20 until October 1st.
Then they're going to jump to $30.
So if that sounds interesting, then I recommend getting a ticket.
The lineup this year looks pretty good.
To see the schedule and get tickets, you can go to intelligentspeechonline.com.
Now, with that said, let's get into the episode itself.
I will warn you, this episode is going to be a bit of a ramble. I initially wrote that we'd kick things off with the classic computer science corner,
but in reality, most of this episode is going to be classic computer science corner.
Now, I think the best place for us to start is by building up an understanding of how
computers encode numbers.
From there, we're going to keep building and building and building.
Historically speaking, numeric representation has been a bit
of a wild west. The outline I gave at the top is only applicable to pretty modern computers,
and by pretty modern, I mean computers derived from the Intel 8086, which itself is actually a
pretty old chip, but I'm getting a little off track. Some very early computers
stored numbers as decimal values. You know, base 10, the numbers between 0 and 9. Normal,
human-readable kind of things. There's no reason you can't do this. On the surface,
it might even sound kind of nice. You can inspect readouts very easily this way.
But the decimal setup quickly runs into problems. It's actually much easier, automation-wise, to store data in
binary. As in, base 2, 1 and 0, on and off, true or false, however you want to call it.
By working with binary values, you can use binary logic gates,
which automatically loops you into this huge existing body of logic theory. You get piles
and piles of papers going back to time immemorial. A voltage on a wire can be mapped to true.
A lack of voltage can be mapped to false. And then you just pull out one of those dusty
old papers and you can build complex logic.
Binary is a very natural way to bridge the gap between logic theory and electronics.
It kind of just works, there's really no better way to put it.
So why did early machines use decimal?
Or rather, why did some early machines use decimal? The ABC, the
Atanasoff-Berry Computer, is, legally speaking, the first electronic digital computer in the world.
That machine used binary internally. So technically, things get off on the right foot.
As far as digital offenders, well, we can go to the usual
dead horse. ENIAC stored data as 10-digit decimal numbers. The issue is that these numbers had to be
operated on with specialized circuits. Instead of using fancy binary circuits, ENIAC operated on registers using more, well, arcane means, let's just say.
If given enough time, this could have been developed into a more elegant art,
but the widespread adoption of binary made decimal computers kind of an evolutionary dead end.
Just as a quick aside, I do want to throw in some unconfirmed speculation.
ENIAC was built around these 10-digit numbers.
Why 10?
One theory goes that 10 was chosen because that's how many digits a standard desktop calculator of that era could handle.
I think there is some weight behind this idea.
ENIAC was, in very large part, designed as an automatic mock-up
of contemporary desktop calculators. When you get down to it, ENIAC was basically a supercharged
and programmable adding machine. And it is true that many desktop calculators in the 40s had 10
digit registers inside them. No good evidence for causality exists, but it's a
neat theory to point out. Okay, so digression aside, once we disregard some oddities, we arrive
at binary. In this encoding scheme, numbers are represented as groups of 1s and 0s. Now,
this is just one way to represent a number. The same number can be
represented in base 10 or base 2 or base 1 billion if you want it. Whatever encoding you choose,
you're still dealing with the same number. So let's take the number 3 as an example.
In base 10, our friendly decimal encoding, 3 is, well, it's right there, it's 3.
That's pretty cut and dry.
In binary, 3 would be represented as 1, 1.
This is, also, pretty cut and dry.
At least once you know how it works.
When we write out numbers in decimal, we start in the ones place, then we go to the tens place,
then the hundreds place, and so on until we have the full number written out.
Binary works the same way, but instead of the ones, tens, and hundreds places, we have the ones place,
the twos place, the fours place, the eights place, and so on.
So the binary number one, one is just saying we have a value of 1 in the 2's place and 1 in the 1's place.
In other words, 2 plus 1, which is 3. Nice and easy, right?
That's simple to think about for small numbers.
5 is 1, 0, 1, 8 is 1, 0, 0, 0, and 9 is 1, 0, 0, 1.
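If it helps to see that place-value idea as code, here's a small C sketch of my own that prints a number's bits by checking each place value, from the highest down to the ones place.

```c
#include <stdio.h>

/* Print `value` as `bits` binary digits by testing each place value,
   highest first. Purely an illustration of the place-value idea above. */
static void print_binary(unsigned value, int bits) {
    for (int i = bits - 1; i >= 0; i--)
        putchar(((value >> i) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void) {
    print_binary(3, 4); /* 0011: a 1 in the twos place and the ones place */
    print_binary(5, 4); /* 0101 */
    print_binary(9, 4); /* 1001 */
    return 0;
}
```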
But what if we want to represent a big decimal number in binary?
Like, for instance, 10.
That's pretty big as far as numbers go.
That may be one of the largest numbers I can think about.
One way would be to just, you know, use binary like a normal computer person would. 10 in decimal is
1010 in binary. Done. Simple. Cut and dry. Another option is the so-called binary coded decimal,
or BCD. This is, to put it simply, a very hateful thing that humans have inflicted on computers.
In BCD, you represent each decimal digit of a number as a separate binary number.
In this scheme, 10 in decimal would become 0001 0000. Read that out loud: zero zero zero one, zero zero zero zero. I may have summoned
some kind of daemon doing that. Now, BCD is a weird compromise format. It's somewhere between
human decimal numbers and computer binary numbers. If you can count to nine in binary,
then you can use BCD no problem.
That said, you kind of lose a lot of elegance doing this.
And, well, a lot of space.
The bottom line is that BCD is wasteful.
To represent decimal numbers between 0 and 9, you need 4 bits.
4 ones and zeros.
But here's the thing.
9 in decimal is 1001 in binary. That leaves some wasted bits. With 4 bits, you can actually count all the way up to 15, which is 16 different values. I'm sure you could work out
how much you're wasting by using BCD, but I'm not in the mood to run number theory stuff.
Just know that BCD is a waste of space.
You're always better off using binary instead of trying to warp it into some human readable gunk.
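For the curious, here's a rough C sketch of the comparison, my own example rather than anything from the episode: decimal 10 packed as BCD, one 4-bit group per decimal digit, next to the same value in plain binary.

```c
#include <stdio.h>

/* Pack a two-digit decimal number into one BCD byte:
   high 4 bits hold the tens digit, low 4 bits hold the ones digit. */
static unsigned char to_bcd(unsigned n) {
    return (unsigned char)(((n / 10) << 4) | (n % 10));
}

int main(void) {
    unsigned n = 10;
    printf("plain binary: %u -> 1010 (4 bits)\n", n);
    printf("BCD:          %u -> 0x%02X, i.e. 0001 0000 (8 bits)\n", n, to_bcd(n));
    return 0;
}
```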
That said, BCD does do something interesting. It works by breaking up binary data into groups of digits. Instead of an unending stream of ones and zeros, BCD represents four-digit chunks of data.
Each chunk can stand alone or be read in relation to other chunks.
Breaking numbers up into smaller chunks is something that we do naturally.
Take, for instance, the humble comma.
Us humans tend to break larger numbers up into these three-digit groups.
You don't write 10,000 without any commas.
That looks weird.
We would more often write 10, and then three zeros to represent that number, 10,000.
Depending on your region, that separator might be a comma, space, or a period,
but the idea is the same.
In this instance, we're breaking up a large number to make it more readable, more manageable.
One way we can look at BCD is a binary rendition of this decimal quirk.
It's a means to naturally group numbers.
I have my problem with how the format
works in practice, but it does have its uses. It's a pretty humanistic way to group binary data.
It makes this in-human format a little more sensical to deal with. You may be asking yourself
at this point, why have I been taking you on this BCD diversion? Well, I think this can serve as a
touchstone as we travel deeper into the bowels of the computer. As we start dealing with honest
to goodness machines, we need ways to manage data, which means we need ways to manage binary numbers.
You can't really get away with using a million bits of storage for one huge number.
That's not very useful.
BCD is one of the most easily understood ways to break up binary data into more manageable chunks.
Simply portion off your storage into 4-bit regions and treat those regions as digits.
Done.
From here, we can start a few lines of inquiry.
I will hazard that this is going to
come down to a lot of boring etymology stuff. So first off, I need to address the whole bit thing.
This is really basic stuff, but I think we should nail this one down before working up to larger
data structures. By bit, I mean a single binary digit.
A bit can either be 1 or 0, which is electronically represented by on or off, and logically it's true or false.
You know, binary.
But where does that name come from?
The term first appears in print in 1948 in a paper by Claude Shannon.
That paper, A Mathematical Theory of Communication, isn't actually about computing at all.
Instead, it's about signaling and data representation.
In part, this paper is concerned with pulse code modulation, which is something I always like to see.
In the opening page, Shannon is explaining that different numerical bases can be used for
encoding messages. The choice of base will have an impact on how the message can be propagated
and handled. To quote, the choice of a logarithmic base corresponds to the choice of a unit for
measuring information. If the base 2 is used, the resulting units may be called binary digits, or, more briefly, bits,
a word suggested by J.W. Tukey. A device with two stable positions, such as a relay or a flip-flop
circuit, can store one bit of information. End quote. Two things to note here. Bit is actually presented as a portmanteau of binary and digit.
Also, Shannon is citing a colleague, so bit may be an older term than 1948.
The usual theory I've seen goes something like this.
John Tukey, one of Shannon's colleagues at Bell, used to work with John von Neumann.
At some point, supposedly in 1946, Tukey suggested the portmanteau.
This is a nice little story.
It ties the origin of the bit to the early age of computing, and it even includes our main man Johnny.
He is always nearby, after all.
Now, here's the thing about this. It's actually
really hard for me to validate this theory. There isn't some chain of public letters where
Tukey and von Neumann are cracking jokes about bits. What I can verify is that the term binary
digit was in use during this period. I can actually find this phrase in print as early as 1942.
Even newspapers are talking about binary digits by 1947.
Perhaps most relevant is the fact that the First Draft of a Report on the EDVAC,
a report written by von Neumann himself,
uses the term binary digit to talk about, well, binary digits.
So von Neumann was using the phrase. That side of the story seems plausible then. I just want to
leave a little space for other possibilities here, since I don't have rock-solid evidence for the
origin of bit, especially since, you know, bit is just a very
common word. It could just as easily have been used since each binary digit is just a little
bit of data, you know, just a little guy. That's probably enough about the bit, so let's kick
things up to the next level. As we saw with BCD, it can be very useful to group bits together.
So, what should we call these groups of bits?
We can look at the historic context for this.
At this point, we get to jump from the archaic machines of the early 40s all the way up to the weird and wild machines, the futuristic computers of 1946.
By that, I mean we need to turn to the earliest writings on stored program computers.
You know, computers with actual memory.
A good place to start would, perhaps, be the First Draft of a Report on the EDVAC.
At least, you'd be tempted to think that if you hadn't read the paper.
This is, full stop, the first paper to describe a stored program computer.
It's of huge historic importance, and it sets dozens of precedents.
But it's also kind of strange.
It's very much an early paper.
von Neumann had a particular way with words. In the draft report, he attempts to explain a computer using a biological analogy. He calls different components of the machine organs.
He explains connections as neurons. He even sometimes calls binary digits stimuli.
So it should come as no surprise that von Neumann's choice of words
is a little weird. That said, the EDVAC paper does describe a grouping of binary digits.
Von Neumann most often uses the term unit as a synonym for binary digit. He calls a grouping of 32 units a, quote, minor cycle. Now, perhaps
this sounds a little silly to our modern ears. I'd personally be a little tickled if someone
showed me a computer with 4,000 minor cycles of memory. Maybe that would be better termed 4 kilo-minor-cycles at that point. From my reading,
it seems that von Neumann is using this minor cycle term because he was primarily discussing
delay line memory. This gets a little weird to think about. At this point, there wasn't a real
programmable computer with any kind of practical memory. There were some computer-like
machines that used recirculating or delay line memory. I think there were a few IBM punch card
machines that had drum memories, and some radar systems were using very early signal delays that
looked like memory, at least if you squint. Von Neumann does hedge a little, saying that memory doesn't necessarily have to come in the form of a delay line,
but his language gives him away.
The cycle part fits this idea.
Delay memory works by shunting bits off into something.
Early machines like EDSAC used tubes of hot mercury and sent in bits as acoustic waves.
Some later machines used drums coated in magnetizable materials.
Either way, you end up with these time-dependent forms of memory.
You have to wait for waves to propagate through mercury, or for a drum to spin.
Once that bit hits the other end, or the drum spins to the right position, you get your bit
back. In that context, calling a group of bits some kind of cycle makes a lot of sense. It's a
very physical way to describe data. When am I going to get my chunk of information? Well, you have to
wait for the proper minor cycle. I think that just kind of works.
That addresses some of the confusion here, but it leaves another fun bit of weirdness.
Why 32 bits? I think this is a good place to address a misconception. Old computers weren't
very capable. I mean, they were a lot better than mechanical calculators,
but EDVAC can't really hold a candle to even the weakest modern machine.
I bring this up because sometimes bittedness, the bit size of a machine,
or whatever you want to call it,
can be confused for a measure of a machine's power or performance.
I've even fallen for this trap
myself. I remember looking at a new computer at Fry's back in the day with a big 64-bit AMD
sticker on it and, pretty quickly, checking my wallet to see how much cash I was carrying.
The version of EDVAC described by von Neumann is 32 bits, insofar as that's the size of its minor cycles.
That's how many bits EDVAC works with at a time.
More specifically, that's the size of operands that EDVAC's math circuits work on.
When you tell that computer to add two numbers, it expects two 32-bit numbers, two minor cycles.
EDVAC's memory is designed to match that.
It stores data as 32-bit chunks, as minor cycles.
But, once again, this is a really early computer, so things act a little unexpectedly.
This is going to be a small aside, but I think it's
interesting to note. EDVAC technically only had 30-bit-long numbers. The extra 2 bits were reserved
for rounding after multiplication and division operations, since those could result in longer
numbers. That means that, in practice, a minor cycle had 30 usable bits. So, nitpicking
aside, why 30? Partly, it was just a guess at a reasonable size. Von Neumann admits that he's
open to change, or rather, the ENIAC team that developed EDVAC is open to change. 30 is just a starting point for discussion.
He explains that 30 is based off his experience with differential equations.
This would have been informed by work done on the Manhattan Project. Von Neumann defines a
standard number in heavy quotes as having eight significant digits. In other words,
a number with up to eight digits after the decimal place.
That can be represented by a binary number with 27 bits,
and von Neumann rounds that up to 30 to get an extra significant digit.
Add 2 for rounding, and you have the nice 32-bit minor cycle.
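If you want to check that arithmetic yourself, here's a tiny C sketch of my own: the number of bits needed for n significant decimal digits is roughly n times log2 of 10.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Bits needed to cover n decimal digits: ceil(n * log2(10)),
       since log2(10) is about 3.32. */
    for (int digits = 8; digits <= 9; digits++)
        printf("%d decimal digits -> %d bits\n",
               digits, (int)ceil(digits * log2(10.0)));
    /* 8 digits -> 27 bits, 9 digits -> 30 bits.
       Add the 2 rounding bits and you land on the 32-bit minor cycle. */
    return 0;
}
```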
I want to highlight one other thing that von Neumann kind of glosses over.
EDVAC uses a few different ways to represent numbers. Von Neumann mainly discusses a form
of floating point representation that uses exponents. That's cool, but I want to look at
the simple integer case. 30 bits is the minimum you need to reach a 10-decimal-digit number, since 2 to the 30 is just over one billion.
Do with that what you will.
The EDVAC report, through a strange twist of fate, ends up becoming the blueprint for many of the earliest stored program computers.
Personally, I believe this is just because the draft report was really the first full and publicly accessible description of a computer.
This is where we reach the next stage of our story.
Thankfully, the minor cycle name is dropped.
In its place, something a little more confusing is adopted.
This is the point where I get to introduce something frustrating.
The word is word.
That's spelled W-O-R-D.
During the early era of computing,
let's just call that the late 40s into the early 50s,
this is the term used to describe a grouping of bits.
More specifically, a computer's memory in this period is subdivided into words.
So you run into this fun shuffle where an old paper will say that such and such computer has
four kilowords of memory.
Word size wasn't standardized in this period,
so a word on one computer might be 17 bits, on another it might be 24 or even 64 bits.
In general, this word size is matched up to register size. In other words, it works the
same as von Neumann's minor cycles. For instance, take the Manchester Baby. This was actually the
first operational stored program computer. It had memory where a
program could be stored and it could run code from that memory. The baby became operational in 1948.
It was a 32-bit word computer, just as prescribed in the EDVAC report. This meant that each address
in the baby's memory was 32 bits wide. Its internal registers were also 32 bits wide.
In fact, the Manchester Baby gives us the first use of the term
word for describing a group of bits in a computer. This comes from a 1948 letter to the journal
Nature. At this point, computer science didn't really exist as a field, and we don't have
anything like a comp sci journal, so articles on computers just kind of show up wherever.
Part of this letter discusses the so-called data store of the Manchester baby. That includes
registers and memory. Quote, the capacity of the store is at present only 32, quote, words, each of 31
binary digits to hold instructions, data, and working. End quote. Disregard the 31 thing here,
either the baby changed to 32 bits after the letter, or it may have been a case where an
extra bit wasn't actually accessible at that time. The point is the term word, and the fact that the letter puts it in quotes, as in quote-unquote
words. Come along with me as I get a little unhinged trying to figure this out. Here we have
one of the first published scraps of information on the first functioning stored
program computer. The first computer that we could recognize as part of our current lineage
of machines. This letter, this informal note sent off to Nature, uses words to describe groups of
bits. But crucially, that term's in quotes. The term isn't defined explicitly, it's just
used with scare quotes around it. In my head, that says something about how new the term is in 1948.
My gut feeling is that word was new, but it was already seeing some kind of use, hence the lack
of an explicit definition.
That's the start of the trail, but where does this one lead?
The most immediate thread comes from the memory storage methods used on the baby.
This computer employed a form of CRT-based memory called a Williams-Kilburn tube,
more commonly just referred to as a Williams tube. In fact,
the letter to Nature was co-authored by the Williams and Kilburn in question,
so this looks like a good lead. Through a little searching, we can find a paper published in 1949
by the same duo, titled A Storage System for Use with Binary Digital Computing Machines. It's a full description
of the tube-based memory used in the baby. The specifics are a little out of scope for
the discussion today. I've explained these tubes in past episodes on memory, so you can
check out episode 77 in the archive for more details. In short, these tubes store binary data as a grid of charges on
the face of a cathode ray tube. It allows for true random data access, and it's relatively reliable,
if a little low on density. What matters for us is how data is organized on these tubes.
Each tube is, for all intents and purposes, just a CRT display. During normal operation,
a lid is kept over the CRT for a number of reasons, but you can actually flip up that lid
to inspect memory. When you do, you see a uniform grid. At each location of the grid is either a
dot or a dash. A dot represents a 1 and a dash represents a 0. The 49 paper describes this grid
as being 32 dots wide by 32 dots tall. To borrow from its wording, you have a grid of 32 lines,
each being 32 binary digits wide. The paper calls these lines words. Now, to set up the actual passage that defines
what a word is, I have to give a little more context. The tube paper describes storage as
part of a larger computer system, but there is some antiquated language at play. Or, at least,
the paper is written for a lay audience. At this point, almost any audience would be of lay people, I guess.
Anyway, the passage refers to a program as a table.
I think that's because of the storage context.
In practice, a chunk of memory would look like a table.
Here's a summary of the Williams-Kilburn rendition of memory. Quote, to every address, a digit combination will be assigned, so that an instruction will
consist of two digit combinations and is indistinguishable from a number in appearance.
Instructions and numbers, which are collectively termed, quote, words, are therefore similar,
the only difference between them being their function in the machine, end quote. This uses the term
address, in quotes, to refer to discrete locations in memory. Each of these discrete locations holds
a, quote, word, also in quotes. Now, here's the fun wrinkle. This paper is sent to print in 1948.
It's part of a larger paper trail surrounding the Williams-Kilburn tube.
In 1946, scant months after the EDVAC paper leaks, Williams and Kilburn submit a patent
application for their memory device. That patent does not use the term word.
So either the term word gets coined by Williams and Kilburn sometime
between 1946 and 48, or it's some older term that I have no hope of ever sourcing.
I'm going to assume option one, that the term is coined in the late 40s by Williams and Kilburn.
I like this option because it saves me from, you know, trying to scour more
data, and it also gives us a possible story. Now, this is ultra speculation, so hold on to your
pants. On these memory tubes, each row is called a line. By the time the Manchester Baby is up and running, each line is 32 bits wide,
the word size of the machine. But during development, these memory tubes had different
sized lines. Williams and Kilburn tried out a number of different line lengths before settling
on 32 bits. The Manchester Baby also went through at least one word size change.
In that letter to Nature, it was describing a 31-bit word, but the final machine used 32-bit words.
That's, of course, assuming that change was intentional and not some technical issue that
got sorted out.
So here's my theory.
At some point, a line on these memory tubes was subdivided into multiple groups. What would you call part of
a line of data? Well, a word would make some kind of sense. Think of a line like a sentence, a line
of text. Those lines are composed of words. I'd wager that the term was suggested and used internally. At some point, the word became 32 bits wide,
which matched up with the width of a single line on a tube.
The term then stuck around since it was handy and,
more than likely, it's a little funny.
It's a neat little joke.
This theory seems reasonable to me
because we run into these kind of legacy terms all the time. I mean,
in Linux, you still use the term teletype to refer to a virtual terminal. No one has seriously
used a teletype in many decades at this point, but the term persists. I'd totally buy that word
came up as a useful term during development of the Williams-Kilburn tube
and then just stuck around. Now, this theory could be totally wrong. I mean, the term
word is so commonplace as to be pretty much impossible to research. The only other inkling
of an idea I have comes down to the phrase, quote, two-bit word. This is an old-timey
phrase used to refer to fancy terms, to big, cool words. It actually shows up in a lot of print
material around the 1940s. So maybe there's some long-lost joke about a memory address being a 32-bit word? Ah? But I honestly have no idea. It can also be
that word was just a natural choice for some reason I can't comprehend. The term very quickly
enters the lexicon. I have big books from the 1950s, these big printed tomes, that are already
using word to describe chunks of memory. By the end of the 50s, there are books that don't even index for word, since it's just so common. If you
did run an index, it would probably just be a listing of every page in the text. The term was
in widespread use, but it had a slightly different meaning on every computer. There was no standard word size.
It depended on the machine. EDVAC recommended a 32-bit word. The Manchester Baby followed that
recommendation, but another machine named EDSAC used a 17-bit word. The IAS machine, often seen
as the spiritual successor of the EDVAC report,
well, that set a word length at 40 bits.
What's infuriating is that folk talked about memory sizes in terms of words back in the day.
So you might read that IBM was coming out with a cool new machine, and it came stock with 4 kilowords of memory.
That means 4,000 words. But how much memory is that, really?
Is it more or less than last year's UNIVAC that supported 12 kilowords? How many bits are we
actually talking? Word size also breaks down once computers reach, well, honestly, any level of
sophistication. It all makes good sense when
each chunk of memory is one word, and each register is one word wide. You're only dealing
with words, so it's a nice little term. This is especially the case during the first generation
of machines. Here's something that I haven't really thought about before, but makes a whole
lot of sense in retrospect.
Long-time listeners will know that I often call memory the most complex part of a computer.
Since day one, memory has been hard to build.
It's taken us decades and many failed attempts to create reasonable forms of computer memory.
Now, let us consider the humble register. A register is actually just a small chunk of
memory that's connected directly to a computer's math and logic circuits. It's the immediate
working space of the machine. Here's something that never hit me until I was working on this
episode. A register is itself a form of memory. That means that registers are actually subject
to all the technical problems of computer memory. In a modern computer, registers are implemented
totally separately from large storage memory. You have your fancy sticks of RAM that live in slots
on your motherboard, while registers are quite literally etched into the CPU
itself. In these early days, however, these two forms of memory were linked. In fact, some early
computers even call memory locations registers. So, add a little more confusion to the memory mix.
Let me explain this a little more. The Manchester Baby makes for a good example here. That computer
implemented registers as a few lines of storage on one of its Williams-Kilburn tubes. Thus,
registers were really just another region of memory. The IBM 704, another contemporary machine,
used drum memory as its data store, and registers were
implemented as a special region on that drum. In these cases, the line between register and memory
is blurred. This also means that word size between registers and memory was kept consistent.
In some cases, you might have a computer with registers that were two words wide,
or you could get fancy and have a register that's half a word wide, but no matter the setup,
there was still a direct relation. It still made sense to talk about computers in terms of words
because that was the core unit of data for everything. It wasn't just ideological, it was a very physical reality.
If your registers are all composed of words, the same physical words that make up memory,
then why would you need another term? You have a four-kiloword memory space and a one-word
accumulator. It all falls in line very naturally as an extension of the computer's design.
While this may be natural in some cases, it kind of falls apart in others.
Not all computers used in-memory registers.
Even amongst those early computers that mapped memory into deeper circuits,
not all registers were nice, even words.
The LGP-30 is an interesting example of this.
That was another drum-based computer, like the IBM 704.
The LGP-30 stored data in memory as stripes running down a spinning drum.
Registers were stored as a set of bands running around the drum.
So while the same physical device was used for both registers and memory, the implementation, the specific layout, differed. A word meant something different in different
contexts here. Another example of where this breaks down is the UNIVAC II. This computer had
normal-ish memory. It was broken up into words. Registers were implemented as their own circuits.
So there was no limitation on how registers could be composed. Some registers were one
36-bit word in size, but not all followed the convention. UNIVAC II had at least one optional
register that was 15 bits wide. In the documentation, what do you call that? Do you say that it's a 0.417-
word register? There's another place where the word word loses its meaning. So far, we've only
been discussing internal data representation. That is, how the computer stores its own numbers for its
own secret and perhaps evil uses. Let us now consider the flip side of the coin, input and
output encoding. In other words, how the computer communicates with the lowly world outside.
Any discussion here must start with the venerable punched card.
Just to be clear, this is pre-computer technology.
Punch cards as we know them date all the way back to the 1890s.
IBM codified the card format in the early 20th century.
That meant that somewhat standardized punch cards were well-established prior to the advent of computing. It's little wonder that as
computers appeared, punch cards became a de facto method for reading in data and outputting results.
There are a number of different encoding schemes that have been used on punch cards over the years.
At the most basic, you could just encode raw binary on a card, as simple as just punching
a pattern of holes.
That's workable, but it's not really what the IBM format was intended for.
The premier encoding scheme back in the day was called zoning, or zone encoding.
In this format, each column was used to encode a single character.
Now, I should be clear here.
Character doesn't necessarily mean a written character.
A column could hold a numeric value, or it could be holding a text character. It could also be
holding special digital signals. Whatever we're talking, the data is encoded using a pattern of
punches in one column. Each column of a punch card has 10 positions plus two overpunch positions on top,
so call it 12 bits. That gives us up to 4096 possible values, assuming binary encoding for
each column, or at least that each possible combination is utilized. This doesn't really match up with any common word length.
Despite that, punch cards were one of the most common ways to talk with early computers.
You might have a machine with a 32-bit word length, but you would still have to control it
using 12-bit punch cards. So then, how were people talking about data stored on these cards?
The technology predates computers, and it's not really a word-sized means of storage.
Traditionally, cards were broken up into fields.
Each field was a single, combined value, like a string of text or a number.
Fields were composed of individual columns. These columns didn't have
any standardized names outside of columns. However, they were sometimes called characters,
since each column could encode one text character. This is the thread that will lead us to the byte
itself. Early IBM computers were, in large part, highly advanced punch-card munchers.
You could call them computerized tabulators if you're talking to serious people. The IBM 305,
one of the company's earliest machines, was undoubtedly a computer. It could be programmed,
it could run calculations, the whole nine yards. But it was one of a special class of machines.
You see, all IBM hardware was meant to deal with punch cards, at least at some point.
That could lead to some awkward situations with encoding.
The 305, as well as a number of later IBM machines, had variable word lengths.
This is another place where the idea of a word breaks down.
It also gets a little weird since technically the 305 was a character-based machine, so it's not
storing data in binary but rather in base 12 characters. But don't worry about that part.
In practice, a programmer could tell the 305 to use any word
length. This could range from a single character up to, I guess technically, the full memory size
of the machine. This was convenient because of the whole fields thing. A punch card could be
configured with all kinds of encodings. Instead of trying to enforce reasonable encoding standards on older technology,
machines like the 305 were able to bend to the whims of cardstock. In this context, a word has
almost no meaning. It could be a single character. It could be an actual English word. It could be a
sentence or two. It could just be a pile of numbers. We're in a weird character computer, so maybe we can
give this a pass as an oddity, right? IBM made a few of these decimal and character computers that
had variable word lengths. So we are stretching the idea of a word, but this is far enough removed
that we may be tempted to disregard these examples. But what if there existed an honest-to-goodness
binary computer, a real machine that had variable words? Well, allow me to introduce the IBM 7030,
aka Stretch. Now, this is a very serious computer. This is actually IBM's first supercomputer. Its history is something
that I need to cover in full sometime, probably sometime soon. It actually has a weird connection
to Los Alamos and Enrico Fermi, so needless to say, it's in line with my recent rut. Anyway,
the 7030 project started in 1955, with the first machine shipping in 1961.
Along the way, something interesting happened.
Stretch was a full-on binary computer.
It encoded data as 1s and 0s, just like anyone else.
It also used a 64-bit word, so nice and big.
Pretty spacious.
The weirdness, though, is that Stretch could be configured to group bits into chunks smaller than a word. Maybe you could call it a sub-word,
but as early as 1956, internal memos were already calling it by a better name.
They called these smaller groupings bytes, spelled B-Y-T-E-S.
One of the designers on the stretch team, Werner Buchholz, is credited with coining the term.
In a 1977 letter to the titular Byte magazine, Buchholz has this to say about the word.
A byte was described as consisting of any number of parallel bits from 1 to 6.
Thus, a byte was assumed to have a length appropriate for the occasion.
Its first use was in the context of input-output equipment of the 1950s, which handled 6 bits at a time.
The possibility of going to 8-bit bytes was considered in August 1956 and incorporated in the design of Stretch
shortly thereafter. End quote. The idea of a character is key here. Folk will often call
computers fancy calculators. I even fall into this trap myself. But a computer, in reality,
is a lot more than that. From the earliest days, there were systems that could handle textual data.
Even some of IBM's computers were built specifically with text in mind. So in that context,
text encoding was of prime concern. Of the multitude of ways to represent characters,
IBM primarily stuck with a tried-and-true method, 6-bit encoding. In this scheme, a character is
represented by a 6-bit number, a value ranging in decimal between 0 and 63. That allows for the
entire English alphabet, usually only in one case, numbers 0 through 9, and common punctuation marks.
By 1956, this type of character encoding was standard, if not standardized.
At least the size had become a more prevalent option. You could see 6-bit characters in
telegraph systems, paper tape, and even IBM's own punch cards. They were also used in a lot
of computers. Once again, we're looking at a chunk of data smaller than a word. I think it's only natural that a new term should be minted.
According to lore, that word was initially byte, spelled B-I-T-E.
The I soon became a Y in order to prevent confusion, hence B-Y-T-E.
Now, I don't know if that's entirely true, but it's a nice little story. There is
another reason that 6-bits was set as the largest byte size early on. At first, Stretch was planned
to be a 60-bit machine, as in, it would use a 60-bit wide word. As Buchholz explained in a 1956 memo, If longer bytes were needed, 60 bits would, of course, no longer be ideal.
With present applications, 1, 4, and 6 bits are the really important cases.
In other words, 60 is a convenient number for variable-sized bytes.
I want to focus on that final part of the quote for a minute,
where Buchholz says that 1-, 4-, and 6-bit bytes are what matter.
I think this really speaks to why it was advantageous to have a variable bit machine.
When it comes to inputs and outputs, you want flexibility. It's nice to be able to handle as
many types of data as possible. By supporting variable byte lengths, Stretch could talk with more types of devices.
So, let's count the ways that Stretch was flexible.
6-bit bytes is easy.
That's for devices that talk characters.
6 is also nice since it's half of 12. So, two 6-bit bytes can represent a punch card column.
4-bit is also easy to explain.
That's the right size for binary coded decimal. It's just BCD. So 4-bit bytes let Stretch talk to BCD devices.
1-bit is perhaps the easiest. That's just Boolean. Yes-no kind of stuff. Heck, you could even wire
it up so that your IBM supercomputer could turn some lights on
and off.
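As a quick sanity check on that divisibility argument, and this is my own sketch rather than anything from the Stretch memos, a 60-bit word splits evenly into 1-, 4-, and 6-bit bytes, while 8-bit bytes only fit cleanly once the word grows to 64.

```c
#include <stdio.h>

int main(void) {
    int words[] = {60, 64};
    int bytes[] = {1, 4, 6, 8};

    for (int w = 0; w < 2; w++)
        for (int b = 0; b < 4; b++)
            printf("%d-bit word, %d-bit bytes: %s\n",
                   words[w], bytes[b],
                   words[w] % bytes[b] == 0 ? "fits evenly" : "leaves spare bits");
    return 0;
}
```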
The jump to 8 is where we hit a little bit of trouble.
The crux of the explanation comes down to two things, the English language and the realities
of binary.
These two factors are a little more intertwined than you might initially suspect.
For this part, I'm working off Buchholz's contemporary stretch memos
and a book called Planning a Computer System, Project Stretch, from 1962. I'm kind of remixing
the arguments presented in both to give a coherent explanation. So while I'm not going to lay out a
timeline, know that I am giving a historically grounded explanation. Okay, check this out. It's easy
to calculate the largest number that can be represented with a group of bits. I've been
doing this throughout the episode, so I might as well lift the curtain and explain it. You just
take the number of bits you're dealing with and raise 2 to that power. A 6-bit byte can represent 2 to the 6 numbers, or 64. A 4-bit
byte can do 2 to the 4, so 16. An 8-bit byte, 2 to the 8, gives us 256 possible values.
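Here's that count-the-values rule as a short loop in C, purely my own illustration: a group of n bits can take on 2 to the n distinct values.

```c
#include <stdio.h>

int main(void) {
    int sizes[] = {1, 4, 6, 7, 8};

    for (int i = 0; i < 5; i++)
        printf("%d bits -> %d possible values\n", sizes[i], 1 << sizes[i]);
    /* 6 bits -> 64 (bare-bones text), 7 -> 128 (room for both letter cases),
       8 -> 256 (a full byte's worth). */
    return 0;
}
```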
6 bits is all you need for passable text encoding, but it's a no-frills affair.
You don't get lowercase letters, for instance.
64 different characters just isn't quite enough space.
If you add one more bit, taking you up to 7 bits,
you get double the space, 128 possible values.
That is actually plenty.
You can pack in numbers, both cases of letters, all symbols on a keyboard,
and still have space for fancy things like signals and formatting characters.
Then, all text should be encoded in beautiful 7-bit data.
It follows that any computer that's expected to support text needs facilities to handle 7-bit bytes.
The only issue is that little number.
You see, 7 actually sucks for computers.
Now, this is one of those cases like at the top where I have to be introspective for a moment.
To me, this all makes instinctive sense.
7 just isn't a computer number.
Simple as.
But that is not a very satisfying answer. That
only works for digital dwellers like myself. So let's go back to basics here. All the nice
bit sizes that we've been discussing share a few things in common. They're all round numbers,
they're all positive numbers, of course, and they're all powers of two. That is to say,
they can all be calculated by taking two to some power. The result is a class of nice computer
numbers. 1, 2, 4, 8, 16, 32, 64, 128, 256, on and on. These numbers are nice because they're easy to represent electronically.
Put another way, computers like powers of two for a very simple reason. They're easier to deal with
on the lowest levels. These are all values that can be cleanly represented in binary.
To go through the same series, it's 1, 10, 100, 1000, and so on. Notice that in binary,
these numbers are actually just shifted over. By working with powers of 2, with bits grouped by
powers of 2, that is, you can use all kinds of binary tricks. You can use different shortcuts
to save time and circuit complexity. In that context, 7 is awkward.
It doesn't fit into any nice patterns.
You don't get any tricks.
You have to work for it.
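For a taste of those tricks, here's a hedged little C example of my own devising: when your fields are a power of two bits wide, pulling one out of a word is a single shift and mask, and multiplying by the field width is just another shift.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t word = 0x1122334455667788ULL;

    /* Grab byte number 3 out of a 64-bit word: shift by 3 * 8, mask off 8 bits. */
    uint8_t third = (uint8_t)((word >> (3 * 8)) & 0xFF);
    printf("byte 3 = 0x%02X\n", third);     /* 0x55 */

    /* Multiplying by 8 is just a left shift by 3 -- the kind of shortcut
       a 7-bit grouping can't offer. */
    unsigned index = 5;
    printf("index * 8 = %u\n", index << 3); /* 40 */
    return 0;
}
```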
Powers of 2 also nest cleanly.
I know there's some smart math word for this, but I'm going with nesting.
A 64-bit word can be broken down into two chunks of 32 bits,
or four chunks of 16 bits, or eight chunks of 8 bits. Each division is clean, you don't have
any wasted bits. Breaking things up by 7, conversely, sucks. There's such an aversion
to these awkward numbers that folk tend to pad them up to the nearest power of two, so the lame
seven becomes the slick and cool eight. This does in theory waste a bit, but the results are worth
it. By late 1956, the designs of Stretch are amended to support 8-bit bytes. Internal registers
are bumped from 60 bits wide to 64 in order to make 8 a more natural fit.
It may sound silly, but as Stretch was under development, 8-bit character encoding was experimental.
It was a Wild West, so to speak.
That said, there were a lot of experiments going on.
There was a concerted drive to move away from 6-bit encoding.
If nothing else, this was just to add in lowercase
letters. Planning a computer system goes into this in detail if you want more context. Basically,
there were a number of proposed character encodings supported on Stretch. These all
made use of 8-bit bytes, some to better effect than others. There's at least one table that clearly shows the
empty void left by the extra rounding bit. This puts the creation of the byte itself
sometime in 1956. By that time, we had programmers using 8-bit bytes. At least, sometimes. This was
still the upper range of the new unit, and it was still only inside IBM.
Stretch hit the market in 61. From there, the byte would enter the mainstream.
Alright, that closes out the episode. We've built up from flip-flops to bits to words to bytes.
I'll be the first to admit that there are some holes in the overall story.
The exact reasoning behind all of these terms is, well, it's up to some interpretation.
Bit is probably the most closed case.
It makes sense as a portmanteau of binary digit, but we don't have a perfect paper trail to go off of.
I also think we're ready to answer the final big question.
Why has the 8-bit byte endured?
Why do we still default to that size today?
There are two final steps to ubiquity.
That's EBCDIC and ASCII. The first, the tongue-twisting EBCDIC, is a character encoding system developed
at IBM and codified in the early 1960s. EBCDIC is the result of experiments in 8-bit text encoding,
some of which were conducted during the development of Stretch. This becomes the dominant encoding standard for
IBM hardware, which necessitates the 8-bit byte. ASCII is, well, it's a better name.
This is another character encoding standard, also developed in the early 60s. ASCII is technically a 7-bit code,
but in practice each character gets packed into an 8-bit byte.
This becomes a dominant standard on many non-IBM mainframes.
So once we reach the 1960s, there are two big reasons for computers to support 8-bit data structures.
And the idea of a byte is already circulating as a nice unit of measurement.
Thanks to that drive, plus really the convenience of a power of two,
eight turns out to be a very lucky number indeed.
Thanks for listening to Advent of Computing.
I'll be back in two weeks' time
with another piece of computing's past.
If you like the show,
there are a few ways you can support it and help it grow.
If you know someone else who'd be interested
in the history of computing, then please take a minute to share the show with them.
You can also rate and review on Apple Podcasts. If you want to be a super fan, you can support
the show directly through Advent of Computing merch or signing up as a patron on Patreon.
Patrons get early access to episodes, polls for the direction of the show, and bonus episodes.
The donations also really help me keep going and help me get access to new sources to improve the show.