Advent of Computing - Episode 134 - Beyond the Punch
Episode Date: June 23, 2024
This episode I'm opening up my research vault to present some interesting pre-digital technology. Back before computers, we humans used to write everything down on paper. Over time that led to some organizational issues. By 1890 punch cards show up to solve one aspect of this problem, but that technology had its limitations. We will be looking at other paper-based approaches to data management, as I slowly try to explain a realization I've come to about the early history of hypertext.
Transcript
In 1968, Doug Engelbart stunned the world with a demonstration of a new program called NLS.
That event has come to be known as the mother of all demos.
In it, he unveils the mouse, a graphical user interface, networking, telecommuting, and hypertext.
It struck like a bolt from the blue.
Most of these technologies were completely new, or had
never actually been implemented on a computer. For those in attendance, it must have been like
stealing a glimpse of the future. But the demo didn't come out of a vacuum. If you've learned
anything from my long-winded ramblings, it's that few things truly come from a vacuum. It took years
of research for Engelbart and his colleagues to reach this point. Engelbart himself was inspired
by an earlier paper, Vannevar Bush's seminal As We May Think. The core idea of that paper
was that better user interfaces needed to be developed, and that data needed to be organized as chains
of connected information. We can go deeper still. In 1962, Engelbart publishes a paper titled
Augmenting Human Intellect. It's the blueprint for NLS, which really makes it the blueprint for
modern computing. If you listen to the show, then I'm sure you've heard this story before, but I'm going to tell it again because it's fundamental to one of the greatest puzzles
in computing. In an appendix to this paper, Engelbart describes a cardstock-based hypertext
system that he used during his research, including while writing Augmenting Human Intellect itself. The system had links, connections between ideas.
It could be sorted, organized, and ideas could even be stitched together to form larger ideas.
And the medium?
It was the enigmatic, edge-notched card.
It's a technology that, I think, may form a missing link between pre-digital ideas about data and the modern computer.
Welcome back to Advent of Computing.
I'm your host, Sean Haas, and boy am I happy to be back home.
It's been about a month since I've actually sat in my office studio in front of a microphone, so I'm very, very pleased to be back in the saddle
as it were. This is episode 134, Beyond the Punch. This time, we're dipping back into my
super-secret research. This is an episode that I've been picking away at during my long travels, and I've finished it up now that I'm home.
It's something that I think is impactful, and it's something that's been puzzling me
for a long time.
There's a reason I named the podcast Advent of Computing and not Advent of Computer.
The history of the computer is so much more than
just the history of the electronic digital computer itself. There's a pile of context
both around the computer and prior to the computer. Hence, I cover computing and not just computers.
This is one of those episodes where we'll be in the larger category of computing.
Today, we won't even really touch a computer.
Rather, we're going to be talking fundamentals.
The technology behind the punch card is patented by Herman Hollerith in 1884.
The punch card itself, in primitive form, appears in 1890 during that decade's US census.
One view of computing puts this event as the start of the
digital age. You can trace a line from Hollerith's punch cards up to modern day computers. But hey,
you know me, it's Sean. I love to introduce complications. I love to ruin a nice story.
I personally believe in this larger tides and forces view of history, that over time,
certain inexorable forces, certain ebbs and flows, lead to progress, or at least lead to change.
Call it psychohistory if you're a big Asimov fan. One of the outcomes of this view of history
is that you start seeing weird connections between events. One cluster that's fascinated me for a while occurs around the time of the punch card.
There's this strange liminal time period between this punch card era and the first digital electronic computers.
It spans about 50 or 60 years, from the 1880s up into the 1940s.
During this period, there are a number of paper-based storage mediums very similar to the punch card. At least, they appear to be similar. Some are
derived from the punch card, some are extensions to the card, but many are totally independent from
whatever Hollerith was doing. Today, we're going to be living in this
liminal space, looking at pre-digital paper storage, and I swear, this is all going to start
to make sense and come together as we go. The key connecting fiber here is data. That is the mantra
of the day. Punch cards are the best-known paper data medium, but they aren't really a very good format.
That was even the case back in this early time period.
There are certain things that punch cards simply can't do.
They're a tool that only works for a few very specific problems.
We're going to see how other kinds of cards got around those limitations,
how other formats lead to different possibilities, and how all these mediums are connected. This is all leading up to a larger
idea that's been brewing in my head. If you've listened to the show for a while, then you know
I have some frankly wacky and out there ideas about early hypertext. Well, I think I've recently
put together another part of that puzzle, but we're going to have to get there slowly.
You're going to have to buckle in for this one, I think.
The entire framing for this episode is going to hinge on a bit of an anachronism.
Put another way, I'm going to be pulling a mean-spirited trick in order to make things
easier to understand, to give us a nice little word instead of a sentence to explain things.
Today's anachronism is the information problem.
It's the lens we're going to be using to examine everything this episode, so I gotta explain it.
I talk about the information problem at length whenever I cover hypertext, so I'm just going
to give a short explanation.
That's all we'll need, after all.
It's this idea that was first formulated by Vannevar Bush during the Manhattan Project,
and then explained in 1945 in his article As We May Think.
The core of the idea is that we will reach a point where there's so much information
that it becomes impossible for you to find what you're looking for.
On its face, that sounds like an annoyance.
It's like having too many channels on TV, right?
Where this becomes an issue is in research.
Imagine you're trying to do something totally new.
Like, let's say, splitting the atom.
You know, random example, unconnected to history.
To get to that point, you have to figure out the current state of the art. You have to survey all
relevant works and figure out where things stand. Otherwise, you're going to go down dead ends that
have already been examined, or reinvent all kinds of old technology. Reinventing the wheel is one
thing. Reinventing a method of isolating isotopes
in a centrifuge, well, that's another. That's a much more costly reinvention. Eventually, we'll
reach a point where there is so much information, such a glut of data, that research will become
impossible. It will take too much time to search for sources, to dig up the background needed.
It will become impossible to find the shoulders of giants that you need to stand on.
Thus, progress will end.
We'll enter this paradoxical dark age where the sheer volume of information will make any information meaningless.
The information problem has been stated and formulated in many different ways over the years,
but I'd really like to go back to Bush's version. It's simple, concise, and it falls
nicely on the modern timeline. As We May Think, as I stated at the very beginning,
had a direct influence on the development of hypertext in the middle of the 20th century.
So we can use Bush as something like a key to understanding information
management. I think this makes it acceptable for me to use this later lens to discuss earlier
developments. It also, I think, helps that Vannevar Bush wasn't a digital computer person.
He wrote As We May Think before the first digital computers came to life, and he never worked with digital machines.
So, a closer reading kind of shows that he's viewing the data problem here in a very old way.
But that's a totally different discussion.
We can start by applying this idea of the information problem to punch cards as they existed in 1890. The state of the art at
this point was, well, it was rough. Hollerith had initially developed punch cards specifically for
the US census. They were something of a special purpose medium, at least roughly speaking. In
general, all punch cards are composed of a grid, with each position being either a hole or an intact region of cardstock.
A census card broke data up into categories, which were fit onto the card wherever there was space.
You get these uneven groupings of numbers, a tiny male-female chunk, regions for states and languages spoken.
The entire card is almost
gerrymandered to find room for holes. This is fine for special purpose or custom applications,
but the specialization has some knock-on effects. It means that equipment has to be specialized.
Initially, this is really just the tabulator, the machine that reads the punch card and acts on its data.
Census-era tabulators were basic. They simply incremented counters on each category as they read cards. Readings were taken down manually into paper ledgers. There's also the complication of
sorters, but we'll get to that in a bit. Punch card technology does improve over time, very much so.
But this is the primordial state of the medium.
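To make that census-era workflow a little more concrete, here's a minimal Python sketch, assuming a made-up card layout. Each card is modeled as a set of punched (row, column) positions, and a hypothetical `CATEGORY_AT` map ties certain positions to categories. The `tabulate` function just bumps a counter for every hole it recognizes, card after card, which is roughly the level of sophistication described above.

```python
from collections import Counter

# Hypothetical card layout: each punched (row, column) position marks one category.
# These positions are invented for illustration, not the real 1890 census layout.
CATEGORY_AT = {
    (0, 0): "male",
    (1, 0): "female",
    (0, 5): "New York",
    (1, 5): "Pennsylvania",
}

def tabulate(cards):
    """Read cards one after another and bump a counter for every recognized hole."""
    counters = Counter()
    for card in cards:
        for hole in card:
            category = CATEGORY_AT.get(hole)
            if category is not None:  # holes outside the layout are ignored
                counters[category] += 1
    return counters

cards = [
    {(0, 0), (0, 5)},  # a man living in New York
    {(1, 0), (0, 5)},  # a woman living in New York
    {(0, 0), (1, 5)},  # a man living in Pennsylvania
]
totals = tabulate(cards)
print(totals["male"], totals["New York"])  # 2 2
```

Notice that the layout lives outside the cards, in `CATEGORY_AT`. Swap in a different card format and you have to rewrite that map, which is the software equivalent of rewiring a custom tabulator.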
In order to use a punch card for something other than census data,
a custom card format has to be designed.
A custom tabulator has to be wired.
I think it's clear to see the downsides here, so I'm not going to dwell on them.
Rather, let's look at the utility. The first huge advantage
here is the unambiguous representation of data. This had existed in earlier eras, perhaps most
notably in the prehistoric punch cards used in the automatic looms of the 18th century and
spools of music used by player pianos. All these formats allow for good and proper digital data
encoding. A 1 is a 1, a 0 is a 0, with no wiggle room. That's one huge thing to love about these
mediums. But there is something crucial about these earlier card formats. Loom cards and piano
rolls are missing a critical part of the digital equation. Their data
isn't fully discrete. Loom cards were chained together to form a pattern. That meant you
couldn't grab a single card out of the chain. You would lose context and destroy the chain in the
process. The same is true for a piano roll. To pull out a note, you'd have to take scissors to
the thing, and you'd be left with,
you know, maybe the holes to play a single chord. It might be a nice chord, but pulling out that
discrete component would ruin the song. Punch cards, on the other hand, are totally discrete.
Each card can stand alone as an indivisible grouping of data. That has its pros and its cons.
On the positive side, you can actually have a smallest unit of data. That means you can
shuffle things around without destroying your dataset. You can have unordered information.
You can actually do something like manipulate your dataset. You can have it stored and sorted
in different ways, making groupings of similar data,
or you could even pluck out a single card. But this is a double-edged sword. Since punch cards
are discrete, there's no inherent order or context to their data. That's fine for something like a
census. It's great for raw datasets. It breaks down when we get to more complex information, to digested information,
or information you intend to handle more than once or twice. In that sense, punch cards are
a great medium for raw information or for long-term storage. They can be a fantastic
medium to work with or to express datasets in. But you don't want to live in the world of punch cards. It robs your data of context,
of a whole other dimension of information. By the 1900 census, Hollerith had refined his design.
He designed a general-purpose encoding scheme, which meant all cards for any task would represent
data in the same way. That would solve one major issue with the medium,
but the context issue still remained. A partial solution was completed in 1901,
an automatic card sorter. Now, there were ways to sort cards earlier than that, but they hinged on
running them through a tabulator, which would automatically open a trap door on a sorting box
that you could then drop a card into by hand. It was still a largely manual process. The new automated sorter was a device that
sorted cards into categories based on their content. You would put in a stack of unsorted
cards, they'd be fed through a hopper, through drums and brushes, and then eventually be sorted
into a number of pockets on the far
side of the machine. In this way, you could, say, automatically pull all census cards for males,
or sort roughly by age. This helps, but it's only a partial solution. It's also a clumsy solution.
A card sorter, at least in this design, only lets you categorize and order your cards.
It also exists outside the medium. By that I mean, punch cards don't have any inherent way
to order or group themselves. There is no part of the card's encoding that accounts for
order or context. You have to have this separate machine that knows how punch cards
are encoded, how data is stored, how to read them, and how to move them around. You also need a way
to configure that machine. You need all this extra artifice to get your data into a meaningful
context. In this way, punch cards solve the information problem, but only really a sliver
of it. They're able to take a large amount of data
and find a way to quickly answer questions about that data. How many men were living in New York
in 1890? What was the median age of a factory worker in Pittsburgh? But there are all kinds
of questions, all kinds of aspects to the information problem that they can't solve.
Crucially, this is not a modern critique.
Folk took note of this limitation as early as the 1890s.
There were attempted solutions in that very decade.
Many of these modifications and expansions came from the world of accounting.
When you strip away the entire facade, all the technical details,
tricks, and twists, computing has always been about automation. Programming languages were
developed to automate away the task of programming. Computers themselves were developed to automate
away tedious mathematics. Punch cards were developed to automate away record-keeping
and simple data processing tasks. This simple reality leads to kind of a funny outcome.
Many advances are actually made on the shoulders of very boring problems.
Automation is created out of sheer boredom with a task.
The information problem is a prime example.
Hypertext is developed because record-keeping and data retrieval,
literal librarian work, is too repetitive and too boring to do by hand. Then it should come
as no surprise that we can find exciting developments by searching for boring,
repetitive work. Accounting fits this mold to a T. There are two boring aspects of accounting in this period that
were ripe for automation. The first is the mathematical side of the equation. The discipline
required long and tedious sums, which were already somewhat automated by mechanical calculators.
Punch cards further automated this process and were a huge boon to the industry.
They allowed information to be stored and then summed at a later date.
But punch cards and early machines couldn't really categorize data.
In 1901, Hollerith did introduce the horizontal sorter, but that was a very limited machine.
It could sort cards one column at a time.
It was a workable solution, but with many caveats.
Thus, there was room for improvement, room for automation.
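Sorting one column at a time sounds restrictive, but it does compose. To order a deck by a multi-digit field, you run the deck through once per column, starting with the rightmost digit and restacking the pockets in order after each pass. Today we'd call that a least-significant-digit radix sort. Here's a minimal sketch, assuming each card is just a string of decimal digits and the sorter has ten pockets; `sorter_pass` and `sort_deck` are hypothetical names.

```python
def sorter_pass(deck, column):
    """One pass through a single-column sorter: drop each card into pocket 0-9
    by the digit punched in `column`, then restack the pockets in order."""
    pockets = [[] for _ in range(10)]
    for card in deck:  # cards feed through one at a time
        pockets[int(card[column])].append(card)
    return [card for pocket in pockets for card in pocket]

def sort_deck(deck, columns):
    """Order a deck by several columns with one pass per column,
    rightmost (least significant) column first."""
    for column in reversed(columns):
        deck = sorter_pass(deck, column)
    return deck

# Each card is modeled as a string of digits; columns 0-1 hold a two-digit age.
deck = ["42", "07", "35", "40", "09"]
print(sort_deck(deck, [0, 1]))  # ['07', '09', '35', '40', '42']
```

The catch is visible in the loop: every extra column means another full trip through the machine, and each pass only works because the previous pass kept cards in a stable order.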
One of the accountants to take up this dangerous task was one Charles H. Talmadge.
Born in 1857, Talmadge would never make it to the digital revolution.
He would, however, live through the first bits of data automation. An accountant by trade, Talmadge would first come into contact
with automation in the 1890s. In 1893, he co-founded the Automatic Timestamp Company.
The company's first product was, as you may be able to guess, an automatic timestamp. It was a very simple
clockwork device that could stamp the current date and time on a slip of paper. By 1900,
Talmadge is out on his own. It's from here that things get interesting. In 1906, he files a patent
for a primitive computer. Now, I did say this episode would be computer-free,
and I still mean it. This isn't a programmable computer, or any computer in the true sense.
It's a so-called accounting mechanism. On the surface, this sounds pretty out of place.
And yeah, honestly, this is one of those times where we just seem to get a huge jump in
technology. Talmadge just kind of shows up in 1906 with this almost-computer. What makes this
almost-sensical, to me, is some interesting context explained in the book Before the Computer
by James Cortada. In that text, Cortada explains that automation technology in this period should be seen as a collection of similar mechanisms.
Stick with me here because I think this argument actually pulls a lot of interesting ideas together.
To quickly paraphrase Cortada's argument, it all starts with simple mechanisms like typewriters and, yes, even automatic timestamp machines. To make something
like a typewriter requires a certain level of mechanical know-how, both on the design and
manufacturing side. You need gears and levers and cast parts with certain tolerances to them.
That means once you get into these automation machines, you fall onto a slippery slope.
Adding machines and typewriters use very similar parts.
At least, you can make all the same parts using all the same manufacturing equipment.
Tabulators just require a few electrical components and relays on top of existing mechanical parts.
So many times, a company would make all kinds of automation machines,
typewriters, adding machines, tabulators, and even these almost computers.
So, back to the specific accounting mechanism.
This is a machine that's very much in this mold of mechanical automation devices.
It's a machine to automate double-entry bookkeeping, a very particular form of accounting.
It has a set of memory cells, each of which contain a number.
That number represents an account in, say, a bank.
Those cells each have mechanisms for adding and subtracting from that register.
A teller runs a transaction from a central console.
For, say, a transfer between
accounts, they select a source and a receiving account, then input the transfer amount. The
mechanism physically connects the corresponding memory elements, then grinds some gears to
complete the transfer. It even makes a little receipt slip for you. The accounting mechanism isn't programmable. It's not really
configurable either, but it handles inputs and outputs. It stores a state. It has addressable
memory. And there are state transitions. It's super, super close to a computer.
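To show why this feels so close to a computer, here's a loose Python model of the behavior described: addressable registers, a transfer that debits one register and credits another, and a receipt. Everything here (the class name, method names, the register count, the receipt text) is invented for illustration. It models the idea, not Talmadge's actual mechanism.

```python
from dataclasses import dataclass, field

@dataclass
class AccountingMechanism:
    """Hypothetical model: a fixed bank of addressable registers, one per account."""
    registers: list[int] = field(default_factory=lambda: [0] * 8)

    def deposit(self, account: int, amount: int) -> str:
        self.registers[account] += amount
        return f"DEPOSIT {amount} to #{account}"

    def transfer(self, source: int, receiver: int, amount: int) -> str:
        """Select a source and a receiving register, then move the amount."""
        self.registers[source] -= amount    # debit the source register
        self.registers[receiver] += amount  # credit the receiving register
        # The returned string stands in for the receipt slip; the state itself
        # never leaves the machine.
        return f"TRANSFER {amount} from #{source} to #{receiver}"

bank = AccountingMechanism()
print(bank.deposit(3, 500))      # DEPOSIT 500 to #3
print(bank.transfer(3, 5, 120))  # TRANSFER 120 from #3 to #5
print(bank.registers)            # [0, 0, 0, 380, 0, 120, 0, 0]
```

Because every transfer is a matched debit and credit, the grand total across all registers never changes, which is the bookkeeping invariant a machine like this would be built to protect.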
What's really weird about the whole thing, at least to me, is that Talmadge even shows
memory as a grid of cells. Many mechanical machines just kind of kept storage wherever it would fit,
since there were physical limitations to keep in mind. But the block diagram that Talmadge has in
his patent actually looks pretty modern. You can look at it and point out memory and I/O channels very easily,
something that is common on a block diagram for a computer. Something interesting to note here
is that the accounting mechanism uses internal storage only. At no point does it pop out a
punch card or some kind of data medium. Inputs are only possible via a little keyboard. Its internal state is simply
kept consistent on rotors and gears, not stored on reams of cards. The output can be a little
receipt, but that's not a data storage medium. This may seem strange, and it is strange in the
context of computing, but this isn't entirely a computer. It's meant as an automation machine.
In that sense, it's basically a highly specialized adding machine with built-in memory.
The idea here was that an accounting machine could be used to replace ledgers.
Good old-fashioned pen-and-paper accounting requires that kind of long-term storage.
Data stays in a ledger forever. Therefore, account totals would
have to stay inside the accounting machine. A bank would be able to drop this in place,
maybe inside a vault, and use it to fully automate a large swath of their accounting needs.
It would stay in place, running transfers forever. In that capacity, this is less and
less like a computer, and much closer to something like
factory automation machinery. But Talmadge didn't stop there. In 1908, not long after this mystical
accounting machine, he filed another patent. This was very cryptically titled Indexing and
Assorting Means. If you know me, then you can probably guess how excited that
title makes me. This patent is one of the earliest examples of an edge-notched card.
It's also this wild mashup of technologies. The actual filing is simple. Talmadge's cards
are a modified punch card that includes space on their edges where notches can be cut
and a registry hole in one of the corners. Data could be added to the edge of the card as notches,
but crucially, this wasn't normal data. This was metadata, or indexing information.
It was just little bits of information used to describe what kind of data
was punched on the front of the card, or, as Talmadge called it in the patent, category data.
This notching allowed for two cool tricks. The first is grouping. It's possible, through a
pretty fast process, to select all cards that fit a certain category. This was done using either
two needles or a needle and some kind of edge on a table. So let's say you have a stack of cards
where each card represents a task. Completed tasks have a notch in some position and incomplete
tasks are notchless. To select completed tasks, you set one needle down on a flat surface, or use some kind of little protrusion on a flat surface.
You then take your stack of cards, line up the lower needle with the position of the completion notch, and push down.
Cards with a notch will fall all the way down to the table surface, since the notch fits around the lower needle.
Cards without the notch will stay on top of the needle.
The upper corner of the cards, where the single registry hole is, will now be accessible.
You can take your second needle, thread it through the registry hole, and pull out all the cards without notches.
That's all the cards that were pushed up by the lower needle.
Then you are left with two stacks of cards.
One for complete tasks, notch and all, and the other for incomplete tasks.
By using this operation, you can index and organize data.
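The two-needle operation is really a partition: one pass splits a stack into cards notched at a position and cards intact there. Here's a minimal sketch, assuming a hypothetical `NotchedCard` type that records a label plus the set of notched edge positions, and a made-up `COMPLETED` position. Physically the pass happens in one push of the needle; the list comprehensions below just spell out which card ends up in which pile.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NotchedCard:
    """Hypothetical edge-notched card: face data plus the notched edge positions."""
    label: str
    notches: frozenset

def needle_sort(stack, position):
    """One two-needle pass at a single edge position.

    Cards notched at `position` drop down around the lower needle; intact cards
    get lifted out on the upper needle through the registry hole. Each pile
    keeps its original relative order."""
    dropped = [card for card in stack if position in card.notches]
    lifted = [card for card in stack if position not in card.notches]
    return dropped, lifted

COMPLETED = 0  # hypothetical edge position meaning "task completed"

stack = [
    NotchedCard("file report", frozenset({COMPLETED})),
    NotchedCard("call bank", frozenset()),
    NotchedCard("pay invoice", frozenset({COMPLETED})),
]
done, todo = needle_sort(stack, COMPLETED)
print([c.label for c in done])  # ['file report', 'pay invoice']
print([c.label for c in todo])  # ['call bank']
```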
The second trick is in maintaining that organization. If you group cards by category,
then you will see, visually, when a card is not in the right place. When cards are organized, all the notches line up nicely. At a glance, you can tell if a pile of cards is organized or not,
and you can see which cards are not in the right spot. The question is, why would you want to do this?
Why would you want these manually sorted cards?
Well, that gets a little complicated to answer.
Let's say you're a Hollerith shop.
Your office is decked out with the latest tabulators,
punch pantographs, and horizontal sorters.
It's a very expensive setup, but it's the state of the art. The main things you
could gain by implementing Talmadge's notches all come down to categorization and convenience.
Automated sorters can only handle so many categories, and they can only categorize by
data. Notches on the card's edge don't necessarily need to duplicate data on the card's
face. They can be pure metadata, information that only describes information. Maybe you have a field
that you want to sort by, but you never want to actually tabulate. That could be the case for any
number of reasons. Card space was limited, so you couldn't cram every possible data point onto a card.
You might want to organize cards by, say, collection month, but not want to waste precious
hole space on that data point.
This notching system also allows for fast checks of organization.
Is your stack of cards sorted?
Using Hollerith's system, you can't check.
You have to throw them into a sorter and run the full operation again.
Each card had to be read one after another, but with notches, well, that's a different story.
Metadata was on the card's edge, so you can actually see the information on the edge of
a stack of cards. You can quickly check for organization. The same schema also supports ad hoc sorts. Maybe you
want to very quickly try something out without tying up hardware. Just grab your needles and
get to work. In that way, more exploratory analysis was possible than what you could do
using normal punch cards. One final feature is selection. Most of these other features have been improvements to the
usual punch card workflow. Pure selection, however, is totally new. I already explained
how that operation works. It's the whole two-needle boogie. You could use that to manually
sort or create groups of cards, but you could just as easily use it to select a single
card. This really depends on what data you're encoding on the card's edge. So let's say you
have notches for collection month, tax status, sex, and employment status. You could, with a short
series of operations, select all records for working men that are delinquent on their taxes.
You could further look for just those deadbeats that were identified in March. You could do the
same with normal punch cards, at least in theory. You would have to set up the sorter to carry out
four different categorizations. It would take a long time, but it was doable.
The key difference here is that notching approaches random access. You can actually pluck something from a stack using
that card's characteristics, at least up to a point. Large datasets get physically problematic.
But with punch cards, at least on their own, there's no random access.
They're closer to sequential access.
You have to actually scan through a pile of cards one at a time.
That is automated by a machine, but that's still slow and very limited.
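Chaining needle passes is what makes those multi-criteria selections work: each pass keeps only the cards that drop, so a string of passes behaves like an AND query. Here's a sketch of the working-men-delinquent-in-March example, assuming hypothetical edge positions for each trait (a real card would spread a field like month across several notches). The code still loops over every card, since a Python list has no other way in; the physical difference is that one push of a needle tests the whole stack in a single motion.

```python
# Hypothetical edge positions, one per trait, invented for this example.
MALE, EMPLOYED, TAX_DELINQUENT, MARCH = range(4)

def needle(stack, position):
    """One needle pass: keep the cards that drop (notched at `position`)."""
    return [card for card in stack if position in card["notches"]]

def select(stack, *positions):
    """Chain needle passes; each pass narrows the pile, like an AND query."""
    for position in positions:
        stack = needle(stack, position)
    return stack

records = [
    {"name": "A", "notches": {MALE, EMPLOYED, TAX_DELINQUENT, MARCH}},
    {"name": "B", "notches": {MALE, EMPLOYED}},
    {"name": "C", "notches": {EMPLOYED, TAX_DELINQUENT, MARCH}},
    {"name": "D", "notches": {MALE, EMPLOYED, TAX_DELINQUENT}},
]
delinquent_working_men = select(records, MALE, EMPLOYED, TAX_DELINQUENT)
print([r["name"] for r in delinquent_working_men])                 # ['A', 'D']
print([r["name"] for r in select(delinquent_working_men, MARCH)])  # ['A']
```

Each narrowing step needs no machine setup, just another needle, which is why this kind of quick, exploratory selection was so much cheaper than booking time on a sorter.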
Talmadge would continue to file patents up through the 19-teens.
These were for increasingly complex accounting machines,
which began to incorporate punch cards as data input and storage media.
However, the notch would never return.
For Talmadge, this may have just been an idle idea, but there's something fascinating hiding there.
With that, I think we have a jumping off point to go further back and look at technology
that's contemporary to the earliest punch cards.
Now, we're going to be talking about some cards that, while contemporary, are perhaps
unrelated.
This episode is, at least vaguely, focused on the punch card, but we're going to be talking
around the punch card.
Allow me to explain that a
little bit. Punch cards as a medium don't appear in a vacuum. This is a mistake that I often see
when discussing computers. It's easy to put the start of the digital age at the feet of either
Hollerith or maybe Babbage, depending on how you view data or math. One thing I've learned time and time again is that the origins of computing aren't where you expect them.
Humanity's problems are very old at this point, and our solutions are equally ancient.
In the 1890s, Paul Otlet, a Belgian lawyer, started work on a project called the Mundaneum.
This was planned to be a collection and accounting of all
human knowledge. To make this effective, he would have to solve the information problem.
Put another way, he needed a way to organize data effectively. He needed some kind of way
to select and retrieve information from a limitless amount of data points. His solution was to keep records for
all information on note cards, and to organize those cards using a filing system of his own
design. Each card had a unique identifier, something like the Dewey Decimal System,
but much, much more sophisticated. Otlet designed the system using the latest technology available to him,
and some of the most modern ideas in data management. In this capacity, Otlet was
facing the same problem as Hollerith. Both had a mountain of data that had to be dealt with
quickly and efficiently. However, the slightly different requirements of each problem led to
vastly different solutions.
Hollerith focused on a system for accounting and tabulating data, the ability to turn piles of records into meaningful information.
It's a transformative process.
Otlet focused on selecting and indexing, the ability to find a single data point and
select that.
We can see these two schools of thought, and really these two types of problems, echoing through the history of computing.
The earliest machines that come out of the Second World War are all focused on answering
concrete data questions.
Under what conditions will atomic fusion or fission occur?
How do I need to aim my artillery piece given certain conditions? What is the content of this encrypted message? Later systems, more augmentative systems, exist to answer very different questions.
Vannevar Bush describes these types of associative systems as being used for research.
Something like a vast automatic library, à la Paul Otlet's vision.
Doug Engelbart and Ted Nelson expand on this, describing massive systems for organizing and
personalizing new information. The modern internet is the direct outgrowth of this school of
questions. How do I take an unlimited amount of information and retrieve one idea?
Talmadge's cards are interesting because they show how the second path, the path of indexing,
can coexist with the data path. But Talmadge was not the first to automate this indexing process.
For that, we need to step back from the punch card itself and look at its low-tech cousin.
That is, of course, the index card. Otlet wasn't the only one to use and abuse index cards.
The banking, accounting, and insurance industries were all early adopters of this format. Once again,
progress shows up where there are boring problems to solve.
That said, I think this makes a lot of sense. Accounting is a very data-heavy industry,
as is banking and insurance. Outside of research, these would be the places most likely to encounter
the information problem. We see the exact same pattern during the digital revolution.
Machines appear in research and then in these data-rich industries. This is where we find our next key player.
Henry Stamford was one of the few to brave the dangerous waters of life insurance. His family
immigrated from Ireland to America shortly after his birth in 1847. So once again, we're dealing
with the same rough generation of people here. As a young man, he served for a brief time in the
Union Army during the Civil War. Stamford wouldn't see combat. Instead, he worked as a paymaster's
clerk for just over three years. In this capacity, he would have spent his days processing payroll
for soldiers and handling
accounting for the unit he was attached to.
It was while working with the army that he would have learned his way around a ledger
book.
Once discharged, Stanford's experience made it relatively easy for him to jump from payroll
to insurance.
He got a job with New York Life, at the time one of the largest insurance firms in the country.
Over the years, he worked up the ranks, eventually becoming a supervising accountant in 1893.
The information problem comes in all sizes, and Stanford would come into contact with it at a small office in New York.
In some ways, keeping an insurance company running is like working on a continuous census.
Every insurance policy represents a person, and each policy has certain data points that
need to be tracked.
Date of birth, smoking vs. non-smoking, even facts like when the policy's payment is due.
Up until a certain point, it's fine to keep all that data in a form like loose-leaf sheets
of paper, or squirreled away as index cards in drawers.
But like we saw with the US Census, there comes a time when plain paper just won't cut it.
In 1896, Stanford files a patent for a new storage medium.
Now, we have to keep in mind that 1896 is the date of filing.
The idea could have been older, or it could have been in use for a number of years before
Stanford put it down on paper.
Crucially here, we don't know if Stanford's bright idea was inspired by, or was a reaction to,
the punch card and the 1890 census.
So, what was the bright idea?
The patent is simply titled Information Card.
It's a new kind of index card. But actually, it's kind of two kinds of index cards and
matching drawers to use them. The trick comes down, once again, to very carefully punched holes.
We're still in this weird world of patents. If you aren't familiar, then let me explain a few things.
Reading patents kind of sucks.
They are super verbose.
They use very structured and almost ceremonial language.
And I personally find them very grating to read.
This means that their information is exhaustive and precise, but not always useful.
As such, I'm going to start us off with the simple card and then attempt to discuss the more complex one.
The other thing to note about patents is you don't have to prove that the patent works.
You just have to prove that there isn't a patent for similar technology.
So, just because a patent exists doesn't necessarily
mean that that technology is even real. With that aside, let me explain the first card.
In short, it's something like a tabbed index card. You know how you can get those little index cards
with the tabs on the top so you can flip to the right spot in a box? These cards look like that. The improvement, the automation feature, is that those tabs each have
a punch in their center. Perhaps you can see where this is going. These tabs are used to encode
metadata. The example Stanford uses is the month a record was generated in, so I'm going to stick to that month tab idea.
That means each card has room for up to 12 tabs on the top.
If you have a card that represents data from January, you would have one tab in the first position, for instance.
The cards are paired up with a special drawer that has notches on its top edge.
Those notches line up with the 12 possible tab positions on your set of cards.
The patent even shows little labels by those notches.
A selection, then, is as simple as threading a long needle through the proper notch and into the card's tabs.
Pull up, get cards, repeat as desired. That already gives us
a lot to discuss. If we drop the whole physical constraints for a second, this is a very neat
and elegant solution to the information problem. It lets you select and sort data from an unordered
set of cards, and it does it pretty easily. It's even more convenient than Talmadge's cards.
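To make that selection operation concrete, here's a minimal Python sketch of Stanford's tabbed cards. It assumes the month example from the patent, so 12 tab positions, and it reduces each card to the set of positions where it has a punched tab. The names `TabbedCard` and `needle_select` are hypothetical, made up for illustration; this is a model of the idea, not anything described in the patent itself.

```python
from dataclasses import dataclass, field


@dataclass
class TabbedCard:
    """Hypothetical model of one of Stanford's tabbed index cards."""
    label: str
    # Punched tabs on the top edge, by position: 1 = January ... 12 = December.
    tabs: set[int] = field(default_factory=set)


def needle_select(cards: list[TabbedCard], position: int) -> tuple[list[TabbedCard], list[TabbedCard]]:
    """Thread a needle through the drawer notch at `position` and pull up.

    Cards with a punched tab at that position ride up on the needle; every
    other card stays in the drawer. The drawer never has to be kept in order.
    """
    if not 1 <= position <= 12:
        raise ValueError("the drawer only has notches for positions 1 through 12")
    lifted = [card for card in cards if position in card.tabs]
    left_behind = [card for card in cards if position not in card.tabs]
    return lifted, left_behind


# Usage: an unsorted drawer of policy cards, selecting January's records.
drawer = [TabbedCard("policy 17", {1}), TabbedCard("policy 42", {3}), TabbedCard("policy 99", {1})]
january, rest = needle_select(drawer, 1)
print([card.label for card in january])  # ['policy 17', 'policy 99']
```

The point of the sketch is that the drawer's order never matters: one pass of the needle pulls out every matching card, and you can drop the cards back in any order and repeat with another tab.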
The other huge benefit here is the card's face itself. Punch cards, however you slice or dice them, are a very data-poor medium. Here I mean poor in a very specific way. You can only store numbers, or data encoded as numbers,
and you can only store a small amount of it on a punch card. We're
talking tens of bytes. An index card, on the other hand, is a very rich data medium, as in,
it can have all kinds of information on its face. We're talking numbers, drawings, the sky's the
limit. You can even do wild, freeform things like pasting photos or news
clippings onto a card. You could have a card that's just a collage or an idea board with little
images you like. The trade-off here is that automation aspect. You can't read an index card
with a tabulator. The formatting isn't standardized in that sense. It's just freeform information.
I propose that we look at this on a sliding scale.
On one end, you have fully free expression and fully manual operation.
On the other end, you have rigid structured data and fully automatic operation.
One side isn't better than the other.
Rather, each side has its uses, and we can come up with examples of things that would fall somewhere in between those extremes.
But let's bring back in those physical constraints.
Because this is really where we see the issue with Stanford's idea.
The first is the tabs themselves.
These cards have to be a very custom shape.
They need to have extra protrusions on the top. That could lead to some weird production considerations. Maybe you just make blank cards
with all possible tabs on their perimeter, then the end user has to cut off the tabs they don't
want. That's fine, but it seems kind of wasteful and time-consuming to set up.
The other option is to have sticky tabs that adhere to a card, but that could lead to registry
issues if the user doesn't place the tab in just the right spot. This is, actually, another little
secret about the punch card's success. Registry, or the lining up of holes is a huge issue for all these card formats.
Punch cards are a very simple shape, just a rectangle with a single corner cut at an angle.
That registry cut makes it easy to keep cards facing the same way. Punch registry, actually
making sure that each punch lines up, has always been handled by some type of machine or jig.
Once again, there is this intentional dependence on automation. You have to have very specific and
precise machinery, made by Hollerith, to make punch cards actually work. For Stanford's tabbed
cards to function, you would also need some solution to these registry issues.
You would need something like a jig or a simple machine for making holes and cutting tabs,
even if that machinery only existed in a factory somewhere. Not terrible, but something to take
note of. Now that's the first of Stanford's cards. The second card described in the patent is a little more complex. It uses internal
punches that are each a different length, as in, ovals. The card has this weird line of ovals
at the top and or bottom. How does this help? What does this even do? Well, it allows for the same kind of selection operation as the tabs do,
but in its own unique way. This option is more confusing than the tabbed version,
but it ends in the same place, so you can tune out from this part if you want to.
Okay, here's how I think it works. On one side of the card is a line of holes.
We start in one corner with a fully circular hole,
then the next hole over is a little longer,
and the next is longer still until you reach the middle of the card.
That's where you have the longest hole, a very stretched oval.
Then the holes start to shrink once again,
and we reach the far corner where we have a normal circle.
These are bottom aligned, so the bottom of each hole lines up with its neighbor.
This pattern appears on the top and bottom of each card.
These cards are paired with a box that has matching oval holes cut along its bottom
and notches on the top that line up at the top of the upper ovals. The patent is a
little vague about how this works, and the diagrams don't really help. The trick here is, as far as I
can tell, leaving some ovals as circles when you punch up a card. That would suck in practice. It's
super finicky to do, but let's just go with it. That's the data encoding: whether a position has an oval or a circle.
To do a selection, you first insert a so-called pivot pin.
This is a needle that would go through one of the corner holes.
Those holes act to ensure registry and provide a pivot point.
To select a class of cards, you thread a needle into the bottom of an oval and pull up.
Any card with an oval in that slot would let the needle rise up with no resistance.
Once again, it's more convoluted and less convenient, but it is a selection operation
for unsorted cards.
Now, we know precious little about Stanford outside of what I've just said here.
Around 1906, there's a lawsuit about tabbed index cards that involves Stanford. He subsequently dies
in 1918. That said, what we do have is fascinating on its own. We have a mostly manual medium. It has the flexibility of blank paper, while also sporting a dash of automation.
Maybe you can see the shape I'm starting to sketch out.
All these cards we've discussed have a mix of automatic and manual features,
some mix of flexibility and rigid standardization.
They all solve very similar problems, but they do so using different approaches.
Maybe my earlier explanation of a spectrum isn't enough. Maybe we should be thinking of this as a
big two-dimensional plot, with one axis for automated vs. manual operation, and another for
structured vs. free-form data. You may have noticed there's still runtime in this episode. That's
because there's one more card I want to cover, and it should land on a weird point in that 2D graph.
This next part is one of those stories that I just can't resist. Allow me to introduce you to
William B. Hargrave. Socialist, adventurer, librettist, real estate speculator, accountant, and inventor extraordinaire.
Hargrave is one of those wild people in history that I wish we knew more about.
He fell into roughly the same generation as the rest of the card inventors we've discussed,
born sometime around the Civil War.
The first concrete record we get is actually a newspaper article about one of Hargrave's operas.
In 1897, he writes an opera titled Merry Students.
It's put to music and then performed in his hometown of Colfax, Washington.
It was, according to this newspaper, a roaring success. Now, already, kind of a weird
story. Somehow this gets him into accounting and then he files some patents, right? Well,
not yet. First, he takes a trip. The next year, 1898, he leaves for Alaska to prospect for gold.
He was on the tail end of the Klondike gold rush.
This is where we actually get a splash of color and personality.
So far, the only writings we've had from people this episode are from patents.
Those don't really give us much color to the story.
While in the Klondike, Hargrave becomes friends,
lifelong friends actually, with Jack London, the author. London called Hargrave either Bert or
Kid, which made finding this connection a little more difficult. I actually only recently put
together this part of the story. After Jack London's death, his wife, Charmian, wrote his biography.
That book talks about Hargrave in some detail, including direct quotes from letters and
recollections. Let me pull in a small quote just to illustrate something here. From The Book of
Jack London, Volume 1, an excerpt from a letter written by Hargrave, quote,
There were not many of us that winter in the little mining camp on the Yukon,
but the isolated group of cabins housed some lovable and adventurous souls. I will tell you about them because it was about them that Jack London wrote, and because there is hardly one
of them whom he has not immortalized in his writings. End quote.
First, I think this gives a small feel for Hargrave's way with words.
That kind of makes the whole opera thing make sense.
He's hanging out with poets and authors, and he himself is an author.
But second, he claims that London wrote about him and his fellow adventurers.
It's likely that there's much more here than I know of, and I'm going to need to read at least a few of London's books about the Klondike to try and figure this out.
What I do know is that Hargrave's time in Alaska was short.
The entire party had been fighting scurvy for most of their expedition.
Fresh food was hard to come by, so one after another, folk fell ill.
Hargrave was the first to become too sick from scurvy to continue prospecting.
Before the year was out, he was shipped back to Washington to recover.
Upon his return, Hargrave became deputy county auditor.
This is the first time we have any record of Hargrave as a clerical worker. And honestly, it seems to come out of left field. But I guess he found a
career path. In 1901, he would join up with the Whitman Abstract Company, also in Colfax,
as trustee and eventual assistant manager. There, he would have been handling
accounting and filing work, day in and day out, while still writing operas. In 1903,
he files his first patent for a device that can remove facial wrinkles. And he keeps his day job,
so I can only imagine that that doesn't actually go anywhere.
In 1908, he runs for state office under the Socialist Party.
I've seen some reports that he became a socialist after meeting Jack London.
But Hargrave was actually a registered Republican as late as 1901.
I think that, and the letters with Charmian London, point to some continued communication between the two.
Then, in 1911, he files a patent titled Filing and Indexing Appliance.
It's also noted in the society pages of the time that he took at least one trip in relation to this new filing invention.
Although, that could have simply been a trip to a lawyer to actually deal with the paperwork. Once again, the trail is thin, but the context around this is wild.
So, what is this patent? Simply put, it's an honest-to-goodness edge-notched card.
We've reached my area of expertise and the connecting point up to hypertext.
But, and here's the crucial part, there are caveats to this technology. After World War I, in the 1920s, we start seeing real
edge-notched cards, mass-produced cards that can be used to index data. These are the same designs
that Doug Engelbart uses decades later when he's developing Augmenting
Human Intellect.
Their indexing is fully manual.
The only tools you need are a hand punch and a needle.
They function off the same principles we've been discussing all episode.
Metadata encoded on the side of the card that is then used for selection, sorting,
and connections.
Hargrave's cards are the first that follow the
more modern design. His patent describes cards that are perforated around the entire perimeter.
Data is encoded by cutting those perforations into notches. Selection is done by threading
a needle into one of those positions and then catching any cards with full holes, while rejecting any cards
with notches. With that, you can select, you can sort, you can do all kinds of wild indexing tricks.
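The needle logic is simple enough to sketch in a few lines of Python. This is a minimal model under stated assumptions, not a reconstruction of Hargrave's patent: each card is reduced to the set of edge positions that have been cut into notches (letters here, purely for illustration), a needle catches any card whose hole at that position is still closed, and the cards notched there fall free. The names `NotchedCard` and `needle` are hypothetical. Running several needles at once narrows things further, since one closed hole on any needle is enough to hold a card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotchedCard:
    """Hypothetical model of an edge-notched card.

    Every position around the edge starts as a closed hole; `notches` lists
    the positions that have been cut open into notches.
    """
    label: str
    notches: frozenset[str]


def needle(cards: list[NotchedCard], *positions: str) -> tuple[list[NotchedCard], list[NotchedCard]]:
    """Run needles through `positions` on the whole stack and lift.

    A closed hole at any needled position keeps a card on the needles, so only
    cards notched at every needled position drop free. Returns (dropped, held).
    """
    dropped, held = [], []
    for card in cards:
        if all(position in card.notches for position in positions):
            dropped.append(card)
        else:
            held.append(card)
    return dropped, held


# Usage: three cards indexed by letter positions.
stack = [
    NotchedCard("card 1", frozenset({"A", "G"})),
    NotchedCard("card 2", frozenset({"G"})),
    NotchedCard("card 3", frozenset({"A"})),
]
dropped, _ = needle(stack, "G")
print([card.label for card in dropped])  # ['card 1', 'card 2']
dropped, _ = needle(stack, "A", "G")
print([card.label for card in dropped])  # ['card 1']
```

Whether you treat the dropped pile or the caught pile as "the selection" is just a matter of how the index is set up; the mechanism splits the stack the same way either way.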
The interesting caveat is that Hargrave's cards are mechanically automated. At least,
a little bit. Hargrave, like Stanford, pairs his cards with a special cabinet.
This newer cabinet, however, is much more complex. The patent describes this box that has a
compartment that can slide to the right. On the face of the box are two rows of holes. On the
patent, they're labeled as A through Z, but Hargrave assures us you can
use these holes to encode any data you want. One row is firmly attached to the stationary side of
the box, all the way on the left. The other row is attached on the right to the sliding side.
There's also a trick to indexing that makes this contraption work. The left and right sides of the cards have to be cut as inversions of each other.
Put another way, if you notch out a G on the left-hand side,
you have to leave a closed hole in the G position on the right side.
They have to match.
Selection works using two needles.
One is inserted into the left-hand hole,
and the other is inserted into the corresponding hole on the right.
Once inserted, you slide out the drawer.
The right-hand needle holds firm to any closed holes,
while the left-hand side releases any cards notched in that position.
That means that, in theory, selection is handled without any fiddly manual operation. You can then perform a second selection. The top and bottom
edge of the cards are also perforated. You can actually encode another index on the top. When
the cards are slid out, a frilled edge is revealed,
making registry a breeze. Hargrave describes using this system to encode the initials of a first name
on the left and right sides of the card, and the initial of the second name on the top edge of the card.
This would actually make selecting a card by name very quick. But don't worry, the bottom edge is also used for encoding.
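Here's a rough Python sketch of how the two-needle drawer splits a stack, under some loudly stated assumptions: one needle goes in the stationary left row and one in the sliding right row at the same letter, each needle holds any card whose hole at that letter is still closed, and a card held by both needles jams the drawer. That last rule is my reading of why the inversion matters, not something quoted from the patent. `HargraveCard`, `slide_drawer`, and `JammedDrawer` are hypothetical names.

```python
import string
from dataclasses import dataclass

# The patent drawing labels the holes A through Z; any code would do.
POSITIONS = frozenset(string.ascii_uppercase)


@dataclass(frozen=True)
class HargraveCard:
    """Hypothetical model of a card with separately punched left and right edges."""
    label: str
    left_notches: frozenset[str]
    right_notches: frozenset[str]

    @classmethod
    def punched(cls, label: str, letters: str) -> "HargraveCard":
        """Punch a card by the rules: notch `letters` on the left, the inverse on the right."""
        left = frozenset(letters)
        return cls(label, left, POSITIONS - left)


class JammedDrawer(Exception):
    """Raised when a card is caught by both needles at once."""


def slide_drawer(cards: list[HargraveCard], position: str):
    """Put needles at `position` in both rows, then slide the compartment right.

    Each needle holds any card whose hole at `position` is still closed.
    Returns (slid_out, stayed_put, fell_loose).
    """
    slid_out, stayed_put, fell_loose = [], [], []
    for card in cards:
        held_left = position not in card.left_notches
        held_right = position not in card.right_notches
        if held_left and held_right:
            raise JammedDrawer(f"{card.label} is closed at {position} on both edges")
        if held_right:
            slid_out.append(card)    # notched on the left, so it travels with the drawer
        elif held_left:
            stayed_put.append(card)  # notched on the right, so the fixed side keeps it
        else:
            fell_loose.append(card)  # notched on both sides: nothing holds it
    return slid_out, stayed_put, fell_loose


# Usage: index cards by first initial, then pull every "G" card.
cards = [HargraveCard.punched("Grant", "G"), HargraveCard.punched("Adams", "A")]
slid_out, stayed_put, fell_loose = slide_drawer(cards, "G")
print([card.label for card in slid_out], [card.label for card in stayed_put])  # ['Grant'] ['Adams']

# A card notched on neither side breaks the inversion rule and jams the drawer.
bad = HargraveCard("Mispunched", frozenset(), frozenset())
try:
    slide_drawer(cards + [bad], "G")
except JammedDrawer as err:
    print("jammed:", err)  # jammed: Mispunched is closed at G on both edges
```

Modeled this way, the inversion rule guarantees every correctly punched card is held by exactly one needle, so the stack splits cleanly in two with no hand-sorting.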
Hargrave was not one to waste space. This part is particularly interesting in context, so check it
out. The bottom edge provides registry marks to make sure cards are inserted into the drawer
correctly. It can also be used to encode a separate index, what
I often think of as a rejection index. Hargrave describes inserting and leaving rods in the bottom
of the drawer that align with a set of notches on the bottom edge of cards. That way, you can only
insert cards that have the right data encoded on their bottom edge.
You could encode, say, a year on this edge.
Then you would set up a number of drawers to only accept cards punched for certain years.
That would be useful for something like tax or accounting documents.
You could have a cabinet for 1911 records that would only accept cards that were indexed as being from 1911.
There are a number of other patents from roughly this time period, the 1890s to 1910s, for very similar rejection indexes.
They work in the same way. You have a filing cabinet with a series of rods in the bottom, and you notch cards to align with those rods.
Cards that don't have the right pattern of notches can't be placed in the cabinet.
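A rejection index boils down to a subset check: a card only seats in the drawer if every rod lands in an open notch. The Python sketch below assumes a made-up bottom-edge layout for encoding a year, four groups of ten holes with exactly one notch per group, one group per digit. That layout, along with the names `year_notches` and `RejectionDrawer`, is hypothetical and only there to illustrate the idea.

```python
def year_notches(year: int) -> frozenset[tuple[int, int]]:
    """Hypothetical bottom-edge code: four groups of ten holes, one group per digit.

    Exactly one hole is notched in each group, so a notch is (group, digit).
    """
    return frozenset((group, int(digit)) for group, digit in enumerate(f"{year:04d}"))


class RejectionDrawer:
    """A drawer with fixed rods in its bottom, set up to accept one year's cards."""

    def __init__(self, year: int):
        self.rods = year_notches(year)
        self.cards: list[str] = []

    def insert(self, label: str, bottom_notches: frozenset[tuple[int, int]]) -> bool:
        """Try to drop a card in; it only goes down if every rod meets an open notch."""
        if self.rods <= bottom_notches:
            self.cards.append(label)
            return True
        return False  # a closed hole lands on a rod and the card won't seat


# Usage: a cabinet drawer for 1911 records.
drawer_1911 = RejectionDrawer(1911)
print(drawer_1911.insert("ledger, March 1911", year_notches(1911)))  # True
print(drawer_1911.insert("ledger, March 1912", year_notches(1912)))  # False
print(drawer_1911.cards)  # ['ledger, March 1911']
```

The one-notch-per-group layout is doing real work here. Rods can only check that notches exist, so with a code where one year's notches could be a subset of another year's, the wrong card would slide right in.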
These patents are also filed by accountants and insurance agents. Once again, there's this fascinating and rich history of data management that exists totally outside of any digital lineage.
Hargrave's system represents this middle ground between automated
and manual operations. You're still heavily dependent on this machine, but the machine is,
well, it's almost too simple to call a machine. It's a very, very simplistic device that aids in
automation. It also lands at this weird middle ground between free expression and strict standard.
The face of these cards is simply blank paper.
Hargrave even says in the patent that the cards can be anything,
as long as they're punched up correctly.
He claims the system could even work with folders, which, I mean, fair, I think it could,
I've just never seen edge-notched folders. But
maybe they're hiding somewhere waiting to be discovered. The data stored in these cards is
totally freeform, completely generalized. The edges, the metadata, have one hard and fast rule.
The left and right hand indexes have to be inversions of one another. If you don't match a
slot with a hole, then the entire system is ruined. You can actually jam the cabinet shut if you don't
follow the rules. What we get is a mix of features that I think are pretty congruent when taken
together. That's all I've been able to dig up in relation to Hargrave's cards, but there is a wild coda to this story.
Once again, Hargrave is just a fascinating figure that I wish we knew more about.
Here's a clipping from a 1929 article in the Pittsburgh Sun-Telegraph, quote,
Gustave Davidson, poet, author, and explorer, today came back from the South Sea Isles with
the tale of how his two scientific confrères, W.B. Hargrave of Colfax, Washington, and P.E.
Haskovich, former resident of Paris, France, disappeared.
They left the island of Raivavae in a native catamaran boat last April, bound for
Tubuai and Papeete, said Davidson.
End quote.
In 1929, Hargrave left for French Polynesia.
He and two companions set up camp on a remote island,
made contact with the locals, and started surveying the flora and fauna.
According to Davidson, the only surviving member of that expedition, Hargrave and Haskovich set out on a very makeshift boat constructed from two canoes and a small hut.
They were subsequently never seen again. Most of the articles that covered the disappearance claimed the duo were eaten by sharks,
but there was never any evidence of what happened to the two.
Alright, that does it for this episode.
But where do we stand in the larger story?
We've seen three distinct and wacky card filing systems. Talmadge's cards show us how
folk were trying to extend and work past the limitations of the punch card. A simple set of
indexing notches allowed for quick and easy selection and sorting. Looking at Talmadge's
patent alone, however, could lead to some faulty assumptions. It's simple to look at the
progression of computers, identify related technologies, and assume a similar progression.
Computers went from glorified calculators, number crunchers really, to data management systems.
It's a total change of character. Number crunching machines can answer totally different questions
than data management machines can answer. They use different tools and different techniques, but at the end of the day,
both types of machines offer solutions to the information problem.
Computers made that jump over the latter half of the 20th century, moving from systems like ENIAC
and large Fortran machines into hypertext and document systems, and eventually the internet.
So it would be pretty slick to see a similar transition in pre-digital technology. Talmadge's cards seem to point towards that
hypothesis, right? But there are counterexamples to that. Otlet's Mundaneum was a large-scale
data management system that was conceived of prior to the adoption of punch cards,
prior to the number-crunching medium. Stanford's card system represents a contemporary of the
punch card, but it takes a fully free-form approach. It's a data management system,
not a data-crunching system. Hargrave's cards present a similar picture. They are freeform, with some automation and some strict enforcement of data.
But crucially, these were developed well after the punch card.
Hargrave's cards don't attempt to improve or update the punch card.
They don't even use any of the technology developed by Hollerith.
I think it's especially telling that these card systems were, in large part, developed by accountants
and clerical workers. These were the exact people punch cards were developed for, yet some reached
for a different technology altogether. So here's the payoff. Here's what's been bugging me, and the conclusion that I've
been starting to reach. I think it's pretty clear that these indexing systems existed
separately from punch cards. They're related technologies, for sure, but only in the most
tenuous sense. They all solve the information problem, and they use paper, but the connection stops around there. Call them cousins, perhaps.
Automated data indexing, a la the edge-notched card, is its own tradition. Digital computers
first develop out of data-crunching traditions, of machines like punch-card tabulators or
analog computers. As machines become more complex, the other lineage, the
lineage of indexing and real data management, gets folded in. I like this idea because it
solves a few issues I've had. As We May Think is cited as this revolutionary text in the history
of computing, but that article doesn't even talk about numbers or math or
computation at all. It talks about automating a library. It goes on to inspire a younger generation
to create hypertext. It's not that As We May Think shows a next stage in the evolution of computers,
but rather it's a different path, one that eventually converges with the digital
computer. Thanks for listening to Advent of Computing. I'll be back in two weeks' time
with another piece of computing's past. And hey, if you like the show, there are a few ways you
can support it. If you know anyone else who'd be interested in the history of computing,
please take a minute to share the show with them. You can also rate and review the show on Apple
Podcasts and Spotify. If you want to support the show directly, you can sign up on Patreon. Thank you.